[jira] [Commented] (FLINK-2398) Decouple StreamGraph Building from the API

2015-08-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14660792#comment-14660792
 ] 

ASF GitHub Bot commented on FLINK-2398:
---

Github user aljoscha commented on the pull request:

https://github.com/apache/flink/pull/988#issuecomment-128507574
  
Yes, I think the automatic rebalance is good. My approach of throwing the 
exception was just the easiest way of dealing with the previously faulty 
behavior. I think people coming from Storm are also used to having 
operations that are executed even if you don't have a sink. So maybe we should 
keep that.


 Decouple StreamGraph Building from the API
 --

 Key: FLINK-2398
 URL: https://issues.apache.org/jira/browse/FLINK-2398
 Project: Flink
  Issue Type: Improvement
  Components: Streaming
Reporter: Aljoscha Krettek
Assignee: Aljoscha Krettek

 Currently, the building of the StreamGraph is very intertwined with the API 
 methods. DataStream knows about the StreamGraph and keeps track of splitting, 
 selected names, unions and so on. This leads to the problem that it is very 
 hard to understand how the StreamGraph is built, because the code that does it 
 is all over the place. This also makes it hard to extend/change parts of the 
 Streaming system.
 I propose to introduce Transformations. A transformation holds information 
 about one operation: the input streams, types, names, operator and so on. An 
 API method creates a transformation instead of fiddling with the StreamGraph 
 directly. A new component, the StreamGraphGenerator, creates a StreamGraph 
 from the tree of transformations that results from specifying a program with 
 the API methods. This would relieve DataStream of knowing about the 
 StreamGraph and make unions, splitting, and selection visible as transformations 
 instead of being scattered across the different API classes as fields.
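The proposal above can be sketched as a small tree of transformation nodes that a generator later walks from the sinks upwards. All names and fields here are hypothetical illustrations of the idea, not the classes Flink actually introduced:

```java
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of a transformation node as proposed above; the
// real Flink classes that resulted from this issue differ in detail.
public class Transformation {
    private final String name;                 // operator name for the StreamGraph
    private final List<Transformation> inputs; // upstream transformations
    private final int parallelism;

    public Transformation(String name, List<Transformation> inputs, int parallelism) {
        this.name = name;
        this.inputs = inputs;
        this.parallelism = parallelism;
    }

    public String getName() { return name; }
    public List<Transformation> getInputs() { return inputs; }
    public int getParallelism() { return parallelism; }

    // A StreamGraphGenerator would traverse this tree; here we just count nodes.
    public int countNodes() {
        int count = 1;
        for (Transformation input : inputs) {
            count += input.countNodes();
        }
        return count;
    }

    public static void main(String[] args) {
        Transformation source = new Transformation("source",
                Collections.<Transformation>emptyList(), 2);
        Transformation map = new Transformation("map",
                Collections.singletonList(source), 2);
        System.out.println(map.countNodes()); // prints 2
    }
}
```

The point of the design is that API methods only build this tree; nothing here touches a StreamGraph directly.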



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2314]

2015-08-06 Thread sheetalparade
GitHub user sheetalparade opened a pull request:

https://github.com/apache/flink/pull/997

[FLINK-2314]

[FLINK-2314] - Added checkpointing feature into File Source


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sheetalparade/flink master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/997.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #997


commit c32d9bfada1f6b601c221185a08c8b1c5a454a5c
Author: Sheetal Parade sheetal.parade+git...@gmail.com
Date:   2015-08-06T16:29:19Z

[FLINK-2314]

Added checkpointing feature into File Source

[FLINK-2314]

Added checkpointing feature into File Source




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-2314) Make Streaming File Sources Persistent

2015-08-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14660703#comment-14660703
 ] 

ASF GitHub Bot commented on FLINK-2314:
---

GitHub user sheetalparade opened a pull request:

https://github.com/apache/flink/pull/997

[FLINK-2314]

[FLINK-2314] - Added checkpointing feature into File Source


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sheetalparade/flink master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/997.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #997


commit c32d9bfada1f6b601c221185a08c8b1c5a454a5c
Author: Sheetal Parade sheetal.parade+git...@gmail.com
Date:   2015-08-06T16:29:19Z

[FLINK-2314]

Added checkpointing feature into File Source

[FLINK-2314]

Added checkpointing feature into File Source




 Make Streaming File Sources Persistent
 --

 Key: FLINK-2314
 URL: https://issues.apache.org/jira/browse/FLINK-2314
 Project: Flink
  Issue Type: Improvement
  Components: Streaming
Affects Versions: 0.9
Reporter: Stephan Ewen
Assignee: Sheetal Parade
  Labels: easyfix, starter

 Streaming File sources should participate in the checkpointing. They should 
 track the bytes they read from the file and checkpoint it.
 One can look at the sequence generating source function for an example of a 
 checkpointed source.
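The tracking described above can be illustrated with a minimal, Flink-free sketch: the source counts the bytes it has read, hands that offset out when a snapshot is taken, and restores it on recovery. The class and method names are assumptions modeled on the snapshot/restore pattern of checkpointed sources, not Flink's actual API:

```java
// Minimal sketch of an offset-tracking checkpointed file source; plain
// Java, not Flink's API. A real source would implement the checkpointing
// interface and seek the input stream to the restored offset.
public class OffsetTrackingSource {
    private long bytesRead; // offset into the current file

    // Called when a checkpoint snapshot of this source's state is taken.
    public Long snapshotState() {
        return bytesRead;
    }

    // Called on recovery; a real source would also seek(bytesRead).
    public void restoreState(Long state) {
        bytesRead = state;
    }

    // Account for one record read from the file.
    public void recordRead(int recordLength) {
        bytesRead += recordLength;
    }

    public static void main(String[] args) {
        OffsetTrackingSource source = new OffsetTrackingSource();
        source.recordRead(10);
        Long snapshot = source.snapshotState();
        OffsetTrackingSource recovered = new OffsetTrackingSource();
        recovered.restoreState(snapshot);
        System.out.println(recovered.snapshotState()); // prints 10
    }
}
```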



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1138) Allow users to specify methods instead of fields in key expressions

2015-08-06 Thread Fabian Hueske (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14660700#comment-14660700
 ] 

Fabian Hueske commented on FLINK-1138:
--

+1 for won't fix

 Allow users to specify methods instead of fields in key expressions
 ---

 Key: FLINK-1138
 URL: https://issues.apache.org/jira/browse/FLINK-1138
 Project: Flink
  Issue Type: Improvement
  Components: Java API
Reporter: Robert Metzger
Priority: Minor

 Currently, users can specify grouping fields only on the fields of a POJO.
 It would be nice to allow users also to name a method (such as 
 getVertexId()) to be called.
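Since this was resolved as won't fix, the existing way to key on a method's return value is a key selector function. The sketch below declares a minimal stand-in for the selector interface so it runs without Flink dependencies; Flink's real interface is `org.apache.flink.api.java.functions.KeySelector`:

```java
// Illustration of keying on a method instead of a field: a key selector
// calls the method. The nested interface is a stand-in for Flink's
// KeySelector so this sketch is self-contained.
public class KeySelectorExample {
    public interface KeySelector<IN, KEY> {
        KEY getKey(IN value);
    }

    public static class Vertex {
        private final long vertexId;
        public Vertex(long vertexId) { this.vertexId = vertexId; }
        public long getVertexId() { return vertexId; }
    }

    // Instead of the unsupported key expression "getVertexId()", pass a selector:
    public static final KeySelector<Vertex, Long> BY_ID =
            new KeySelector<Vertex, Long>() {
                public Long getKey(Vertex v) { return v.getVertexId(); }
            };

    public static void main(String[] args) {
        System.out.println(BY_ID.getKey(new Vertex(42L))); // prints 42
    }
}
```

With a selector, any derived value can serve as the grouping key, which is why method-based key expressions were considered unnecessary.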



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-2491) Operators are not participating in state checkpointing in some cases

2015-08-06 Thread Robert Metzger (JIRA)
Robert Metzger created FLINK-2491:
-

 Summary: Operators are not participating in state checkpointing in 
some cases
 Key: FLINK-2491
 URL: https://issues.apache.org/jira/browse/FLINK-2491
 Project: Flink
  Issue Type: Bug
  Components: Streaming
Affects Versions: 0.10
Reporter: Robert Metzger
Priority: Critical


While implementing a test case for the Kafka Consumer, I came across the 
following bug:

Consider the following topology, with the operator parallelism in parentheses:

Source (2) --> Sink (1).

In this setup, the {{snapshotState()}} method is called on the source, but not 
on the sink.
The sink receives the generated data.
Only one of the two sources is generating data.

I've implemented a test case for this, you can find it here: 
https://github.com/rmetzger/flink/tree/para_checkpoint_bug



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2490) Remove unwanted boolean check in function SocketTextStreamFunction.streamFromSocket

2015-08-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14659721#comment-14659721
 ] 

ASF GitHub Bot commented on FLINK-2490:
---

Github user zentol commented on the pull request:

https://github.com/apache/flink/pull/992#issuecomment-128301542
  
if you remove that check, retryForever is unused and can be removed 
completely.


 Remove unwanted boolean check in function 
 SocketTextStreamFunction.streamFromSocket
 ---

 Key: FLINK-2490
 URL: https://issues.apache.org/jira/browse/FLINK-2490
 Project: Flink
  Issue Type: Bug
  Components: Streaming
Affects Versions: 0.10
Reporter: Huang Wei
Priority: Minor
 Fix For: 0.10

   Original Estimate: 168h
  Remaining Estimate: 168h





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2432] Custom serializer support

2015-08-06 Thread zentol
Github user zentol commented on a diff in the pull request:

https://github.com/apache/flink/pull/962#discussion_r36395811
  
--- Diff: 
flink-staging/flink-language-binding/flink-python/src/test/python/org/apache/flink/languagebinding/api/python/flink/test/test_custom.py
 ---
@@ -0,0 +1,76 @@
+# 
###
--- End diff --

Actually, I'm just gonna move this code into the test_main file, I was 
going to do that in my next test-centric PR anyway.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-2432) [py] Provide support for custom serialization

2015-08-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14659730#comment-14659730
 ] 

ASF GitHub Bot commented on FLINK-2432:
---

Github user zentol commented on a diff in the pull request:

https://github.com/apache/flink/pull/962#discussion_r36395811
  
--- Diff: 
flink-staging/flink-language-binding/flink-python/src/test/python/org/apache/flink/languagebinding/api/python/flink/test/test_custom.py
 ---
@@ -0,0 +1,76 @@
+# 
###
--- End diff --

Actually, I'm just gonna move this code into the test_main file, I was 
going to do that in my next test-centric PR anyway.


 [py] Provide support for custom serialization
 -

 Key: FLINK-2432
 URL: https://issues.apache.org/jira/browse/FLINK-2432
 Project: Flink
  Issue Type: New Feature
  Components: Python API
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2490][FIX]Remove the retryForever check...

2015-08-06 Thread zentol
Github user zentol commented on the pull request:

https://github.com/apache/flink/pull/992#issuecomment-128309202
  
If you think it was necessary, why was your first step to remove its 
usage...


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-2490) Remove unwanted boolean check in function SocketTextStreamFunction.streamFromSocket

2015-08-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14659747#comment-14659747
 ] 

ASF GitHub Bot commented on FLINK-2490:
---

Github user zentol commented on the pull request:

https://github.com/apache/flink/pull/992#issuecomment-128309202
  
If you think it was necessary, why was your first step to remove its 
usage...


 Remove unwanted boolean check in function 
 SocketTextStreamFunction.streamFromSocket
 ---

 Key: FLINK-2490
 URL: https://issues.apache.org/jira/browse/FLINK-2490
 Project: Flink
  Issue Type: Bug
  Components: Streaming
Affects Versions: 0.10
Reporter: Huang Wei
Priority: Minor
 Fix For: 0.10

   Original Estimate: 168h
  Remaining Estimate: 168h





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2490) Remove unwanted boolean check in function SocketTextStreamFunction.streamFromSocket

2015-08-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14659821#comment-14659821
 ] 

ASF GitHub Bot commented on FLINK-2490:
---

Github user HuangWHWHW commented on the pull request:

https://github.com/apache/flink/pull/992#issuecomment-128322151
  
Hah, sorry, this thought only came up after this PR.


 Remove unwanted boolean check in function 
 SocketTextStreamFunction.streamFromSocket
 ---

 Key: FLINK-2490
 URL: https://issues.apache.org/jira/browse/FLINK-2490
 Project: Flink
  Issue Type: Bug
  Components: Streaming
Affects Versions: 0.10
Reporter: Huang Wei
Priority: Minor
 Fix For: 0.10

   Original Estimate: 168h
  Remaining Estimate: 168h





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2490][FIX]Remove the retryForever check...

2015-08-06 Thread HuangWHWHW
Github user HuangWHWHW commented on the pull request:

https://github.com/apache/flink/pull/992#issuecomment-128322151
  
Hah, sorry, this thought only came up after this PR.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Created] (FLINK-2494) Fix StreamGraph getJobGraph bug

2015-08-06 Thread fangfengbin (JIRA)
fangfengbin created FLINK-2494:
--

 Summary: Fix StreamGraph getJobGraph bug
 Key: FLINK-2494
 URL: https://issues.apache.org/jira/browse/FLINK-2494
 Project: Flink
  Issue Type: Bug
  Components: Streaming
Affects Versions: 0.8.1
Reporter: fangfengbin






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2494) Fix StreamGraph getJobGraph bug

2015-08-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14661372#comment-14661372
 ] 

ASF GitHub Bot commented on FLINK-2494:
---

Github user zentol commented on the pull request:

https://github.com/apache/flink/pull/998#issuecomment-128605743
  
I would assume that forceCheckpoint is supposed to do exactly that: enforce 
checkpointing regardless of whether it is supported.

This change also means that if checkpointing is enabled, but not forced, 
the job will not hit an UnsupportedOperationException, which doesn't make any 
sense whatsoever.

-1


 Fix StreamGraph getJobGraph bug
 ---

 Key: FLINK-2494
 URL: https://issues.apache.org/jira/browse/FLINK-2494
 Project: Flink
  Issue Type: Bug
  Components: Streaming
Affects Versions: 0.8.1
Reporter: fangfengbin





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2494 ]Fix StreamGraph getJobGraph bug

2015-08-06 Thread zentol
Github user zentol commented on the pull request:

https://github.com/apache/flink/pull/998#issuecomment-128605743
  
I would assume that forceCheckpoint is supposed to do exactly that: enforce 
checkpointing regardless of whether it is supported.

This change also means that if checkpointing is enabled, but not forced, 
the job will not hit an UnsupportedOperationException, which doesn't make any 
sense whatsoever.

-1


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: Fix StreamGraph getJobGraph bug

2015-08-06 Thread ffbin
GitHub user ffbin opened a pull request:

https://github.com/apache/flink/pull/998

Fix StreamGraph getJobGraph bug

When forceCheckpoint is true, checkpointing will be enabled for iterative 
jobs. But checkpointing is currently forbidden for iterative jobs, so if 
forceCheckpoint is true, an UnsupportedOperationException will be thrown.
The old code logic was reversed.
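A minimal sketch of the guard as this PR describes it (a hypothetical standalone method, not the actual StreamGraph code); note that the review comments in this thread dispute whether forcing should throw or instead bypass the restriction:

```java
// Sketch of the checkpoint guard per the PR description above:
// checkpointing is temporarily forbidden for iterative jobs, so trying
// to force it fails fast. This is illustrative only; the reviewer in
// this thread argues for the opposite semantics.
public class IterativeCheckpointGuard {
    // Returns true when the configuration is accepted.
    public static boolean validate(boolean isIterativeJob, boolean forceCheckpoint) {
        if (isIterativeJob && forceCheckpoint) {
            throw new UnsupportedOperationException(
                    "Checkpointing is temporarily forbidden for iterative jobs.");
        }
        return true;
    }
}
```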

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ffbin/flink FLINK-2494

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/998.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #998


commit 226354ccf3060e2d0c2ba4dd607bf83ce02735c1
Author: ffbin 869218...@qq.com
Date:   2015-08-07T02:41:55Z

Fix StreamGraph getJobGraph bug




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-1901) Create sample operator for Dataset

2015-08-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14661258#comment-14661258
 ] 

ASF GitHub Bot commented on FLINK-1901:
---

Github user ChengXiangLi commented on the pull request:

https://github.com/apache/flink/pull/949#issuecomment-128580607
  
Hi @tillrohrmann, the current implementation of fixed-size sampling generates 
a fixed-size sample for each partition rather than for the whole dataset, and 
users most likely expect the latter. I'm researching how to randomly sample a 
fixed number of elements from a distributed data stream; I think we can pause 
this PR review until I merge the previous fix.


 Create sample operator for Dataset
 --

 Key: FLINK-1901
 URL: https://issues.apache.org/jira/browse/FLINK-1901
 Project: Flink
  Issue Type: Improvement
  Components: Core
Reporter: Theodore Vasiloudis
Assignee: Chengxiang Li

 In order to be able to implement Stochastic Gradient Descent and a number of 
 other machine learning algorithms we need to have a way to take a random 
 sample from a Dataset.
 We need to be able to sample with or without replacement from the Dataset, 
 choose the relative size of the sample, and set a seed for reproducibility.
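One standard way to draw a fixed-size sample without replacement, with a seed for reproducibility, is reservoir sampling (Algorithm R). The sketch below only illustrates the technique discussed in this thread and makes no claim about Flink's eventual operator:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Algorithm R reservoir sampling: a single pass keeps a uniform random
// sample of k elements without replacement. Illustrative only.
public class ReservoirSample {
    public static <T> List<T> sample(Iterable<T> data, int k, long seed) {
        Random rnd = new Random(seed);       // seed makes the sample reproducible
        List<T> reservoir = new ArrayList<T>(k);
        int seen = 0;                        // elements consumed so far
        for (T element : data) {
            if (reservoir.size() < k) {
                reservoir.add(element);      // fill the reservoir first
            } else {
                // Replace a random slot with probability k / (seen + 1).
                int pos = rnd.nextInt(seen + 1);
                if (pos < k) {
                    reservoir.set(pos, element);
                }
            }
            seen++;
        }
        return reservoir;
    }
}
```

Running this per partition and then sampling again from the merged reservoirs is one way to get a fixed-size sample of the whole dataset, which is the distinction raised in the comment above.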



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-1901] [core] Create sample operator for...

2015-08-06 Thread ChengXiangLi
Github user ChengXiangLi commented on the pull request:

https://github.com/apache/flink/pull/949#issuecomment-128580607
  
Hi @tillrohrmann, the current implementation of fixed-size sampling generates 
a fixed-size sample for each partition rather than for the whole dataset, and 
users most likely expect the latter. I'm researching how to randomly sample a 
fixed number of elements from a distributed data stream; I think we can pause 
this PR review until I merge the previous fix.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-2240) Use BloomFilter to minimize probe side records which are spilled to disk in Hybrid-Hash-Join

2015-08-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14661203#comment-14661203
 ] 

ASF GitHub Bot commented on FLINK-2240:
---

Github user ChengXiangLi commented on the pull request:

https://github.com/apache/flink/pull/888#issuecomment-128563750
  
Thanks for the review, @StephanEwen, I'm very interested in this project 
and would like to contribute more. @vasia, I think Stephan has already 
answered the question; the most important reason is that I want to reuse the 
memory occupied by the hash table buckets. Besides, since this is a 
performance-sensitive issue, I tried to make this bloom filter as simple and 
efficient as I could: for example, the hash code of the join key is already 
generated and stored in the hybrid hash join, so I just reuse that hash code 
instead of computing it again from the join key value inside the bloom filter. 


 Use BloomFilter to minimize probe side records which are spilled to disk in 
 Hybrid-Hash-Join
 

 Key: FLINK-2240
 URL: https://issues.apache.org/jira/browse/FLINK-2240
 Project: Flink
  Issue Type: Improvement
  Components: Core
Reporter: Chengxiang Li
Assignee: Chengxiang Li
Priority: Minor
 Fix For: 0.10


 In Hybrid-Hash-Join, when the small table does not fit into memory, part of the 
 small table data is spilled to disk, and the counterpart partition of the 
 big table data is spilled to disk in the probe phase as well. If we build a 
 BloomFilter while spilling the small table to disk during the build phase, and 
 use it to filter the big table records that tend to be spilled to disk, this 
 may greatly reduce the spilled big table file size and save the disk I/O cost 
 for writing and later reading.
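The hash-code reuse described in the comment above can be sketched as a bloom filter whose probe positions are all derived from one precomputed `hashCode` via double hashing. The sizes, constants, and hashing scheme here are illustrative assumptions, not Flink's actual implementation:

```java
import java.util.BitSet;

// Illustrative bloom filter keyed on precomputed hash codes: every probe
// position is derived from the single int hash that the hybrid hash join
// has already computed and stored for the key, so the filter never needs
// to rehash the key value itself.
public class HashCodeBloomFilter {
    private final BitSet bits;
    private final int numBits;
    private final int numHashFunctions;

    public HashCodeBloomFilter(int numBits, int numHashFunctions) {
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashFunctions = numHashFunctions;
    }

    // Double hashing: position_i = h1 + i * h2 (mod numBits).
    private int position(int hashCode, int i) {
        int h2 = (hashCode >>> 16) | 1; // odd step so every slot is reachable
        int pos = (hashCode + i * h2) % numBits;
        return pos < 0 ? pos + numBits : pos;
    }

    public void add(int hashCode) {
        for (int i = 0; i < numHashFunctions; i++) {
            bits.set(position(hashCode, i));
        }
    }

    // False positives are possible, false negatives are not: a probe-side
    // record whose key was spilled during the build phase is always kept.
    public boolean mightContain(int hashCode) {
        for (int i = 0; i < numHashFunctions; i++) {
            if (!bits.get(position(hashCode, i))) {
                return false;
            }
        }
        return true;
    }
}
```

Because false negatives cannot occur, filtering the probe side with such a structure only ever discards records that certainly have no build-side match, which is what makes it safe to skip spilling them.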



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2240) Use BloomFilter to minimize probe side records which are spilled to disk in Hybrid-Hash-Join

2015-08-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14661204#comment-14661204
 ] 

ASF GitHub Bot commented on FLINK-2240:
---

Github user ChengXiangLi closed the pull request at:

https://github.com/apache/flink/pull/888


 Use BloomFilter to minimize probe side records which are spilled to disk in 
 Hybrid-Hash-Join
 

 Key: FLINK-2240
 URL: https://issues.apache.org/jira/browse/FLINK-2240
 Project: Flink
  Issue Type: Improvement
  Components: Core
Reporter: Chengxiang Li
Assignee: Chengxiang Li
Priority: Minor
 Fix For: 0.10


 In Hybrid-Hash-Join, when the small table does not fit into memory, part of the 
 small table data is spilled to disk, and the counterpart partition of the 
 big table data is spilled to disk in the probe phase as well. If we build a 
 BloomFilter while spilling the small table to disk during the build phase, and 
 use it to filter the big table records that tend to be spilled to disk, this 
 may greatly reduce the spilled big table file size and save the disk I/O cost 
 for writing and later reading.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2240] Use BloomFilter to filter probe r...

2015-08-06 Thread ChengXiangLi
Github user ChengXiangLi closed the pull request at:

https://github.com/apache/flink/pull/888


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-2240] Use BloomFilter to filter probe r...

2015-08-06 Thread ChengXiangLi
Github user ChengXiangLi commented on the pull request:

https://github.com/apache/flink/pull/888#issuecomment-128563750
  
Thanks for the review, @StephanEwen, I'm very interested in this project 
and would like to contribute more. @vasia, I think Stephan has already 
answered the question; the most important reason is that I want to reuse the 
memory occupied by the hash table buckets. Besides, since this is a 
performance-sensitive issue, I tried to make this bloom filter as simple and 
efficient as I could: for example, the hash code of the join key is already 
generated and stored in the hybrid hash join, so I just reuse that hash code 
instead of computing it again from the join key value inside the bloom filter. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-2314) Make Streaming File Sources Persistent

2015-08-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14660983#comment-14660983
 ] 

ASF GitHub Bot commented on FLINK-2314:
---

Github user chiwanpark commented on a diff in the pull request:

https://github.com/apache/flink/pull/997#discussion_r36478648
  
--- Diff: 
flink-staging/flink-streaming/flink-streaming-core/src/main/java/org/apache/flink/streaming/api/functions/source/FileSourceFunction.java
 ---
@@ -119,13 +124,20 @@ public void run(SourceContext<OUT> ctx) throws 
Exception {
while (isRunning) {
OUT nextElement = serializer.createInstance();
nextElement =  format.nextRecord(nextElement);
-   if (nextElement == null && splitIterator.hasNext()) {
+   if (nextElement == null && splitIterator.hasNext() ) {
--- End diff --

unnecessary space


 Make Streaming File Sources Persistent
 --

 Key: FLINK-2314
 URL: https://issues.apache.org/jira/browse/FLINK-2314
 Project: Flink
  Issue Type: Improvement
  Components: Streaming
Affects Versions: 0.9
Reporter: Stephan Ewen
Assignee: Sheetal Parade
  Labels: easyfix, starter

 Streaming File sources should participate in the checkpointing. They should 
 track the bytes they read from the file and checkpoint it.
 One can look at the sequence generating source function for an example of a 
 checkpointed source.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2314]

2015-08-06 Thread chiwanpark
Github user chiwanpark commented on a diff in the pull request:

https://github.com/apache/flink/pull/997#discussion_r36478648
  
--- Diff: 
flink-staging/flink-streaming/flink-streaming-core/src/main/java/org/apache/flink/streaming/api/functions/source/FileSourceFunction.java
 ---
@@ -119,13 +124,20 @@ public void run(SourceContext<OUT> ctx) throws 
Exception {
while (isRunning) {
OUT nextElement = serializer.createInstance();
nextElement =  format.nextRecord(nextElement);
-   if (nextElement == null && splitIterator.hasNext()) {
+   if (nextElement == null && splitIterator.hasNext() ) {
--- End diff --

unnecessary space


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Resolved] (FLINK-2464) BufferSpillerTest sometimes fails

2015-08-06 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen resolved FLINK-2464.
-
Resolution: Duplicate

Problem is tracked in [FLINK-2466]

 BufferSpillerTest sometimes fails
 -

 Key: FLINK-2464
 URL: https://issues.apache.org/jira/browse/FLINK-2464
 Project: Flink
  Issue Type: Bug
  Components: Streaming
Reporter: Gyula Fora
Assignee: Stephan Ewen
Priority: Minor
 Fix For: 0.10


 The BufferSpillerTest failed with the following error:
 org.apache.flink.streaming.runtime.io.BufferSpillerTest
 testSpillWhileReading(org.apache.flink.streaming.runtime.io.BufferSpillerTest)
   Time elapsed: 3.28 sec   <<< FAILURE!
 java.lang.AssertionError: wrong buffer contents expected:<0> but was:<58>
   at org.junit.Assert.fail(Assert.java:88)
   at org.junit.Assert.failNotEquals(Assert.java:743)
   at org.junit.Assert.assertEquals(Assert.java:118)
   at org.junit.Assert.assertEquals(Assert.java:555)
   at 
 org.apache.flink.streaming.runtime.io.BufferSpillerTest.validateBuffer(BufferSpillerTest.java:290)
   at 
 org.apache.flink.streaming.runtime.io.BufferSpillerTest.access$200(BufferSpillerTest.java:42)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (FLINK-2464) BufferSpillerTest sometimes fails

2015-08-06 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen closed FLINK-2464.
---

 BufferSpillerTest sometimes fails
 -

 Key: FLINK-2464
 URL: https://issues.apache.org/jira/browse/FLINK-2464
 Project: Flink
  Issue Type: Bug
  Components: Streaming
Reporter: Gyula Fora
Assignee: Stephan Ewen
Priority: Minor
 Fix For: 0.10


 The BufferSpillerTest failed with the following error:
 org.apache.flink.streaming.runtime.io.BufferSpillerTest
 testSpillWhileReading(org.apache.flink.streaming.runtime.io.BufferSpillerTest)
   Time elapsed: 3.28 sec   <<< FAILURE!
 java.lang.AssertionError: wrong buffer contents expected:<0> but was:<58>
   at org.junit.Assert.fail(Assert.java:88)
   at org.junit.Assert.failNotEquals(Assert.java:743)
   at org.junit.Assert.assertEquals(Assert.java:118)
   at org.junit.Assert.assertEquals(Assert.java:555)
   at 
 org.apache.flink.streaming.runtime.io.BufferSpillerTest.validateBuffer(BufferSpillerTest.java:290)
   at 
 org.apache.flink.streaming.runtime.io.BufferSpillerTest.access$200(BufferSpillerTest.java:42)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2466) Travis build failure

2015-08-06 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14659928#comment-14659928
 ] 

Stephan Ewen commented on FLINK-2466:
-

If someone could double check whether they see code that can crash the JVM in 
{{BarrierBufferMassiveRandomTest}}, {{BarrierBuffer}}, and {{BufferSpiller}}, I 
would appreciate that.

Otherwise, I would suggest to go forward with deprecating Java 6 support and 
close this on strong suspicion that it is a Java 6 issue for the following 
reasons:
  - There is no code in the tested classes that can actually crash a JVM.
  - The problem has occurred more than 50 times for Java 6, but not a single 
time for Java 7 and 8.

 Travis build failure
 

 Key: FLINK-2466
 URL: https://issues.apache.org/jira/browse/FLINK-2466
 Project: Flink
  Issue Type: Bug
Reporter: Sachin Goel
Assignee: Stephan Ewen
 Fix For: 0.10


 One of my builds failed with
 Failed to execute goal 
 org.apache.maven.plugins:maven-surefire-plugin:2.17:test (default-test) on 
 project flink-streaming-core: ExecutionException: java.lang.RuntimeException: 
 The forked VM terminated without properly saying goodbye. VM crash or 
 System.exit called?
 Here's the build: https://travis-ci.org/apache/flink/jobs/73851870
 This looks like a travis hiccup though, similar to how tachyon fails 
 sometimes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (FLINK-2466) Travis build failure

2015-08-06 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14659928#comment-14659928
 ] 

Stephan Ewen edited comment on FLINK-2466 at 8/6/15 12:29 PM:
--

If someone could double check whether they see code that can crash the JVM in 
{{BarrierBufferMassiveRandomTest}}, {{BarrierBuffer}}, and {{BufferSpiller}}, I 
would appreciate that.

Otherwise, I would suggest to go forward with deprecating Java 6 support and 
close this on strong suspicion that it is a Java 6 issue for the following 
reasons:
  - There is no code in the tested classes that can actually crash a JVM 
(unless there is a JVM bug)
  - The problem has occurred more than 50 times for Java 6, but not a single 
time for Java 7 and 8.


was (Author: stephanewen):
If someone could double check whether they see code that can crash the JVM in 
{{BarrierBufferMassiveRandomTest}}, {{BarrierBuffer}}, and {{BufferSpiller}}, I 
would appreciate that.

Otherwise, I would suggest to go forward with deprecating Java 6 support and 
close this on strong suspicion that it is a Java 6 issue for the following 
reasons:
  - There is no code in the tested classes that can actually crash a JVM.
  - The problem has occurred more than 50 times for Java 6, but not a single 
time for Java 7 and 8.



[jira] [Commented] (FLINK-2457) Integrate Tuple0

2015-08-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14659970#comment-14659970
 ] 

ASF GitHub Bot commented on FLINK-2457:
---

Github user mjsax commented on a diff in the pull request:

https://github.com/apache/flink/pull/983#discussion_r36410627
  
--- Diff: 
flink-staging/flink-language-binding/flink-language-binding-generic/src/main/java/org/apache/flink/languagebinding/api/java/common/streaming/Receiver.java
 ---
@@ -346,61 +349,12 @@ public Tuple deserialize() {
}
 
public static Tuple createTuple(int size) {
-   switch (size) {
-   case 0:
-   return new Tuple0();
-   case 1:
-   return new Tuple1();
-   case 2:
-   return new Tuple2();
-   case 3:
-   return new Tuple3();
-   case 4:
-   return new Tuple4();
-   case 5:
-   return new Tuple5();
-   case 6:
-   return new Tuple6();
-   case 7:
-   return new Tuple7();
-   case 8:
-   return new Tuple8();
-   case 9:
-   return new Tuple9();
-   case 10:
-   return new Tuple10();
-   case 11:
-   return new Tuple11();
-   case 12:
-   return new Tuple12();
-   case 13:
-   return new Tuple13();
-   case 14:
-   return new Tuple14();
-   case 15:
-   return new Tuple15();
-   case 16:
-   return new Tuple16();
-   case 17:
-   return new Tuple17();
-   case 18:
-   return new Tuple18();
-   case 19:
-   return new Tuple19();
-   case 20:
-   return new Tuple20();
-   case 21:
-   return new Tuple21();
-   case 22:
-   return new Tuple22();
-   case 23:
-   return new Tuple23();
-   case 24:
-   return new Tuple24();
-   case 25:
-   return new Tuple25();
-   default:
-   throw new IllegalArgumentException("Tuple size 
not supported: " + size);
+   try {
+   return Tuple.getTupleClass(size).newInstance();
--- End diff --

No need for this. `.getTupleClass()` does the check already.


 Integrate Tuple0
 

 Key: FLINK-2457
 URL: https://issues.apache.org/jira/browse/FLINK-2457
 Project: Flink
  Issue Type: Improvement
Reporter: Matthias J. Sax
Assignee: Matthias J. Sax
Priority: Minor

 Tuple0 is not cleanly integrated:
   - missing serialization/deserialization support in the runtime
   - Tuple.getTupleClass(int arity) cannot handle arity zero, i.e., cannot create 
 an instance of Tuple0
 Tuple0 is currently only used in the Python API, but will be integrated into 
 Storm compatibility, too.





[jira] [Commented] (FLINK-2466) Travis build failure

2015-08-06 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14659904#comment-14659904
 ] 

Stephan Ewen commented on FLINK-2466:
-

It is the BarrierBufferMassiveRandomTest again. The "Killed" message means the 
JVM crashed. Not sure why that happens; we are not using any unsafe operations in 
that test.

While fixing the BufferSpillerTest, I noticed that there is a possible Java 6 
bug when handing FileChannels/DirectByteBuffers over between threads.

For a JVM to crash without the user code using Unsafe, I have a hard time 
imagining what it could be, other than a JVM issue.





[jira] [Commented] (FLINK-2457) Integrate Tuple0

2015-08-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14659941#comment-14659941
 ] 

ASF GitHub Bot commented on FLINK-2457:
---

Github user StephanEwen commented on a diff in the pull request:

https://github.com/apache/flink/pull/983#discussion_r36408488
  
--- Diff: 
flink-java/src/main/java/org/apache/flink/api/java/typeutils/runtime/Tuple0Serializer.java
 ---
@@ -0,0 +1,86 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more 
contributor license agreements. See the NOTICE
+ * file distributed with this work for additional information regarding 
copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the "License"); you may 
not use this file except in compliance with the
+ * License. You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software 
distributed under the License is distributed on
+ * an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either 
express or implied. See the License for the
+ * specific language governing permissions and limitations under the 
License.
+ */
+package org.apache.flink.api.java.typeutils.runtime;
+
+import java.io.IOException;
+
+import org.apache.flink.api.common.typeutils.TypeSerializer;
+import org.apache.flink.api.java.tuple.Tuple0;
+import org.apache.flink.core.memory.DataInputView;
+import org.apache.flink.core.memory.DataOutputView;
+
+public class Tuple0Serializer extends TupleSerializer<Tuple0> {
+   private static final long serialVersionUID = 1278813169022975971L;
+
+   public Tuple0Serializer() {
+   super(Tuple0.class, new TypeSerializer<?>[0]);
+   }
+
+   @Override
+   public Tuple0Serializer duplicate() {
+   return new Tuple0Serializer();
--- End diff --

No need to create a new instance, since this is stateless.
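The review point above can be sketched as follows. This is a hypothetical, minimal stand-in (not the actual Flink class): a serializer with no mutable state can hand out a shared instance from `duplicate()` instead of allocating a new object on every call.

```java
import java.io.Serializable;

// Minimal sketch of a stateless serializer. duplicate() exists so
// that stateful serializers can give each thread its own copy; with
// no state at all, returning the shared instance is safe.
class Tuple0Serializer implements Serializable {
    private static final long serialVersionUID = 1L;

    static final Tuple0Serializer INSTANCE = new Tuple0Serializer();

    private Tuple0Serializer() {}

    Tuple0Serializer duplicate() {
        // No fields to copy, so no new allocation is needed.
        return this;
    }
}
```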




[GitHub] flink pull request: [FLINK-2457] Integrate Tuple0

2015-08-06 Thread StephanEwen
Github user StephanEwen commented on a diff in the pull request:

https://github.com/apache/flink/pull/983#discussion_r36408610
  
--- Diff: 
flink-staging/flink-language-binding/flink-language-binding-generic/src/main/java/org/apache/flink/languagebinding/api/java/common/streaming/Receiver.java
 ---
@@ -346,61 +349,12 @@ public Tuple deserialize() {
}
 
public static Tuple createTuple(int size) {
-   switch (size) {
-   case 0:
-   return new Tuple0();
-   case 1:
-   return new Tuple1();
-   case 2:
-   return new Tuple2();
-   case 3:
-   return new Tuple3();
-   case 4:
-   return new Tuple4();
-   case 5:
-   return new Tuple5();
-   case 6:
-   return new Tuple6();
-   case 7:
-   return new Tuple7();
-   case 8:
-   return new Tuple8();
-   case 9:
-   return new Tuple9();
-   case 10:
-   return new Tuple10();
-   case 11:
-   return new Tuple11();
-   case 12:
-   return new Tuple12();
-   case 13:
-   return new Tuple13();
-   case 14:
-   return new Tuple14();
-   case 15:
-   return new Tuple15();
-   case 16:
-   return new Tuple16();
-   case 17:
-   return new Tuple17();
-   case 18:
-   return new Tuple18();
-   case 19:
-   return new Tuple19();
-   case 20:
-   return new Tuple20();
-   case 21:
-   return new Tuple21();
-   case 22:
-   return new Tuple22();
-   case 23:
-   return new Tuple23();
-   case 24:
-   return new Tuple24();
-   case 25:
-   return new Tuple25();
-   default:
-   throw new IllegalArgumentException("Tuple size 
not supported: " + size);
+   try {
+   return Tuple.getTupleClass(size).newInstance();
--- End diff --

This should have a size check and give a proper error message.
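A sketch of what such a check could look like, using hypothetical stand-in classes (the real code would list Tuple0 through Tuple25): reflection replaces the 26-way switch, while an explicit range check preserves the clear "Tuple size not supported" error message instead of surfacing a bare reflection failure.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins for the Flink tuple classes, just to make
// the sketch self-contained.
class Tuple {}
class Tuple0 extends Tuple {}
class Tuple1 extends Tuple {}

class TupleFactory {
    private static final List<Class<? extends Tuple>> TUPLE_CLASSES =
            Arrays.asList(Tuple0.class, Tuple1.class);

    static Tuple createTuple(int size) {
        // Explicit range check so callers get a descriptive message.
        if (size < 0 || size >= TUPLE_CLASSES.size()) {
            throw new IllegalArgumentException("Tuple size not supported: " + size);
        }
        try {
            return TUPLE_CLASSES.get(size).newInstance();
        } catch (InstantiationException | IllegalAccessException e) {
            throw new RuntimeException("Could not instantiate tuple of size " + size, e);
        }
    }
}
```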


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Created] (FLINK-2492) Rename remaining runtime classes from match to join

2015-08-06 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-2492:
---

 Summary: Rename remaining runtime classes from match to join
 Key: FLINK-2492
 URL: https://issues.apache.org/jira/browse/FLINK-2492
 Project: Flink
  Issue Type: Improvement
  Components: Local Runtime
Affects Versions: 0.10
Reporter: Stephan Ewen
Assignee: Stephan Ewen
Priority: Minor
 Fix For: 0.10


While working with the runtime join classes, I saw that many of them still 
refer to the join as match.

Since all other parts now consistently refer to join, we should adjust the 
runtime classes as well. Makes it easier for new contributors.





[jira] [Commented] (FLINK-2466) Travis build failure

2015-08-06 Thread Sachin Goel (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14659893#comment-14659893
 ] 

Sachin Goel commented on FLINK-2466:


Most of my builds are still failing too. 



[jira] [Commented] (FLINK-2457) Integrate Tuple0

2015-08-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14659935#comment-14659935
 ] 

ASF GitHub Bot commented on FLINK-2457:
---

Github user StephanEwen commented on a diff in the pull request:

https://github.com/apache/flink/pull/983#discussion_r36408352
  
--- Diff: 
flink-java/src/main/java/org/apache/flink/api/java/tuple/builder/Tuple0Builder.java
 ---
@@ -0,0 +1,46 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+
+// --
+//  THIS IS A GENERATED SOURCE FILE. DO NOT EDIT!
+//  GENERATED FROM org.apache.flink.api.java.tuple.TupleGenerator.
+// --
+
+
+package org.apache.flink.api.java.tuple.builder;
+
+import java.util.LinkedList;
+import java.util.List;
+
+import org.apache.flink.api.java.tuple.Tuple0;
+
+public class Tuple0Builder {
+
+   private List<Tuple0> tuples = new LinkedList<Tuple0>();
--- End diff --

ArrayLists are almost always superior in performance to LinkedLists.
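The suggestion above could look like this, using a hypothetical stand-in for the generated builder: ArrayList stores its elements in a contiguous array, so appends are amortized O(1) and iteration is cache-friendly, without the per-element node object a LinkedList allocates.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Flink's Tuple0.
class Tuple0 {}

// Builder sketch with the ArrayList backing store suggested above.
class Tuple0Builder {
    private final List<Tuple0> tuples = new ArrayList<Tuple0>();

    Tuple0Builder add(Tuple0 value) {
        tuples.add(value);
        return this;
    }

    Tuple0[] build() {
        return tuples.toArray(new Tuple0[tuples.size()]);
    }
}
```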




[GitHub] flink pull request: [FLINK-2457] Integrate Tuple0

2015-08-06 Thread StephanEwen
Github user StephanEwen commented on a diff in the pull request:

https://github.com/apache/flink/pull/983#discussion_r36408368
  
--- Diff: 
flink-java/src/main/java/org/apache/flink/api/java/tuple/builder/Tuple0Builder.java
 ---
@@ -0,0 +1,46 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+
+// --
+//  THIS IS A GENERATED SOURCE FILE. DO NOT EDIT!
+//  GENERATED FROM org.apache.flink.api.java.tuple.TupleGenerator.
+// --
+
+
+package org.apache.flink.api.java.tuple.builder;
+
+import java.util.LinkedList;
+import java.util.List;
+
+import org.apache.flink.api.java.tuple.Tuple0;
+
+public class Tuple0Builder {
+
+   private List<Tuple0> tuples = new LinkedList<Tuple0>();
--- End diff --

And memory efficiency...








[GitHub] flink pull request: [FLINK-2457] Integrate Tuple0

2015-08-06 Thread StephanEwen
Github user StephanEwen commented on a diff in the pull request:

https://github.com/apache/flink/pull/983#discussion_r36411464
  
--- Diff: 
flink-staging/flink-language-binding/flink-language-binding-generic/src/main/java/org/apache/flink/languagebinding/api/java/common/streaming/Receiver.java
 ---
@@ -346,61 +349,12 @@ public Tuple deserialize() {
}
 
public static Tuple createTuple(int size) {
-   switch (size) {
-   case 0:
-   return new Tuple0();
-   case 1:
-   return new Tuple1();
-   case 2:
-   return new Tuple2();
-   case 3:
-   return new Tuple3();
-   case 4:
-   return new Tuple4();
-   case 5:
-   return new Tuple5();
-   case 6:
-   return new Tuple6();
-   case 7:
-   return new Tuple7();
-   case 8:
-   return new Tuple8();
-   case 9:
-   return new Tuple9();
-   case 10:
-   return new Tuple10();
-   case 11:
-   return new Tuple11();
-   case 12:
-   return new Tuple12();
-   case 13:
-   return new Tuple13();
-   case 14:
-   return new Tuple14();
-   case 15:
-   return new Tuple15();
-   case 16:
-   return new Tuple16();
-   case 17:
-   return new Tuple17();
-   case 18:
-   return new Tuple18();
-   case 19:
-   return new Tuple19();
-   case 20:
-   return new Tuple20();
-   case 21:
-   return new Tuple21();
-   case 22:
-   return new Tuple22();
-   case 23:
-   return new Tuple23();
-   case 24:
-   return new Tuple24();
-   case 25:
-   return new Tuple25();
-   default:
-   throw new IllegalArgumentException("Tuple size 
not supported: " + size);
+   try {
+   return Tuple.getTupleClass(size).newInstance();
--- End diff --

Yep, you're right...




[jira] [Updated] (FLINK-2491) Operators are not participating in state checkpointing in some cases

2015-08-06 Thread Robert Metzger (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger updated FLINK-2491:
--
Description: 
While implementing a test case for the Kafka Consumer, I came across the 
following bug:

Consider the following topology, with the operator parallelism in parentheses:

Source (2) --> Sink (1).

In this setup, the {{snapshotState()}} method is called on the source, but not 
on the Sink.
The sink receives the generated data.
Only one of the two sources is generating data.

I've implemented a test case for this, you can find it here: 
https://github.com/rmetzger/flink/blob/para_checkpoint_bug/flink-tests/src/test/java/org/apache/flink/test/checkpointing/ParallelismChangeCheckpoinedITCase.java

  was:
While implementing a test case for the Kafka Consumer, I came across the 
following bug:

Consider the following topology, with the operator parallelism in parentheses:

Source (2) --> Sink (1).

In this setup, the {{snapshotState()}} method is called on the source, but not 
on the Sink.
The sink receives the generated data.
The only one of the two sources is generating data.

I've implemented a test case for this, you can find it here: 
https://github.com/rmetzger/flink/tree/para_checkpoint_bug


 Operators are not participating in state checkpointing in some cases
 

 Key: FLINK-2491
 URL: https://issues.apache.org/jira/browse/FLINK-2491
 Project: Flink
  Issue Type: Bug
  Components: Streaming
Affects Versions: 0.10
Reporter: Robert Metzger
Priority: Critical

 While implementing a test case for the Kafka Consumer, I came across the 
 following bug:
 Consider the following topology, with the operator parallelism in parentheses:
 Source (2) --> Sink (1).
 In this setup, the {{snapshotState()}} method is called on the source, but 
 not on the Sink.
 The sink receives the generated data.
 Only one of the two sources is generating data.
 I've implemented a test case for this, you can find it here: 
 https://github.com/rmetzger/flink/blob/para_checkpoint_bug/flink-tests/src/test/java/org/apache/flink/test/checkpointing/ParallelismChangeCheckpoinedITCase.java





[jira] [Commented] (FLINK-2457) Integrate Tuple0

2015-08-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14659947#comment-14659947
 ] 

ASF GitHub Bot commented on FLINK-2457:
---

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/983#issuecomment-128350696
  
Some comments inline; other than that, two issues with this pull request:
  - A lot of whitespace reformatting. We explicitly ask not to do this. 
Some IDEs do it automatically, but you can deactivate it. It makes diffs 
dangerously convoluted.
  - Tuple0 can be treated as a singleton, since it has no state. Any reason 
not to do this?
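The singleton treatment proposed above could be sketched like this. This is a hypothetical stand-in, not the actual Flink class: since Tuple0 carries no fields, a single shared instance can serve every caller.

```java
import java.io.Serializable;

// Sketch of a stateless Tuple0 as a singleton.
final class Tuple0 implements Serializable {
    private static final long serialVersionUID = 1L;

    static final Tuple0 INSTANCE = new Tuple0();

    private Tuple0() {}

    // readResolve preserves the singleton property when an instance
    // comes back from Java serialization.
    private Object readResolve() {
        return INSTANCE;
    }

    @Override
    public String toString() {
        return "()";
    }
}
```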




[GitHub] flink pull request: [FLINK-2307] Drop Java 6 support

2015-08-06 Thread StephanEwen
GitHub user StephanEwen opened a pull request:

https://github.com/apache/flink/pull/993

[FLINK-2307] Drop Java 6 support

This pull request updates Travis to use only openjdk7, oraclejdk7, and 
oraclejdk8 for tests.

Root pom and quickstarts poms bump the Java version to 1.7.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/StephanEwen/incubator-flink java7

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/993.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #993


commit 5a788ec23d50d36201ebb2fb0ad2b521272d034f
Author: Stephan Ewen se...@apache.org
Date:   2015-08-06T12:40:53Z

[FLINK-2453] [pom] Move Java source and target version to 1.7

commit 249fa2bcdfc72bd6ce134ccdcb3921547af02752
Author: Stephan Ewen se...@apache.org
Date:   2015-08-06T12:52:39Z

[FLINK-2454] [buikd] Update Travis to drop JDK6 for tests






[jira] [Commented] (FLINK-2307) Drop Java 6 support

2015-08-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14659963#comment-14659963
 ] 

ASF GitHub Bot commented on FLINK-2307:
---

GitHub user StephanEwen opened a pull request:

https://github.com/apache/flink/pull/993

[FLINK-2307] Drop Java 6 support

This pull request updates Travis to use only openjdk7, oraclejdk7, and 
oraclejdk8 for tests.

Root pom and quickstarts poms bump the Java version to 1.7.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/StephanEwen/incubator-flink java7

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/993.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #993


commit 5a788ec23d50d36201ebb2fb0ad2b521272d034f
Author: Stephan Ewen se...@apache.org
Date:   2015-08-06T12:40:53Z

[FLINK-2453] [pom] Move Java source and target version to 1.7

commit 249fa2bcdfc72bd6ce134ccdcb3921547af02752
Author: Stephan Ewen se...@apache.org
Date:   2015-08-06T12:52:39Z

[FLINK-2454] [buikd] Update Travis to drop JDK6 for tests




 Drop Java 6 support
 ---

 Key: FLINK-2307
 URL: https://issues.apache.org/jira/browse/FLINK-2307
 Project: Flink
  Issue Type: Improvement
  Components: Build System
Affects Versions: 0.10
Reporter: Robert Metzger

 As per: 
 http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Failing-Builds-on-Travis-td5360.html
 We need to change the Java version in the poms and adapt the Travis build 
 profiles.





[jira] [Reopened] (FLINK-2464) BufferSpillerTest sometimes fails

2015-08-06 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/FLINK-2464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Márton Balassi reopened FLINK-2464:
---

Thanks for the fix, [~StephanEwen]. Unfortunately, I still see tests of the 
`flink-streaming-core` module fail due to a killed JVM after your commit. [1] 

I also suspect that this is coming from either the `BarrierBufferTest` or 
the `BufferSpillerTest`, because after disabling them `flink-streaming-core` 
did not fail once on Java 6 in 20 builds triggered on Travis. [2] (Builds that 
still failed were due to failures of the `StreamCheckpointingITCase` or the 
`PartitionedStateCheckpointingITCase` in `flink-tests`.) 

[1] https://travis-ci.org/mbalassi/flink/jobs/74391653
[2] See build #1 through #10 here 
https://travis-ci.org/mbalassi-travis-0/flink/builds

 BufferSpillerTest sometimes fails
 -

 Key: FLINK-2464
 URL: https://issues.apache.org/jira/browse/FLINK-2464
 Project: Flink
  Issue Type: Bug
  Components: Streaming
Reporter: Gyula Fora
Assignee: Stephan Ewen
Priority: Minor
 Fix For: 0.10


 The BufferSpillerTest failed with the following error:
 org.apache.flink.streaming.runtime.io.BufferSpillerTest
 testSpillWhileReading(org.apache.flink.streaming.runtime.io.BufferSpillerTest)
   Time elapsed: 3.28 sec   FAILURE!
 java.lang.AssertionError: wrong buffer contents expected:0 but was:58
   at org.junit.Assert.fail(Assert.java:88)
   at org.junit.Assert.failNotEquals(Assert.java:743)
   at org.junit.Assert.assertEquals(Assert.java:118)
   at org.junit.Assert.assertEquals(Assert.java:555)
   at 
 org.apache.flink.streaming.runtime.io.BufferSpillerTest.validateBuffer(BufferSpillerTest.java:290)
   at 
 org.apache.flink.streaming.runtime.io.BufferSpillerTest.access$200(BufferSpillerTest.java:42)





[GitHub] flink pull request: [FLINK-2457] Integrate Tuple0

2015-08-06 Thread StephanEwen
Github user StephanEwen commented on a diff in the pull request:

https://github.com/apache/flink/pull/983#discussion_r36408352
  
--- Diff: 
flink-java/src/main/java/org/apache/flink/api/java/tuple/builder/Tuple0Builder.java
 ---
@@ -0,0 +1,46 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * License); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an AS IS BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+
+// --
+//  THIS IS A GENERATED SOURCE FILE. DO NOT EDIT!
+//  GENERATED FROM org.apache.flink.api.java.tuple.TupleGenerator.
+// --
+
+
+package org.apache.flink.api.java.tuple.builder;
+
+import java.util.LinkedList;
+import java.util.List;
+
+import org.apache.flink.api.java.tuple.Tuple0;
+
+public class Tuple0Builder {
+
+   private List<Tuple0> tuples = new LinkedList<Tuple0>();
--- End diff --

ArrayLists are almost always far superior in performance to LinkedLists.
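For illustration, a minimal builder along these lines might look like the following sketch (a hypothetical stand-in class, not the generated `Tuple0Builder`):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical builder sketch with the suggested ArrayList backing store:
// O(1) amortized append and cache-friendly iteration, unlike LinkedList's
// per-node allocations and pointer chasing.
public class BuilderSketch {
    private final List<String> items = new ArrayList<>();

    public BuilderSketch add(String item) {
        items.add(item);
        return this;
    }

    public String[] build() {
        return items.toArray(new String[0]);
    }
}
```

Since the builder only appends and then iterates once in `build()`, ArrayList is the natural fit here.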


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-2457) Integrate Tuple0

2015-08-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14659942#comment-14659942
 ] 

ASF GitHub Bot commented on FLINK-2457:
---

Github user StephanEwen commented on a diff in the pull request:

https://github.com/apache/flink/pull/983#discussion_r36408610
  
--- Diff: 
flink-staging/flink-language-binding/flink-language-binding-generic/src/main/java/org/apache/flink/languagebinding/api/java/common/streaming/Receiver.java
 ---
@@ -346,61 +349,12 @@ public Tuple deserialize() {
}
 
public static Tuple createTuple(int size) {
-   switch (size) {
-   case 0:
-   return new Tuple0();
-   case 1:
-   return new Tuple1();
-   case 2:
-   return new Tuple2();
-   case 3:
-   return new Tuple3();
-   case 4:
-   return new Tuple4();
-   case 5:
-   return new Tuple5();
-   case 6:
-   return new Tuple6();
-   case 7:
-   return new Tuple7();
-   case 8:
-   return new Tuple8();
-   case 9:
-   return new Tuple9();
-   case 10:
-   return new Tuple10();
-   case 11:
-   return new Tuple11();
-   case 12:
-   return new Tuple12();
-   case 13:
-   return new Tuple13();
-   case 14:
-   return new Tuple14();
-   case 15:
-   return new Tuple15();
-   case 16:
-   return new Tuple16();
-   case 17:
-   return new Tuple17();
-   case 18:
-   return new Tuple18();
-   case 19:
-   return new Tuple19();
-   case 20:
-   return new Tuple20();
-   case 21:
-   return new Tuple21();
-   case 22:
-   return new Tuple22();
-   case 23:
-   return new Tuple23();
-   case 24:
-   return new Tuple24();
-   case 25:
-   return new Tuple25();
-   default:
-   throw new IllegalArgumentException("Tuple size not supported: " + size);
+   try {
+   return Tuple.getTupleClass(size).newInstance();
--- End diff --

This should have a size check and give a proper error message.
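A bounds-checked variant along the lines of this comment might look like the following sketch (the `MAX_ARITY` constant and the string return value are local stand-ins so the sketch is self-contained; the real method would call `Tuple.getTupleClass(size).newInstance()` inside the checked region):

```java
// Hypothetical sketch of a size-checked tuple factory. MAX_ARITY mirrors
// the 25 generated tuple classes; out-of-range sizes fail with a clear
// message instead of an opaque reflection error.
public class TupleFactorySketch {
    static final int MAX_ARITY = 25;

    public static String createTuple(int size) {
        if (size < 0 || size > MAX_ARITY) {
            throw new IllegalArgumentException(
                "Tuple size not supported: " + size
                + " (must be between 0 and " + MAX_ARITY + ")");
        }
        // Stand-in for: return Tuple.getTupleClass(size).newInstance();
        return "Tuple" + size;
    }
}
```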


 Integrate Tuple0
 

 Key: FLINK-2457
 URL: https://issues.apache.org/jira/browse/FLINK-2457
 Project: Flink
  Issue Type: Improvement
Reporter: Matthias J. Sax
Assignee: Matthias J. Sax
Priority: Minor

 Tuple0 is not cleanly integrated:
   - missing serialization/deserialization support in runtime
  - Tuple.getTupleClass(int arity) cannot handle arity zero, i.e., cannot create 
 an instance of Tuple0
 Tuple0 is currently only used in Python API, but will be integrated into 
 Storm compatibility, too.





[GitHub] flink pull request: [FLINK-2457] Integrate Tuple0

2015-08-06 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/983#issuecomment-128350696
  
Some comments inline, other than that, two issues with this pull request:
  - A lot of whitespace reformatting. We explicitly ask not to do this. 
Some IDEs do it automatically, but you can deactivate it. It makes diffs 
dangerously convoluted.
  - Tuple0 can be treated as a singleton, since it has no state. Any reason 
not to do this?
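The singleton idea can be sketched as follows (a stand-in class, not Flink's actual `Tuple0`; note that serialization frameworks relying on a public no-arg constructor would need a `readResolve` hook or a public constructor kept alongside the shared instance):

```java
// Sketch: a tuple of arity zero carries no state, so one shared instance
// suffices and repeated allocations can be avoided entirely.
public final class Tuple0Sketch {
    public static final Tuple0Sketch INSTANCE = new Tuple0Sketch();

    private Tuple0Sketch() {}

    public int getArity() {
        return 0;
    }
}
```

Callers would then use `Tuple0Sketch.INSTANCE` instead of `new Tuple0()`.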




[jira] [Commented] (FLINK-2307) Drop Java 6 support

2015-08-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14659966#comment-14659966
 ] 

ASF GitHub Bot commented on FLINK-2307:
---

Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/993#issuecomment-128354239
  
+1 (given the tests pass)


 Drop Java 6 support
 ---

 Key: FLINK-2307
 URL: https://issues.apache.org/jira/browse/FLINK-2307
 Project: Flink
  Issue Type: Improvement
  Components: Build System
Affects Versions: 0.10
Reporter: Robert Metzger

 As per: 
 http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Failing-Builds-on-Travis-td5360.html
 We need to change the java version in the poms and adopt the Travis build 
 profiles.





[jira] [Commented] (FLINK-2457) Integrate Tuple0

2015-08-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14659977#comment-14659977
 ] 

ASF GitHub Bot commented on FLINK-2457:
---

Github user StephanEwen commented on a diff in the pull request:

https://github.com/apache/flink/pull/983#discussion_r36411464
  
--- Diff: 
flink-staging/flink-language-binding/flink-language-binding-generic/src/main/java/org/apache/flink/languagebinding/api/java/common/streaming/Receiver.java
 ---
@@ -346,61 +349,12 @@ public Tuple deserialize() {
}
 
public static Tuple createTuple(int size) {
-   switch (size) {
-   case 0:
-   return new Tuple0();
-   case 1:
-   return new Tuple1();
-   case 2:
-   return new Tuple2();
-   case 3:
-   return new Tuple3();
-   case 4:
-   return new Tuple4();
-   case 5:
-   return new Tuple5();
-   case 6:
-   return new Tuple6();
-   case 7:
-   return new Tuple7();
-   case 8:
-   return new Tuple8();
-   case 9:
-   return new Tuple9();
-   case 10:
-   return new Tuple10();
-   case 11:
-   return new Tuple11();
-   case 12:
-   return new Tuple12();
-   case 13:
-   return new Tuple13();
-   case 14:
-   return new Tuple14();
-   case 15:
-   return new Tuple15();
-   case 16:
-   return new Tuple16();
-   case 17:
-   return new Tuple17();
-   case 18:
-   return new Tuple18();
-   case 19:
-   return new Tuple19();
-   case 20:
-   return new Tuple20();
-   case 21:
-   return new Tuple21();
-   case 22:
-   return new Tuple22();
-   case 23:
-   return new Tuple23();
-   case 24:
-   return new Tuple24();
-   case 25:
-   return new Tuple25();
-   default:
-   throw new IllegalArgumentException("Tuple size not supported: " + size);
+   try {
+   return Tuple.getTupleClass(size).newInstance();
--- End diff --

Yep, you're right...


 Integrate Tuple0
 

 Key: FLINK-2457
 URL: https://issues.apache.org/jira/browse/FLINK-2457
 Project: Flink
  Issue Type: Improvement
Reporter: Matthias J. Sax
Assignee: Matthias J. Sax
Priority: Minor

 Tuple0 is not cleanly integrated:
   - missing serialization/deserialization support in runtime
  - Tuple.getTupleClass(int arity) cannot handle arity zero, i.e., cannot create 
 an instance of Tuple0
 Tuple0 is currently only used in Python API, but will be integrated into 
 Storm compatibility, too.





[GitHub] flink pull request: [FLINK-2240] Use BloomFilter to filter probe r...

2015-08-06 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/888#issuecomment-128441993
  
This was a super cool contribution. A pretty sophisticated addition, super 
testing, high code quality.
I am very impressed!

I hope you will contribute more to Flink. Already saw that you opened 
another pull request, for a sampling operator. Happy that this is happening :-)

In the future, I can hopefully review and merge the pull requests faster. 
The past weeks, I did not get to code work as much as I wanted, and the list of 
critical issues was long, so this pull request got delayed a bit.




[jira] [Commented] (FLINK-2240) Use BloomFilter to minimize probe side records which are spilled to disk in Hybrid-Hash-Join

2015-08-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14660355#comment-14660355
 ] 

ASF GitHub Bot commented on FLINK-2240:
---

Github user vasia commented on the pull request:

https://github.com/apache/flink/pull/888#issuecomment-128445757
  
Hi,
this looks great indeed!

Just out of curiosity, why did you write your own Bloom filter 
implementation and not use a ready-made one, e.g. from Guava? I'm wondering because 
in #923 we also want to use a bloom filter for an approximate algorithm 
implementation.

Thanks!


 Use BloomFilter to minimize probe side records which are spilled to disk in 
 Hybrid-Hash-Join
 

 Key: FLINK-2240
 URL: https://issues.apache.org/jira/browse/FLINK-2240
 Project: Flink
  Issue Type: Improvement
  Components: Core
Reporter: Chengxiang Li
Assignee: Chengxiang Li
Priority: Minor
 Fix For: 0.10


 In the Hybrid-Hash-Join, when the small table does not fit into memory, part of 
 the small table data is spilled to disk, and the counterpart partition of the 
 big table data is spilled to disk in the probe phase as well. If we build a 
 BloomFilter while spilling the small table to disk during the build phase, and 
 use it to filter the big table records that tend to be spilled to disk, this may 
 greatly reduce the spilled big table file size and save the disk IO cost for 
 writing and further reading.
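The idea can be sketched as follows (an illustrative `BitSet`-based filter, not Flink's implementation, which as discussed elsewhere writes the filter bits into Flink's managed memory segments):

```java
import java.util.BitSet;

// Illustrative Bloom filter: populated during the build phase with every
// key of a spilled build-side partition, then consulted in the probe
// phase. A negative answer is definite, so those probe-side records can
// be dropped instead of being spilled to disk.
public class BloomFilterSketch {
    private final BitSet bits;
    private final int numBits;

    public BloomFilterSketch(int numBits) {
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
    }

    // Two cheap hash probes derived from the key.
    private int h1(int key) { return Math.floorMod(key * 0x9E3779B9, numBits); }
    private int h2(int key) { return Math.floorMod(key * 0x85EBCA6B + 1, numBits); }

    public void add(int key) {
        bits.set(h1(key));
        bits.set(h2(key));
    }

    // May return a false positive, but never a false negative.
    public boolean mightContain(int key) {
        return bits.get(h1(key)) && bits.get(h2(key));
    }
}
```

In the probe phase, a record whose key fails `mightContain` cannot match any spilled build-side record and need not be written to disk.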





[jira] [Commented] (FLINK-2444) Add tests for HadoopInputFormats

2015-08-06 Thread James Cao (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14660371#comment-14660371
 ] 

James Cao commented on FLINK-2444:
--

Hi, for a sufficient test, what's the expected strategy?
The Hive and Hadoop communities use the Hadoop minicluster to do end-to-end unit 
tests. I tried running a Flink word count task against the minicluster inside the 
IDE; it takes about 5s (including provisioning the mini cluster and tearing 
down the cluster afterwards). Is this an acceptable running time?
I guess if we use the minicluster, we can get reasonably sufficient tests for the 
HadoopInputFormats' wrapped formats for both the mapred and mapreduce style APIs, 
and it's probably not very easy to set up a mock test that simulates the Hadoop 
fs environment. The problem with the minicluster is that it's only available in 
hadoop2, so it's not available in the hadoop1 profile. 

I think the issue I am working on, [FLINK-1919] HCatOutputFormat, also has a 
similar problem. Do we want to run the test against a mini Hive server in that 
case?

 Add tests for HadoopInputFormats
 

 Key: FLINK-2444
 URL: https://issues.apache.org/jira/browse/FLINK-2444
 Project: Flink
  Issue Type: Test
  Components: Hadoop Compatibility, Tests
Affects Versions: 0.10, 0.9.0
Reporter: Fabian Hueske
  Labels: starter

 The HadoopInputFormats and HadoopInputFormatBase classes are not sufficiently 
 covered by unit tests.
 We need tests that ensure that the methods of the wrapped Hadoop InputFormats 
 are correctly called. 





[jira] [Commented] (FLINK-981) Support for generated Cloudera Hadoop configuration

2015-08-06 Thread Robert Metzger (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14660492#comment-14660492
 ] 

Robert Metzger commented on FLINK-981:
--

I will close this issue for now. No user ever complained about it, and I used 
Flink on a Cloudera system 2-3 months ago.

 Support for generated Cloudera Hadoop configuration 
 

 Key: FLINK-981
 URL: https://issues.apache.org/jira/browse/FLINK-981
 Project: Flink
  Issue Type: Bug
  Components: Local Runtime, YARN Client
Reporter: Robert Metzger

 Cloudera Hadoop generates configuration files that differ from the vanilla 
 upstream Hadoop configuration files.
 The HDFS and the YARN component both access configuration values from Hadoop.





[jira] [Created] (FLINK-2493) Simplify names of example program JARs

2015-08-06 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-2493:
---

 Summary: Simplify names of example program JARs
 Key: FLINK-2493
 URL: https://issues.apache.org/jira/browse/FLINK-2493
 Project: Flink
  Issue Type: Improvement
  Components: Examples
Affects Versions: 0.10
Reporter: Stephan Ewen
Priority: Minor


I find the names of the example JARs a bit annoying.

Why not name the file {{examples/ConnectedComponents.jar}} rather than 
{{examples/flink-java-examples-0.10-SNAPSHOT-ConnectedComponents.jar}}





[GitHub] flink pull request: [FLINK-2240] Use BloomFilter to filter probe r...

2015-08-06 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/888#issuecomment-128440090
  
Manually merged in 61dcae391cb3b45ba3aff47d4d9163889d2958a4

I added a commit on top to pass the flag to enable/disable bloom filters 
through the runtime configuration. That is the basis for later allowing it to 
be enabled on a per-job basis. Also, we want to get rid of the 
`GlobalConfiguration` singleton pattern.




[jira] [Commented] (FLINK-2240) Use BloomFilter to minimize probe side records which are spilled to disk in Hybrid-Hash-Join

2015-08-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14660322#comment-14660322
 ] 

ASF GitHub Bot commented on FLINK-2240:
---

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/888#issuecomment-128440090
  
Manually merged in 61dcae391cb3b45ba3aff47d4d9163889d2958a4

I added a commit on top to pass the flag to enable/disable bloom filters 
through the runtime configuration. That is the basis for later allowing it to 
be enabled on a per-job basis. Also, we want to get rid of the 
`GlobalConfiguration` singleton pattern.


 Use BloomFilter to minimize probe side records which are spilled to disk in 
 Hybrid-Hash-Join
 

 Key: FLINK-2240
 URL: https://issues.apache.org/jira/browse/FLINK-2240
 Project: Flink
  Issue Type: Improvement
  Components: Core
Reporter: Chengxiang Li
Assignee: Chengxiang Li
Priority: Minor

 In the Hybrid-Hash-Join, when the small table does not fit into memory, part of 
 the small table data is spilled to disk, and the counterpart partition of the 
 big table data is spilled to disk in the probe phase as well. If we build a 
 BloomFilter while spilling the small table to disk during the build phase, and 
 use it to filter the big table records that tend to be spilled to disk, this may 
 greatly reduce the spilled big table file size and save the disk IO cost for 
 writing and further reading.





[jira] [Closed] (FLINK-100) Pact API Proposal: Add keyless CoGroup (send all to a single group)

2015-08-06 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen closed FLINK-100.
--

 Pact API Proposal: Add keyless CoGroup (send all to a single group)
 ---

 Key: FLINK-100
 URL: https://issues.apache.org/jira/browse/FLINK-100
 Project: Flink
  Issue Type: Improvement
Reporter: GitHub Import
Priority: Minor
  Labels: github-import

 I propose to add a keyless version of CoGroup that groups both inputs in a 
 single group, analogous to the keyless Reducer version that was added in 
 https://github.com/dimalabs/ozone/pull/61
 ```
 CoGroupContract myCoGroup = CoGroupContract.builder(MyUdf.class)
 .input1(contractA)
 .input2(contractB)
 .build();
 ```
 I have a use case where I need to process the output of two contracts in a 
 single udf and I currently have to use the workaround to add a constant field 
 and use this as grouping key.
 Adding a keyless version would reduce the overhead (network traffic, 
 serialization and code-writing) and give the compiler additional knowledge 
 (The compiler knows that there will be only a single group and a single udf 
 call. If setAvgRecordsEmittedPerStubCall is set, it could infer the output 
 cardinality)
 Furthermore I think this would be consistent, because CoGroup is like Reduce 
 for multiple inputs.
  Imported from GitHub 
 Url: https://github.com/stratosphere/stratosphere/issues/100
 Created by: [andrehacker|https://github.com/andrehacker]
 Labels: enhancement, 
 Created at: Sat Sep 14 23:15:59 CEST 2013
 State: open





[jira] [Resolved] (FLINK-202) Workset Iterations: No Match Found Behaviour of Solution Set Join

2015-08-06 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen resolved FLINK-202.

   Resolution: Fixed
Fix Version/s: (was: pre-apache)
   0.8.0

Solution set returns null on lookups of missing entries.

 Workset Iterations: No Match Found Behaviour of Solution Set Join
 ---

 Key: FLINK-202
 URL: https://issues.apache.org/jira/browse/FLINK-202
 Project: Flink
  Issue Type: Improvement
Reporter: GitHub Import
  Labels: github-import
 Fix For: 0.8.0


 If I do a solution set match and there is no corresponding entry in the 
 solution set index a RuntimeException is thrown and the job fails. Therefore 
 the initial solution set must already contain every element which will be in 
 the final solution set.
 I'm not sure if this is a real limitation, but I find it inconvenient. When I 
 was playing around with the workset connected components I couldn't just use 
 parts of my test data, because it resulted in a solution set join where 
 records couldn't be matched.
 I just wanted to think out loudly here. Did anybody else find this 
 inconvenient? What alternative should we provide? 
  Imported from GitHub 
 Url: https://github.com/stratosphere/stratosphere/issues/202
 Created by: [uce|https://github.com/uce]
 Labels: core, enhancement, 
 Created at: Wed Oct 23 23:24:29 CEST 2013
 State: open





[jira] [Commented] (FLINK-655) Add support for both single and set of broadcast values

2015-08-06 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14660395#comment-14660395
 ] 

Stephan Ewen commented on FLINK-655:


Will this still be done, or should we close it as won't fix?

 Add support for both single and set of broadcast values
 ---

 Key: FLINK-655
 URL: https://issues.apache.org/jira/browse/FLINK-655
 Project: Flink
  Issue Type: Improvement
  Components: Java API
Reporter: Ufuk Celebi
Assignee: Henry Saputra
  Labels: breaking-api, github-import, starter
 Fix For: pre-apache


 To broadcast a data set you have to do the following:
 ```java
 lorem.map(new MyMapper()).withBroadcastSet(toBroadcast, "toBroadcastName")
 ```
 In the operator you call:
 ```java
 getRuntimeContext().getBroadcastVariable("toBroadcastName")
 ```
 I propose to have both method names consistent, e.g.
   - `withBroadcastVariable(DataSet, String)`, or
   - `getBroadcastSet(String)`.
  Imported from GitHub 
 Url: https://github.com/stratosphere/stratosphere/issues/655
 Created by: [uce|https://github.com/uce]
 Labels: enhancement, java api, user satisfaction, 
 Created at: Wed Apr 02 16:29:08 CEST 2014
 State: open





[jira] [Resolved] (FLINK-915) Introduce two in one progress bars

2015-08-06 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen resolved FLINK-915.

   Resolution: Invalid
Fix Version/s: (was: pre-apache)

Outdated and subsumed by new Task State Model and new web frontend.

 Introduce two in one progress bars
 --

 Key: FLINK-915
 URL: https://issues.apache.org/jira/browse/FLINK-915
 Project: Flink
  Issue Type: Improvement
Reporter: GitHub Import
Priority: Trivial
  Labels: github-import
 Attachments: pull-request-915-1281267458081589740.patch


 The two in one progress bars are approximations which are calculated out
 of the job event information.
 Additionally: FINISHING tasks are still shown as running tasks.
  Imported from GitHub 
 Url: https://github.com/stratosphere/stratosphere/pull/915
 Created by: [tobwiens|https://github.com/tobwiens]
 Labels: enhancement, 
 Created at: Fri Jun 06 17:02:53 CEST 2014
 State: open





[jira] [Closed] (FLINK-915) Introduce two in one progress bars

2015-08-06 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen closed FLINK-915.
--

 Introduce two in one progress bars
 --

 Key: FLINK-915
 URL: https://issues.apache.org/jira/browse/FLINK-915
 Project: Flink
  Issue Type: Improvement
Reporter: GitHub Import
Priority: Trivial
  Labels: github-import
 Attachments: pull-request-915-1281267458081589740.patch


 The two in one progress bars are approximations which are calculated out
 of the job event information.
 Additionally: FINISHING tasks are still shown as running tasks.
  Imported from GitHub 
 Url: https://github.com/stratosphere/stratosphere/pull/915
 Created by: [tobwiens|https://github.com/tobwiens]
 Labels: enhancement, 
 Created at: Fri Jun 06 17:02:53 CEST 2014
 State: open





[jira] [Resolved] (FLINK-1082) WebClient improvements

2015-08-06 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen resolved FLINK-1082.
-
Resolution: Done

Remaining issues will not be fixed. Remainder is subsumed by new web frontend 
[FLINK-2357]

 WebClient improvements
 --

 Key: FLINK-1082
 URL: https://issues.apache.org/jira/browse/FLINK-1082
 Project: Flink
  Issue Type: Improvement
Reporter: Jonathan Hasenburg

 New Issue to summarize all things that should be done regarding the webclient.
 * DONE: Setting the ship strategy of a broadcast variable from broadcast to 
 broadcast variable
 * DONE: Reduce size of nodes by removing some information
 * DONE: Some nodes can be at the same time the next partial solution and 
 termination criterion, or the next workset and solution set delta. It 
 currently seems to highlight only one of them.
 * DONE: Recreate the generation of the JSON strings which are used to create 
 the graph. Change to an object based layout to get away from the string based 
 layout.
 * NOT POSSIBLE (this creates a cycle, and the first node can end up on the 
 right side of the graph, which is confusing): Feedback arrow from the next 
 partial solution to the partial solution
 * show full graph at very beginning
 * fix bug in Chrome where the node boxes are too small
 * DONE: graph should be redrawn when the window gets resized





[jira] [Resolved] (FLINK-1083) WebInterface improvements

2015-08-06 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen resolved FLINK-1083.
-
Resolution: Done

Remainder is subsumed by new web frontend [FLINK-2357]

 WebInterface improvements
 -

 Key: FLINK-1083
 URL: https://issues.apache.org/jira/browse/FLINK-1083
 Project: Flink
  Issue Type: Improvement
  Components: JobManager
Reporter: Jonathan Hasenburg

 New Issue to summarize all things that should be done regarding the 
 webinterface.
 * rework dashboard in a way that more than one job can be shown ... . If a 
 job is clicked you get to the details.
 * DONE: add history to dashboard
 * Running jobs should get a view like jobs in the history.
 * DONE: rework the menu and try to add some links to the dashboard (like the 
 taskmanager section)
 * improve the way the JSONs are sent to the webinterface





[jira] [Closed] (FLINK-1082) WebClient improvements

2015-08-06 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen closed FLINK-1082.
---

 WebClient improvements
 --

 Key: FLINK-1082
 URL: https://issues.apache.org/jira/browse/FLINK-1082
 Project: Flink
  Issue Type: Improvement
Reporter: Jonathan Hasenburg

 New Issue to summarize all things that should be done regarding the webclient.
 * DONE: Setting the ship strategy of a broadcast variable from broadcast to 
 broadcast variable
 * DONE: Reduce size of nodes by removing some information
 * DONE: Some nodes can be at the same time the next partial solution and 
 termination criterion, or the next workset and solution set delta. It 
 currently seems to highlight only one of them.
 * DONE: Recreate the generation of the JSON strings which are used to create 
 the graph. Change to an object based layout to get away from the string based 
 layout.
 * NOT POSSIBLE (this creates a cycle, and the first node can end up on the 
 right side of the graph, which is confusing): Feedback arrow from the next 
 partial solution to the partial solution
 * show full graph at very beginning
 * fix bug in Chrome where the node boxes are too small
 * DONE: graph should be redrawn when the window gets resized





[GitHub] flink pull request: [FLINK-2240] Use BloomFilter to filter probe r...

2015-08-06 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/888#issuecomment-128442757
  
Oh, I forgot to add the closing message to the commit, so the ASF bot did 
not close the pull request. Can you close the pull request manually? (Only you 
as the owner can do that.)




[jira] [Updated] (FLINK-7) [GitHub] Enable Range Partitioner

2015-08-06 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-7?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen updated FLINK-7:
-
Issue Type: Sub-task  (was: New Feature)
Parent: FLINK-598

 [GitHub] Enable Range Partitioner
 -

 Key: FLINK-7
 URL: https://issues.apache.org/jira/browse/FLINK-7
 Project: Flink
  Issue Type: Sub-task
Reporter: GitHub Import
  Labels: github-import
 Fix For: pre-apache


 The range partitioner is currently disabled. We need to implement the 
 following aspects:
 1) Distribution information, if available, must be propagated back together 
 with the ordering property.
 2) A generic bucket lookup structure (currently specific to PactRecord).
 Tests to re-enable after fixing this issue:
  - TeraSortITCase
  - GlobalSortingITCase
  - GlobalSortingMixedOrderITCase
  Imported from GitHub 
 Url: https://github.com/stratosphere/stratosphere/issues/7
 Created by: [StephanEwen|https://github.com/StephanEwen]
 Labels: core, enhancement, optimizer, 
 Milestone: Release 0.4
 Assignee: [fhueske|https://github.com/fhueske]
 Created at: Fri Apr 26 13:48:24 CEST 2013
 State: open





[GitHub] flink pull request: [FLINK-2240] Use BloomFilter to filter probe r...

2015-08-06 Thread vasia
Github user vasia commented on the pull request:

https://github.com/apache/flink/pull/888#issuecomment-128445757
  
Hi,
this looks great indeed!

Just out of curiosity, why did you write your own Bloom filter 
implementation and not use a ready-made one, e.g. from Guava? I'm wondering because 
in #923 we also want to use a bloom filter for an approximate algorithm 
implementation.

Thanks!




[jira] [Closed] (FLINK-938) Change start-cluster.sh script so that users don't have to configure the JobManager address

2015-08-06 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen closed FLINK-938.
--

 Change start-cluster.sh script so that users don't have to configure the 
 JobManager address
 ---

 Key: FLINK-938
 URL: https://issues.apache.org/jira/browse/FLINK-938
 Project: Flink
  Issue Type: Improvement
  Components: Build System
Reporter: Robert Metzger
Assignee: Mingliang Qi
Priority: Minor
 Fix For: 0.9


 To improve the user experience, Flink should not require users to configure 
 the JobManager's address on a cluster.
 In combination with FLINK-934, this would allow running Flink with decent 
 performance on a cluster without setting a single configuration value.





[jira] [Commented] (FLINK-2492) Rename remaining runtime classes from match to join

2015-08-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14660315#comment-14660315
 ] 

ASF GitHub Bot commented on FLINK-2492:
---

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/995#issuecomment-128439329
  
Manually merged in 685086a3dd9afcec2eec76485298bc7b3f031a3c


 Rename remaining runtime classes from match to join
 ---

 Key: FLINK-2492
 URL: https://issues.apache.org/jira/browse/FLINK-2492
 Project: Flink
  Issue Type: Improvement
  Components: Local Runtime
Affects Versions: 0.10
Reporter: Stephan Ewen
Assignee: Stephan Ewen
Priority: Minor
 Fix For: 0.10


 While working with the runtime join classes, I saw that many of them still 
 refer to the join as match.
 Since all other parts now consistently refer to join, we should adjust the 
 runtime classes as well. Makes it easier for new contributors.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2492] [runtime] Rename former 'match' c...

2015-08-06 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/995#issuecomment-128439329
  
Manually merged in 685086a3dd9afcec2eec76485298bc7b3f031a3c


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-2240] Use BloomFilter to filter probe r...

2015-08-06 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/888#issuecomment-128446267
  
The bloom filters are stored in subregions of Flink's memory segments, not 
in any additional memory. 

That is very nice (it occupies no extra memory), but it requires working 
against Flink's memory segments rather than plain longs or byte arrays. Hence 
the custom implementation.




[jira] [Resolved] (FLINK-266) Warn user if cluster did not come up as expected

2015-08-06 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen resolved FLINK-266.

   Resolution: Won't Fix
Fix Version/s: (was: pre-apache)

The deployment model is different now: the pre-flight phase and optimizer no 
longer connect to the cluster to gather the availability of resources. 
Therefore, this is not really an issue any more ;-)

 Warn user if cluster did not come up as expected
 

 Key: FLINK-266
 URL: https://issues.apache.org/jira/browse/FLINK-266
 Project: Flink
  Issue Type: Improvement
Reporter: GitHub Import
  Labels: github-import

 While I did some work on a cluster, I was wondering why my job did not 
 utilize all TaskManagers.
 It seems that I started my job too early (before all TaskManager registered 
 with the JobManager) and therefore, the compiler did not consider them.
 We should either make the `start-cluster.sh` script blocking (with a 
 timeout), or have `pact-client.sh` report a warning if fewer TaskManagers 
 than expected (the number in the `slaves` file) are up.
  Imported from GitHub 
 Url: https://github.com/stratosphere/stratosphere/issues/266
 Created by: [rmetzger|https://github.com/rmetzger]
 Labels: enhancement, runtime, simple-issue, 
 Created at: Mon Nov 11 15:56:56 CET 2013
 State: open
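The blocking variant proposed above can be sketched as a simple poll-with-timeout loop. All names here are illustrative, not Flink's actual API; in particular, `TaskManagerCounter` stands in for "ask the JobManager how many TaskManagers have registered":

```java
/**
 * Sketch of the blocking startup check proposed above: poll the number of
 * registered TaskManagers until the expected count (the number of entries in
 * the slaves file) is reached, or give up after a timeout so the caller can
 * print a warning. All names here are hypothetical, not Flink's real API.
 */
public class ClusterStartupCheck {

    /** Abstracts "ask the JobManager how many TaskManagers registered". */
    public interface TaskManagerCounter {
        int registeredTaskManagers();
    }

    /** Returns true if all expected TaskManagers registered within the timeout. */
    public static boolean waitForTaskManagers(TaskManagerCounter jobManager,
                                              int expected,
                                              long timeoutMillis,
                                              long pollMillis)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (true) {
            if (jobManager.registeredTaskManagers() >= expected) {
                return true;
            }
            if (System.currentTimeMillis() >= deadline) {
                // Caller should warn: fewer TaskManagers than expected are up.
                return false;
            }
            Thread.sleep(pollMillis);
        }
    }
}
```

A script would call this once at startup and either proceed or print the warning with the actual registered count.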



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (FLINK-202) Workset Iterations: No Match Found Behaviour of Solution Set Join

2015-08-06 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen closed FLINK-202.
--

 Workset Iterations: No Match Found Behaviour of Solution Set Join
 ---

 Key: FLINK-202
 URL: https://issues.apache.org/jira/browse/FLINK-202
 Project: Flink
  Issue Type: Improvement
Reporter: GitHub Import
  Labels: github-import
 Fix For: 0.8.0


 If I do a solution set match and there is no corresponding entry in the 
 solution set index a RuntimeException is thrown and the job fails. Therefore 
 the initial solution set must already contain every element which will be in 
 the final solution set.
 I'm not sure if this is a real limitation, but I find it inconvenient. When I 
 was playing around with the workset connected components I couldn't just use 
 parts of my test data, because it resulted in a solution set join where 
 records couldn't be matched.
 I just wanted to think out loud here. Did anybody else find this 
 inconvenient? What alternative should we provide? 
  Imported from GitHub 
 Url: https://github.com/stratosphere/stratosphere/issues/202
 Created by: [uce|https://github.com/uce]
 Labels: core, enhancement, 
 Created at: Wed Oct 23 23:24:29 CEST 2013
 State: open



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (FLINK-266) Warn user if cluster did not come up as expected

2015-08-06 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen closed FLINK-266.
--

 Warn user if cluster did not come up as expected
 

 Key: FLINK-266
 URL: https://issues.apache.org/jira/browse/FLINK-266
 Project: Flink
  Issue Type: Improvement
Reporter: GitHub Import
  Labels: github-import

 While I did some work on a cluster, I was wondering why my job did not 
 utilize all TaskManagers.
 It seems that I started my job too early (before all TaskManager registered 
 with the JobManager) and therefore, the compiler did not consider them.
 We should either make the `start-cluster.sh` script blocking (with a 
 timeout), or have `pact-client.sh` report a warning if fewer TaskManagers 
 than expected (the number in the `slaves` file) are up.
  Imported from GitHub 
 Url: https://github.com/stratosphere/stratosphere/issues/266
 Created by: [rmetzger|https://github.com/rmetzger]
 Labels: enhancement, runtime, simple-issue, 
 Created at: Mon Nov 11 15:56:56 CET 2013
 State: open



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2240) Use BloomFilter to minimize probe side records which are spilled to disk in Hybrid-Hash-Join

2015-08-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14660334#comment-14660334
 ] 

ASF GitHub Bot commented on FLINK-2240:
---

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/888#issuecomment-128442757
  
Oh, I forgot to add the closing message to the commit, so the ASF bot did 
not close the pull request. Could you close the pull request manually? (Only 
you, as the owner, can do that.)


 Use BloomFilter to minimize probe side records which are spilled to disk in 
 Hybrid-Hash-Join
 

 Key: FLINK-2240
 URL: https://issues.apache.org/jira/browse/FLINK-2240
 Project: Flink
  Issue Type: Improvement
  Components: Core
Reporter: Chengxiang Li
Assignee: Chengxiang Li
Priority: Minor

 In the Hybrid-Hash-Join, when the small table does not fit into memory, part 
 of the small-table data is spilled to disk, and the counterpart partition of 
 the big table is spilled to disk in the probe phase as well. If we build a 
 BloomFilter while spilling the small table during the build phase, and use it 
 to filter the big-table records that would otherwise be spilled, this may 
 greatly reduce the spilled big-table file size and save the disk I/O cost of 
 writing and later reading them.
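The build-phase/probe-phase interplay can be sketched with a toy Bloom filter. This is hypothetical illustration code, not the implementation from the pull request (which works directly on Flink's memory segments):

```java
import java.util.BitSet;

/**
 * Toy sketch of the FLINK-2240 idea (not the actual Flink implementation):
 * while spilling a build-side partition, record its keys in a Bloom filter;
 * in the probe phase, skip probe records whose key cannot be in the spilled
 * partition, saving the disk I/O of writing and re-reading them.
 */
public class SpillSideBloomFilter {
    private final BitSet bits;
    private final int numBits;

    public SpillSideBloomFilter(int numBits) {
        this.numBits = numBits;
        this.bits = new BitSet(numBits);
    }

    // Two cheap hash functions derived from the key's hash code.
    private int h1(int keyHash) { return Math.floorMod(keyHash, numBits); }
    private int h2(int keyHash) { return Math.floorMod(keyHash * 31 + 17, numBits); }

    /** Build phase: called for every key of the spilled small-table partition. */
    public void add(int keyHash) {
        bits.set(h1(keyHash));
        bits.set(h2(keyHash));
    }

    /** Probe phase: false means the record can safely be skipped (never spilled). */
    public boolean mightContain(int keyHash) {
        return bits.get(h1(keyHash)) && bits.get(h2(keyHash));
    }
}
```

False positives only cost an unnecessary spill; false negatives cannot occur, so the join result is unchanged.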



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (FLINK-2492) Rename remaining runtime classes from match to join

2015-08-06 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen resolved FLINK-2492.
-
Resolution: Fixed

Fixed via 685086a3dd9afcec2eec76485298bc7b3f031a3c

 Rename remaining runtime classes from match to join
 ---

 Key: FLINK-2492
 URL: https://issues.apache.org/jira/browse/FLINK-2492
 Project: Flink
  Issue Type: Improvement
  Components: Local Runtime
Affects Versions: 0.10
Reporter: Stephan Ewen
Assignee: Stephan Ewen
Priority: Minor
 Fix For: 0.10


 While working with the runtime join classes, I saw that many of them still 
 refer to the join as match.
 Since all other parts now consistently refer to join, we should adjust the 
 runtime classes as well. Makes it easier for new contributors.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (FLINK-2492) Rename remaining runtime classes from match to join

2015-08-06 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen closed FLINK-2492.
---

 Rename remaining runtime classes from match to join
 ---

 Key: FLINK-2492
 URL: https://issues.apache.org/jira/browse/FLINK-2492
 Project: Flink
  Issue Type: Improvement
  Components: Local Runtime
Affects Versions: 0.10
Reporter: Stephan Ewen
Assignee: Stephan Ewen
Priority: Minor
 Fix For: 0.10


 While working with the runtime join classes, I saw that many of them still 
 refer to the join as match.
 Since all other parts now consistently refer to join, we should adjust the 
 runtime classes as well. Makes it easier for new contributors.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2240) Use BloomFilter to minimize probe side records which are spilled to disk in Hybrid-Hash-Join

2015-08-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14660359#comment-14660359
 ] 

ASF GitHub Bot commented on FLINK-2240:
---

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/888#issuecomment-128446267
  
The bloom filters are stored in subregions of Flink's memory segments, not 
in any additional memory. 

That is very nice (it occupies no extra memory), but it requires working 
against Flink's memory segments rather than plain longs or byte arrays. Hence 
the custom implementation.


 Use BloomFilter to minimize probe side records which are spilled to disk in 
 Hybrid-Hash-Join
 

 Key: FLINK-2240
 URL: https://issues.apache.org/jira/browse/FLINK-2240
 Project: Flink
  Issue Type: Improvement
  Components: Core
Reporter: Chengxiang Li
Assignee: Chengxiang Li
Priority: Minor
 Fix For: 0.10


 In Hybrid-Hash-Join, while small table does not fit into memory, part of the 
 small table data would be spilled to disk, and the counterpart partition of 
 big table data would be spilled to disk in probe phase as well. If we build a 
 BloomFilter while spill small table to disk during build phase, and use it to 
 filter the big table records which tend to be spilled to disk, this may 
 greatly  reduce the spilled big table file size, and saved the disk IO cost 
 for writing and further reading.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [WIP][FLINK-2386] Add new Kafka Consumers

2015-08-06 Thread rmetzger
GitHub user rmetzger opened a pull request:

https://github.com/apache/flink/pull/996

[WIP][FLINK-2386] Add new Kafka Consumers

I'm opening a WIP pull request (against our rules) to get some feedback on 
my ongoing work.
Please note that I'm on vacation next week (until August 17)

**Why this rework?**

The current `PersistentKafkaSource` does not always provide exactly-once 
processing guarantees because we are using the high level Consumer API of Kafka.
We've chosen to use that API because it is handling all the corner cases 
such as leader election, leader failover and other low level stuff.
The problem is that the API does not allow us to
- commit offsets manually
- consistently (across restarts) assign partitions to Flink instances  

The Kafka community is aware of these issues and actively working on a new 
Consumer API. See 
https://cwiki.apache.org/confluence/display/KAFKA/Kafka+0.9+Consumer+Rewrite+Design
 and https://issues.apache.org/jira/browse/KAFKA-1326
The release of Kafka 0.8.3 is scheduled for October 2015 (see plan: 
https://cwiki.apache.org/confluence/display/KAFKA/Future+release+plan)

Therefore, I decided for the following approach:
Copy the code of the unreleased, new Kafka Consumer into the Flink consumer 
and use it.
The new API has all the bells and whistles we need (manual committing, 
per-partition subscriptions, nice APIs), but it is not completely backwards 
compatible.

- We can retrieve topic metadata with the new API from Kafka 0.8.1 and 0.8.2 
(and of course 0.8.3)
- We can retrieve data from Kafka 0.8.2 (and 0.8.3)
- We can only commit to Kafka 0.8.3

Therefore, this pull request contains three different user facing classes 
`FlinkKafkaConsumer081`,  `FlinkKafkaConsumer082` and `FlinkKafkaConsumer083` 
for the different possible combinations.
For 0.8.1 we are using a hand-crafted implementation against the simple 
consumer API 
(https://cwiki.apache.org/confluence/display/KAFKA/0.8.0+SimpleConsumer+Example)
 so we had to do what we originally wanted to avoid.
I tried to make that implementation as robust and efficient as possible. 
I'm intentionally not handling any broker failures in the code. For these 
cases, I'm relying on Flink's fault tolerance mechanisms (which effectively 
means redeploying the Kafka sources against other online brokers)

For reviewing the pull request, there are only a few important classes to 
look at:
- FlinkKafkaConsumerBase
- IncludedFetcher
- LegacyFetcher (the one implementing the SimpleConsumer API)
I fixed a little bug in the stream graph generator: it was ignoring the 
number of execution retries when checkpointing was not enabled. 


Known issues:
- this pull request contains at least one failing test
- the KafkaConsumer contains at least one known, yet untested bug
- missing documentation

I will also open a pull request for using the new Producer API. It provides 
much better performance and usability.

Open questions:
- Do we really want to copy 20k+ lines of code into our code base (for 
now)? 
If there are concerns about this, I could also manually implement the 
missing pieces. It's probably 100 lines of code for getting the partition 
infos for a topic, and we would use the SimpleConsumer also for reading from 
0.8.2.

- Do we want to use the packaging I'm suggesting here (an additional Maven 
module for `flink-connector-kafka-083`)? We would need to introduce it anyway 
when Kafka releases 0.8.3, because the dependencies are not compatible.
But it adds confusion for our users.
I will write more documentation for guidance.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rmetzger/flink flink2386

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/996.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #996


commit 177e0bbc6bc613b67111ba038e0ded4fae8474f1
Author: Robert Metzger rmetz...@apache.org
Date:   2015-07-20T19:39:46Z

wip

commit 70cb8f1ecb7df7a98313796609d2fa0dbade86bf
Author: Robert Metzger rmetz...@apache.org
Date:   2015-07-21T15:21:45Z

[FLINK-2386] Add initial code for the new kafka connector, with everything 
unreleased copied from the kafka sources

commit a4a2847908a8c2f118b8667d7cb66693c065c38d
Author: Robert Metzger rmetz...@apache.org
Date:   2015-07-21T17:58:13Z

wip

commit b02cde37c2120ff6f0fcf1c233391a1d8804e594
Author: Robert Metzger rmetz...@apache.org
Date:   2015-07-22T15:29:58Z

wip

commit 54a05c39d150b016e0a089daedb3492d986b93bd
Author: Robert Metzger rmetz...@apache.org
Date:   2015-07-22T19:56:41Z

wip

commit 393fd6766a5df4bf14ef0c13864b8a4abdb62bb4
Author: Robert Metzger 

[jira] [Resolved] (FLINK-854) Web interface: show runtime of subtasks

2015-08-06 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen resolved FLINK-854.

   Resolution: Won't Fix
Fix Version/s: (was: pre-apache)

Subsumed by effort for the new web interface [FLINK-2357]

 Web interface: show runtime of subtasks
 ---

 Key: FLINK-854
 URL: https://issues.apache.org/jira/browse/FLINK-854
 Project: Flink
  Issue Type: Improvement
  Components: Webfrontend
Reporter: Ufuk Celebi
Priority: Minor
  Labels: github-import

 When I click on the detailed view of a task, I see all subtasks as in the 
 screenshot below. I would also like to show the runtime per stage, e.g. I 
 want to know how long the yellow subtask was in running.
 ![screen shot 2014-05-24 at 18 09 
 36|https://cloud.githubusercontent.com/assets/1756620/3074999/d0ebe1d6-e35d-11e3-9f18-26ac5ce96999.png]
  Imported from GitHub 
 Url: https://github.com/stratosphere/stratosphere/issues/854
 Created by: [uce|https://github.com/uce]
 Labels: enhancement, gui, user satisfaction, 
 Milestone: Release 0.6 (unplanned)
 Created at: Sat May 24 18:10:25 CEST 2014
 State: open



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (FLINK-854) Web interface: show runtime of subtasks

2015-08-06 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen closed FLINK-854.
--

 Web interface: show runtime of subtasks
 ---

 Key: FLINK-854
 URL: https://issues.apache.org/jira/browse/FLINK-854
 Project: Flink
  Issue Type: Improvement
  Components: Webfrontend
Reporter: Ufuk Celebi
Priority: Minor
  Labels: github-import

 When I click on the detailed view of a task, I see all subtasks as in the 
 screenshot below. I would also like to show the runtime per stage, e.g. I 
 want to know how long the yellow subtask was in running.
 ![screen shot 2014-05-24 at 18 09 
 36|https://cloud.githubusercontent.com/assets/1756620/3074999/d0ebe1d6-e35d-11e3-9f18-26ac5ce96999.png]
  Imported from GitHub 
 Url: https://github.com/stratosphere/stratosphere/issues/854
 Created by: [uce|https://github.com/uce]
 Labels: enhancement, gui, user satisfaction, 
 Milestone: Release 0.6 (unplanned)
 Created at: Sat May 24 18:10:25 CEST 2014
 State: open



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (FLINK-871) Create a documentation distribution together with other release artifacts

2015-08-06 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen resolved FLINK-871.

   Resolution: Fixed
Fix Version/s: (was: pre-apache)

Documentation is part of the source code, and the Apache CI infrastructure 
builds and updates snapshot documentation on a nightly basis.

 Create a documentation distribution together with other release artifacts
 -

 Key: FLINK-871
 URL: https://issues.apache.org/jira/browse/FLINK-871
 Project: Flink
  Issue Type: Improvement
Reporter: GitHub Import
Priority: Minor
  Labels: github-import

 It would be good to have a documentation distribution together with the other 
 release artifacts. We can use markdown (.md) files for the documentation and 
 a Maven markdown plugin for management. The same documentation can be used 
 for the web site etc. 
  Imported from GitHub 
 Url: https://github.com/stratosphere/stratosphere/issues/871
 Created by: [danrk|https://github.com/danrk]
 Labels: documentation, enhancement, 
 Created at: Tue May 27 02:44:53 CEST 2014
 State: open



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (FLINK-1013) ArithmeticException: / by zero in MutableHashTable

2015-08-06 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen resolved FLINK-1013.
-
   Resolution: Fixed
 Assignee: Stephan Ewen
Fix Version/s: 0.9

Fixed already a while back by delegating computation to existing utility 
function that is also used to build the initial table.

 ArithmeticException: / by zero in MutableHashTable
 --

 Key: FLINK-1013
 URL: https://issues.apache.org/jira/browse/FLINK-1013
 Project: Flink
  Issue Type: Bug
Reporter: Till Rohrmann
Assignee: Stephan Ewen
 Fix For: 0.9


 I encountered a division by zero exception in the MutableHashTable. It 
 happened when I joined two datasets of relatively big records (approx. 40-50 
 MB I think). When joining them the buildTableFromSpilledPartition method of 
 the MutableHashTable is called. If the number of available buffers is 
 smaller than the needed number of buffers, the mutable hash table will 
 calculate the bucket count
 {code}
 bucketCount = (int) (((long) totalBuffersAvailable) * RECORD_TABLE_BYTES / 
 (avgRecordLenPartition + RECORD_OVERHEAD_BYTES));
 {code}
 If the average record length is sufficiently large, the bucket count will be 
 0. Initializing the hash table with a bucket count of 0 will then cause the 
 division-by-zero exception. I don't know whether this problem can be 
 mitigated, but it should at least throw a meaningful exception instead of the 
 ArithmeticException.
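A minimal reproduction of the arithmetic, with the kind of guard the report asks for. The constants here are illustrative stand-ins, not Flink's actual values:

```java
/**
 * Sketch of the FLINK-1013 failure mode. The constants are illustrative,
 * not Flink's actual values: the point is only that integer division drives
 * bucketCount to 0 once the average record length dominates, and a later
 * "x / bucketCount" then throws a bare ArithmeticException.
 */
public class BucketCountSketch {
    static final long RECORD_TABLE_BYTES = 32 * 1024; // illustrative
    static final long RECORD_OVERHEAD_BYTES = 8;      // illustrative

    static int bucketCount(long totalBuffersAvailable, long avgRecordLenPartition) {
        int bucketCount = (int) (totalBuffersAvailable * RECORD_TABLE_BYTES
                / (avgRecordLenPartition + RECORD_OVERHEAD_BYTES));
        // The guard suggested in the issue: fail with a meaningful message
        // instead of letting the hash table divide by zero later on.
        if (bucketCount <= 0) {
            throw new IllegalStateException(
                    "Records too large for available join memory (bucket count would be 0)");
        }
        return bucketCount;
    }
}
```

With ~50 MB average records and few buffers, the division truncates to 0 and the guard fires instead of a later `ArithmeticException`.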



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (FLINK-1013) ArithmeticException: / by zero in MutableHashTable

2015-08-06 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen closed FLINK-1013.
---

 ArithmeticException: / by zero in MutableHashTable
 --

 Key: FLINK-1013
 URL: https://issues.apache.org/jira/browse/FLINK-1013
 Project: Flink
  Issue Type: Bug
Reporter: Till Rohrmann
Assignee: Stephan Ewen
 Fix For: 0.9


 I encountered a division by zero exception in the MutableHashTable. It 
 happened when I joined two datasets of relatively big records (approx. 40-50 
 MB I think). When joining them the buildTableFromSpilledPartition method of 
 the MutableHashTable is called. If the number of available buffers is 
 smaller than the needed number of buffers, the mutable hash table will 
 calculate the bucket count
 {code}
 bucketCount = (int) (((long) totalBuffersAvailable) * RECORD_TABLE_BYTES / 
 (avgRecordLenPartition + RECORD_OVERHEAD_BYTES));
 {code}
 If the average record length is sufficiently large, the bucket count will be 
 0. Initializing the hash table with a bucket count of 0 will then cause the 
 division-by-zero exception. I don't know whether this problem can be 
 mitigated, but it should at least throw a meaningful exception instead of the 
 ArithmeticException.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (FLINK-509) Move virtual machines building process / hosting to Amazon S3 / EC2

2015-08-06 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen resolved FLINK-509.

   Resolution: Invalid
Fix Version/s: (was: pre-apache)

No virtual machine images are built by the Flink Apache infrastructure. Docker 
support has been added.

 Move virtual machines building process / hosting to Amazon S3 / EC2
 ---

 Key: FLINK-509
 URL: https://issues.apache.org/jira/browse/FLINK-509
 Project: Flink
  Issue Type: Improvement
Reporter: GitHub Import
Priority: Minor
  Labels: github-import

 The virtual machine images are currently hosted on a very unreliable 
 server.
 I'd be happy if someone could come up with a more cost-efficient solution 
 than Amazon. But we need a reliable solution.
 * The hosting is unreliable
 * the automated build process stopped for some reason
 * there is no VM for the 0.4 release
 * there is no? error reporting
  Imported from GitHub 
 Url: https://github.com/stratosphere/stratosphere/issues/509
 Created by: [rmetzger|https://github.com/rmetzger]
 Labels: bug, build system, website, 
 Milestone: Release 0.6 (unplanned)
 Created at: Wed Feb 26 10:49:33 CET 2014
 State: open



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (FLINK-509) Move virtual machines building process / hosting to Amazon S3 / EC2

2015-08-06 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen closed FLINK-509.
--

 Move virtual machines building process / hosting to Amazon S3 / EC2
 ---

 Key: FLINK-509
 URL: https://issues.apache.org/jira/browse/FLINK-509
 Project: Flink
  Issue Type: Improvement
Reporter: GitHub Import
Priority: Minor
  Labels: github-import

 The virtual machine images are currently hosted on a very unreliable 
 server.
 I'd be happy if someone could come up with a more cost-efficient solution 
 than Amazon. But we need a reliable solution.
 * The hosting is unreliable
 * the automated build process stopped for some reason
 * there is no VM for the 0.4 release
 * there is no? error reporting
  Imported from GitHub 
 Url: https://github.com/stratosphere/stratosphere/issues/509
 Created by: [rmetzger|https://github.com/rmetzger]
 Labels: bug, build system, website, 
 Milestone: Release 0.6 (unplanned)
 Created at: Wed Feb 26 10:49:33 CET 2014
 State: open



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2240) Use BloomFilter to minimize probe side records which are spilled to disk in Hybrid-Hash-Join

2015-08-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14660331#comment-14660331
 ] 

ASF GitHub Bot commented on FLINK-2240:
---

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/888#issuecomment-128441993
  
This was a super cool contribution. A pretty sophisticated addition, super 
testing, high code quality.
I am very impressed!

I hope you will contribute more to Flink. Already saw that you opened 
another pull request, for a sampling operator. Happy that this is happening :-)

In the future, I can hopefully review and merge the pull requests faster. 
The past weeks, I did not get to code work as much as I wanted, and the list of 
critical issues was long, so this pull request got delayed a bit.


 Use BloomFilter to minimize probe side records which are spilled to disk in 
 Hybrid-Hash-Join
 

 Key: FLINK-2240
 URL: https://issues.apache.org/jira/browse/FLINK-2240
 Project: Flink
  Issue Type: Improvement
  Components: Core
Reporter: Chengxiang Li
Assignee: Chengxiang Li
Priority: Minor

 In the Hybrid-Hash-Join, when the small table does not fit into memory, part 
 of the small-table data is spilled to disk, and the counterpart partition of 
 the big table is spilled to disk in the probe phase as well. If we build a 
 BloomFilter while spilling the small table during the build phase, and use it 
 to filter the big-table records that would otherwise be spilled, this may 
 greatly reduce the spilled big-table file size and save the disk I/O cost of 
 writing and later reading them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2452) Add a playcount threshold to the MusicProfiles example

2015-08-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14660454#comment-14660454
 ] 

ASF GitHub Bot commented on FLINK-2452:
---

Github user andralungu commented on the pull request:

https://github.com/apache/flink/pull/968#issuecomment-128456801
  
Nope, no comment. 


 Add a playcount threshold to the MusicProfiles example
 --

 Key: FLINK-2452
 URL: https://issues.apache.org/jira/browse/FLINK-2452
 Project: Flink
  Issue Type: Improvement
  Components: Gelly
Affects Versions: 0.10
Reporter: Vasia Kalavri
Assignee: Vasia Kalavri
Priority: Minor

 In the MusicProfiles example, when creating the user-user similarity graph, 
 an edge is created between any 2 users that have listened to the same song 
 (even if once). Depending on the input data, this might produce a projection 
 graph with many more edges than the original user-song graph.
 To make this computation more efficient, this issue proposes adding a 
 user-defined parameter that filters out songs that a user has listened to 
 only a few times. Essentially, it is a threshold for playcount, above which a 
 user is considered to like a song.
 For reference, with a threshold value of 30, the whole Last.fm dataset is 
 analyzed on my laptop in a few minutes, while no threshold results in a 
 runtime of several hours.
 There are many solutions to this problem, but since this is just an example 
 (not a library method), I think that keeping it simple is important.
 Thanks to [~andralungu] for spotting the inefficiency!
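The thresholding step can be sketched as a simple filter over the user-song edges. The `Listen` type and names here are illustrative, not the actual Gelly MusicProfiles code:

```java
import java.util.List;
import java.util.stream.Collectors;

/**
 * Sketch of the proposed playcount threshold (types and names are
 * illustrative, not the Gelly MusicProfiles code): drop user-song edges at
 * or below the threshold before building the user-user similarity
 * projection, so users are only connected via songs they actually "like".
 */
public class PlaycountThreshold {

    public static final class Listen {
        public final String user;
        public final String song;
        public final int playcount;

        public Listen(String user, String song, int playcount) {
            this.user = user;
            this.song = song;
            this.playcount = playcount;
        }
    }

    /** Keeps only listens whose playcount exceeds the threshold. */
    public static List<Listen> likedSongs(List<Listen> listens, int threshold) {
        return listens.stream()
                .filter(l -> l.playcount > threshold)
                .collect(Collectors.toList());
    }
}
```

Since every surviving user-song edge can contribute quadratically many user-user edges per song, pruning before the projection is what makes the runtime drop from hours to minutes.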



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (FLINK-848) Move combine() from GroupReduceFunction to Interface

2015-08-06 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen closed FLINK-848.
--

 Move combine() from GroupReduceFunction to Interface
 

 Key: FLINK-848
 URL: https://issues.apache.org/jira/browse/FLINK-848
 Project: Flink
  Issue Type: Sub-task
  Components: Java API
Reporter: Fabian Hueske
Assignee: Kostas Tzoumas
  Labels: breaking-api, github-import
 Fix For: 0.7.1-incubating


 Currently, the combine method of the GroupReduceFunction allows returning 
 multiple values using a collector. However, most combiners do not need this 
 because they return only a single value. Furthermore, a single-value 
 returning combiner can be executed using more efficient hash-based strategies.
 Hence, we propose to introduce a combine method for GroupReduce which returns 
 only a single value. In order to keep support for the rare cases where more 
 than one value needs to be returned, we want to keep the collector-combiner 
 as well.
 To do so, we could remove the combine method from the abstract 
 GroupReduceFunction class and add two Combinable interfaces, one for a 
 single-value and one for a multi-value combiner.
 This would also make the Combinable annotation obsolete as the optimizer can 
 check whether a GroupReduceFunction implements one of the Combinable 
 interfaces or not.
  Imported from GitHub 
 Url: https://github.com/stratosphere/stratosphere/issues/848
 Created by: [fhueske|https://github.com/fhueske]
 Labels: core, enhancement, 
 Milestone: Release 0.6 (unplanned)
 Created at: Thu May 22 10:23:04 CEST 2014
 State: open
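The two combiner shapes proposed above can be sketched as a pair of interfaces. The names here are illustrative (a later comment in this thread notes that Flink eventually settled on `CombineFunction` and `GroupCombineFunction`):

```java
import java.util.function.Consumer;

/**
 * Sketch of the two combiner shapes proposed above (interface names are
 * illustrative). The optimizer can check which interface a
 * GroupReduceFunction implements instead of relying on an annotation.
 */
public class CombinableSketch {

    /** Single-value combiner: can be executed with hash-based strategies. */
    public interface SingleValueCombinable<T> {
        T combine(Iterable<T> values);
    }

    /** Multi-value combiner: keeps the collector style for the rare cases. */
    public interface MultiValueCombinable<T> {
        void combine(Iterable<T> values, Consumer<T> out);
    }

    /** Example: summing integers only ever emits a single value. */
    public static final SingleValueCombinable<Integer> SUM = values -> {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    };
}
```

A runtime can then test `function instanceof SingleValueCombinable` to pick a hash-based combine strategy, which is exactly what the annotation could not express.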



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2452] [Gelly] adds a playcount threshol...

2015-08-06 Thread andralungu
Github user andralungu commented on the pull request:

https://github.com/apache/flink/pull/968#issuecomment-128456801
  
Nope, no comment. 




[jira] [Resolved] (FLINK-848) Move combine() from GroupReduceFunction to Interface

2015-08-06 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen resolved FLINK-848.

   Resolution: Fixed
Fix Version/s: (was: pre-apache)
   0.7.1-incubating

Has been moved to interfaces {{CombineFunction}} and {{GroupCombineFunction}}

 Move combine() from GroupReduceFunction to Interface
 

 Key: FLINK-848
 URL: https://issues.apache.org/jira/browse/FLINK-848
 Project: Flink
  Issue Type: Sub-task
  Components: Java API
Reporter: Fabian Hueske
Assignee: Kostas Tzoumas
  Labels: breaking-api, github-import
 Fix For: 0.7.1-incubating


 Currently, the combine method of the GroupReduceFunction allows returning 
 multiple values using a collector. However, most combiners do not need this 
 because they return only a single value. Furthermore, a combiner that returns 
 a single value can be executed using more efficient hash-based strategies.
 Hence, we propose to introduce a combine method for GroupReduce which returns 
 only a single value. To retain support for the rare cases where more 
 than one value needs to be returned, we want to keep the collector-based 
 combiner as well.
 To do so, we could remove the combine method from the abstract 
 GroupReduceFunction class and add two Combinable interfaces, one for a 
 single-value and one for a multi-value combiner.
 This would also make the Combinable annotation obsolete as the optimizer can 
 check whether a GroupReduceFunction implements one of the Combinable 
 interfaces or not.
  Imported from GitHub 
 Url: https://github.com/stratosphere/stratosphere/issues/848
 Created by: [fhueske|https://github.com/fhueske]
 Labels: core, enhancement, 
 Milestone: Release 0.6 (unplanned)
 Created at: Thu May 22 10:23:04 CEST 2014
 State: open
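The split proposed above can be sketched in plain Java. This is an illustrative sketch under assumed names (`SingleValueCombinable`, `MultiValueCombinable`, `isCombinable`), not Flink's actual `CombineFunction`/`GroupCombineFunction` signatures: one interface for the single-value combiner that hash-based strategies can exploit, one collector-style interface for the multi-value case, and the optimizer-side instanceof check that replaces the Combinable annotation.

```java
import java.util.Arrays;
import java.util.List;

public class CombinerSketch {

    /** Single-value combiner: returns exactly one combined record. */
    interface SingleValueCombinable<T> {
        T combine(List<T> values);
    }

    /** Multi-value combiner: may emit any number of records into a collector. */
    interface MultiValueCombinable<T> {
        void combine(List<T> values, List<T> out);
    }

    /**
     * The optimizer check described in the issue: instead of reading a
     * Combinable annotation, test which combiner interface the reduce
     * function implements.
     */
    public static boolean isCombinable(Object reduceFunction) {
        return reduceFunction instanceof SingleValueCombinable
            || reduceFunction instanceof MultiValueCombinable;
    }

    /** Example: a summing combiner that returns a single value. */
    public static int sumCombine(List<Integer> values) {
        SingleValueCombinable<Integer> sum =
            vs -> vs.stream().mapToInt(Integer::intValue).sum();
        return sum.combine(values);
    }

    public static void main(String[] args) {
        assert sumCombine(Arrays.asList(1, 2, 3)) == 6;
        assert isCombinable((SingleValueCombinable<Integer>) vs -> 0);
    }
}
```

Because the single-value variant has exactly one input group and one output record, a hash table can fold records into a partial aggregate in place, which is the efficiency argument the issue makes.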





[jira] [Closed] (FLINK-701) Change new Java API functions to SAMs

2015-08-06 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen closed FLINK-701.
--

 Change new Java API functions to SAMs
 -

 Key: FLINK-701
 URL: https://issues.apache.org/jira/browse/FLINK-701
 Project: Flink
  Issue Type: Improvement
  Components: Java API
Reporter: GitHub Import
Assignee: Kostas Tzoumas
  Labels: github-import
 Fix For: 0.6-incubating


 In order to support a compact syntax with Java 8 Lambdas, we would need to 
 change the types of the functions to Single Abstract Method types (SAMs). 
 Only those can be implemented by Lambdas.
 That means that DataSet.map(MapFunction) would accept an interface 
 MapFunction, not the abstract class that we use now. Many UDFs would no 
 longer inherit from `AbstractFunction`; inheriting from 
 AbstractFunction would be optional, needed only when life cycle methods (open / close) and 
 runtime contexts are required.
 This may also have implications for the type extraction, as the generic 
 parameters are then in generic superinterfaces, rather than in generic 
 superclasses.
  Imported from GitHub 
 Url: https://github.com/stratosphere/stratosphere/issues/701
 Created by: [StephanEwen|https://github.com/StephanEwen]
 Labels: enhancement, java api, user satisfaction, 
 Milestone: Release 0.6 (unplanned)
 Created at: Thu Apr 17 13:06:40 CEST 2014
 State: open
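The change can be illustrated with a minimal sketch (the `MapFunction` and rich base class here are stand-ins, not Flink's real classes): a single-abstract-method interface can be implemented directly by a Java 8 lambda, while UDFs that need open/close life-cycle hooks extend an optional base class.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SamSketch {

    /** SAM interface: one abstract method, so a lambda can implement it. */
    @FunctionalInterface
    interface MapFunction<IN, OUT> {
        OUT map(IN value);
    }

    /** Optional base class for UDFs that need life-cycle methods. */
    static abstract class RichMapFunction<IN, OUT> implements MapFunction<IN, OUT> {
        public void open() {}
        public void close() {}
    }

    /** Stand-in for DataSet.map: accepts the interface, not an abstract class. */
    static <IN, OUT> List<OUT> applyMap(List<IN> input, MapFunction<IN, OUT> f) {
        List<OUT> out = new ArrayList<>();
        for (IN v : input) {
            out.add(f.map(v));
        }
        return out;
    }

    public static List<Integer> doubled(List<Integer> xs) {
        // With a SAM type, the UDF is just a lambda:
        return applyMap(xs, x -> x * 2);
    }

    public static void main(String[] args) {
        assert doubled(Arrays.asList(1, 2, 3)).equals(Arrays.asList(2, 4, 6));
    }
}
```

The type-extraction concern from the issue is visible here too: the `IN`/`OUT` parameters now live on a generic superinterface, so reflection-based extraction has to walk interfaces rather than superclasses.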





[jira] [Closed] (FLINK-967) Make intermediate results a first-class citizen in the JobGraph

2015-08-06 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen closed FLINK-967.
--

 Make intermediate results a first-class citizen in the JobGraph
 ---

 Key: FLINK-967
 URL: https://issues.apache.org/jira/browse/FLINK-967
 Project: Flink
  Issue Type: New Feature
  Components: JobManager, TaskManager
Affects Versions: 0.6-incubating
Reporter: Stephan Ewen
Assignee: Stephan Ewen
 Fix For: 0.9


 In order to add incremental plan rollout to the system, we need to make 
 intermediate results a first-class citizen in the job graph and scheduler.





[jira] [Commented] (FLINK-981) Support for generated Cloudera Hadoop configuration

2015-08-06 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14660484#comment-14660484
 ] 

Stephan Ewen commented on FLINK-981:


Is this still valid, or is this fixed already by properly loading the Hadoop 
configuration?

 Support for generated Cloudera Hadoop configuration 
 

 Key: FLINK-981
 URL: https://issues.apache.org/jira/browse/FLINK-981
 Project: Flink
  Issue Type: Bug
  Components: Local Runtime, YARN Client
Reporter: Robert Metzger

 Cloudera Hadoop generates configuration files that differ from the vanilla 
 upstream Hadoop configuration files.
 The HDFS and the YARN component both access configuration values from Hadoop.





[jira] [Resolved] (FLINK-1116) Packaged Scala Examples do not work due to missing test data

2015-08-06 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen resolved FLINK-1116.
-
Resolution: Invalid

Scala examples are not packaged any more (redundant with the Java examples)

 Packaged Scala Examples do not work due to missing test data
 

 Key: FLINK-1116
 URL: https://issues.apache.org/jira/browse/FLINK-1116
 Project: Flink
  Issue Type: Bug
  Components: Scala API
Reporter: Stephan Ewen
Priority: Minor

 The example data classes are in the Java examples project. The Maven jar 
 plugin cannot include them in the jars of the Scala examples, causing the 
 examples to fail with a ClassNotFoundException when starting the example job.
 For now, I disabled the Scala example jars from being built, because they do 
 not work anyway.





[jira] [Closed] (FLINK-1116) Packaged Scala Examples do not work due to missing test data

2015-08-06 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen closed FLINK-1116.
---

 Packaged Scala Examples do not work due to missing test data
 

 Key: FLINK-1116
 URL: https://issues.apache.org/jira/browse/FLINK-1116
 Project: Flink
  Issue Type: Bug
  Components: Scala API
Reporter: Stephan Ewen
Priority: Minor

 The example data classes are in the Java examples project. The Maven jar 
 plugin cannot include them in the jars of the Scala examples, causing the 
 examples to fail with a ClassNotFoundException when starting the example job.
 For now, I disabled the Scala example jars from being built, because they do 
 not work anyway.





[jira] [Closed] (FLINK-1083) WebInterface improvements

2015-08-06 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen closed FLINK-1083.
---

 WebInterface improvements
 -

 Key: FLINK-1083
 URL: https://issues.apache.org/jira/browse/FLINK-1083
 Project: Flink
  Issue Type: Improvement
  Components: JobManager
Reporter: Jonathan Hasenburg

 New Issue to summarize all things that should be done regarding the 
 webinterface.
 * rework dashboard in a way that more than one job can be shown ... . If a 
 job is clicked you get to the details.
 * DONE: add history to dashboard
 * Running jobs should get a view like jobs in the history.
 * DONE: rework the menu and try to add some links to the dashboard (like the 
 taskmanager section)
  * improve the way the JSON responses are sent to the webinterface





[jira] [Commented] (FLINK-1135) Blog post with topic Accessing Data Stored in Hive with Flink

2015-08-06 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14660516#comment-14660516
 ] 

Stephan Ewen commented on FLINK-1135:
-

Is this still going to happen?

 Blog post with topic Accessing Data Stored in Hive with Flink
 ---

 Key: FLINK-1135
 URL: https://issues.apache.org/jira/browse/FLINK-1135
 Project: Flink
  Issue Type: Improvement
  Components: Project Website
Reporter: Timo Walther
Assignee: Robert Metzger
Priority: Minor
 Attachments: 2014-09-29-querying-hive.md


 Recently, I implemented a Flink job that accessed Hive. Maybe someone else is 
 going to try this. I created a blog post for the website to share my 
 experience.
 You'll find the blog post file attached.





[jira] [Resolved] (FLINK-2240) Use BloomFilter to minimize probe side records which are spilled to disk in Hybrid-Hash-Join

2015-08-06 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen resolved FLINK-2240.
-
   Resolution: Implemented
Fix Version/s: 0.10

Implemented in 61dcae391cb3b45ba3aff47d4d9163889d2958a4

Thank you for the contribution!

 Use BloomFilter to minimize probe side records which are spilled to disk in 
 Hybrid-Hash-Join
 

 Key: FLINK-2240
 URL: https://issues.apache.org/jira/browse/FLINK-2240
 Project: Flink
  Issue Type: Improvement
  Components: Core
Reporter: Chengxiang Li
Assignee: Chengxiang Li
Priority: Minor
 Fix For: 0.10


 In the Hybrid-Hash-Join, when the small table does not fit into memory, part 
 of the small-table data is spilled to disk, and the corresponding partitions 
 of the big table are spilled to disk in the probe phase as well. If we build a 
 BloomFilter while spilling the small table to disk during the build phase, and 
 use it to filter the big-table records that would otherwise be spilled, this may 
 greatly reduce the size of the spilled big-table file and save the disk IO cost 
 of writing and later reading it.
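The idea can be sketched with a tiny standalone Bloom filter (illustrative only, not Flink's implementation from the referenced commit): the build phase adds every spilled build-side key; in the probe phase, a negative lookup means the key is definitely absent, so that probe-side record never needs to be spilled.

```java
import java.util.BitSet;

public class SpillBloomFilter {

    private final BitSet bits;
    private final int size;

    public SpillBloomFilter(int size) {
        this.size = size;
        this.bits = new BitSet(size);
    }

    // Two cheap hash functions derived from the key.
    private int h1(long key) {
        return Math.floorMod(Long.hashCode(key), size);
    }

    private int h2(long key) {
        return Math.floorMod(Long.hashCode(key * 0x9E3779B97F4A7C15L), size);
    }

    /** Build phase: record every build-side key that is spilled to disk. */
    public void add(long key) {
        bits.set(h1(key));
        bits.set(h2(key));
    }

    /**
     * Probe phase: false means the key is definitely not in the spilled
     * build-side partition, so the probe record can be dropped instead of
     * spilled. True may be a false positive; those records are still
     * spilled and resolved correctly when the partition is re-read.
     */
    public boolean mightContain(long key) {
        return bits.get(h1(key)) && bits.get(h2(key));
    }

    public static void main(String[] args) {
        SpillBloomFilter filter = new SpillBloomFilter(1024);
        filter.add(42L);
        filter.add(7L);
        // Bloom filters have no false negatives, so spilled keys always pass.
        assert filter.mightContain(42L);
        assert filter.mightContain(7L);
    }
}
```

The trade-off is exactly the one the issue describes: a few bits per spilled build-side key buy the ability to discard most non-matching probe-side records before they ever hit disk, at the cost of occasionally spilling a false-positive record anyway.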





[jira] [Closed] (FLINK-2240) Use BloomFilter to minimize probe side records which are spilled to disk in Hybrid-Hash-Join

2015-08-06 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen closed FLINK-2240.
---

 Use BloomFilter to minimize probe side records which are spilled to disk in 
 Hybrid-Hash-Join
 

 Key: FLINK-2240
 URL: https://issues.apache.org/jira/browse/FLINK-2240
 Project: Flink
  Issue Type: Improvement
  Components: Core
Reporter: Chengxiang Li
Assignee: Chengxiang Li
Priority: Minor
 Fix For: 0.10


 In the Hybrid-Hash-Join, when the small table does not fit into memory, part 
 of the small-table data is spilled to disk, and the corresponding partitions 
 of the big table are spilled to disk in the probe phase as well. If we build a 
 BloomFilter while spilling the small table to disk during the build phase, and 
 use it to filter the big-table records that would otherwise be spilled, this may 
 greatly reduce the size of the spilled big-table file and save the disk IO cost 
 of writing and later reading it.




