[jira] [Commented] (FLINK-2398) Decouple StreamGraph Building from the API
[ https://issues.apache.org/jira/browse/FLINK-2398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14660792#comment-14660792 ] ASF GitHub Bot commented on FLINK-2398:

Github user aljoscha commented on the pull request: https://github.com/apache/flink/pull/988#issuecomment-128507574

Yes, I think the automatic rebalance is good. My approach of throwing the exception was just the easiest way of dealing with the previously faulty behavior. I think people coming from Storm are also used to having operations that are executed even if they don't have a sink. So maybe we should keep that.

Decouple StreamGraph Building from the API
Key: FLINK-2398
URL: https://issues.apache.org/jira/browse/FLINK-2398
Project: Flink
Issue Type: Improvement
Components: Streaming
Reporter: Aljoscha Krettek
Assignee: Aljoscha Krettek

Currently, the building of the StreamGraph is tightly intertwined with the API methods. DataStream knows about the StreamGraph and keeps track of splitting, selected names, unions, and so on. As a result, it is very hard to understand how the StreamGraph is built, because the code that builds it is spread all over the place. This also makes it hard to extend or change parts of the streaming system. I propose to introduce Transformations. A transformation holds the information about one operation: the input streams, types, names, operator, and so on. An API method creates a transformation instead of fiddling with the StreamGraph directly. A new component, the StreamGraphGenerator, creates a StreamGraph from the tree of transformations that results from specifying a program with the API methods. This would relieve DataStream from knowing about the StreamGraph and make unions, splitting, and selection visible transformations, instead of being scattered across the different API classes as fields.
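To make the proposal concrete, here is a minimal sketch of the transformation-tree idea, assuming hypothetical class shapes (Transformation, StreamGraphGenerator.generate(); none of this is the final Flink API): each API call only records a node, and a separate generator later walks the tree once to build the StreamGraph.

import java.util.HashSet;
import java.util.List;
import java.util.Set;

class StreamGraph { /* placeholder for the real streaming graph class */ }

// Hypothetical sketch, not the final API: a transformation only records
// what an operation is; it never touches the StreamGraph itself.
abstract class Transformation<T> {
    private final String name;
    private final List<Transformation<?>> inputs;

    protected Transformation(String name, List<Transformation<?>> inputs) {
        this.name = name;
        this.inputs = inputs;
    }

    List<Transformation<?>> getInputs() {
        return inputs;
    }
}

// Hypothetical generator: walks the tree of transformations bottom-up and
// translates each node into StreamGraph nodes/edges exactly once.
class StreamGraphGenerator {
    private final Set<Transformation<?>> alreadyTransformed = new HashSet<Transformation<?>>();

    StreamGraph generate(List<Transformation<?>> sinkTransformations) {
        StreamGraph graph = new StreamGraph();
        for (Transformation<?> sink : sinkTransformations) {
            transform(graph, sink);
        }
        return graph;
    }

    private void transform(StreamGraph graph, Transformation<?> t) {
        if (!alreadyTransformed.add(t)) {
            return; // shared inputs (e.g. behind a union) are translated once
        }
        for (Transformation<?> input : t.getInputs()) {
            transform(graph, input); // inputs first, then this node
        }
        // ... add a StreamGraph node for t and wire edges to its inputs ...
    }
}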
[GitHub] flink pull request: [FLINK-2314]
GitHub user sheetalparade opened a pull request: https://github.com/apache/flink/pull/997

[FLINK-2314] Added checkpointing feature into File Source

You can merge this pull request into a Git repository by running:
$ git pull https://github.com/sheetalparade/flink master
Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/997.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #997

commit c32d9bfada1f6b601c221185a08c8b1c5a454a5c
Author: Sheetal Parade sheetal.parade+git...@gmail.com
Date: 2015-08-06T16:29:19Z
[FLINK-2314] Added checkpointing feature into File Source
[jira] [Commented] (FLINK-2314) Make Streaming File Sources Persistent
[ https://issues.apache.org/jira/browse/FLINK-2314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14660703#comment-14660703 ] ASF GitHub Bot commented on FLINK-2314:

GitHub user sheetalparade opened a pull request: https://github.com/apache/flink/pull/997
[FLINK-2314] Added checkpointing feature into File Source

Make Streaming File Sources Persistent
Key: FLINK-2314
URL: https://issues.apache.org/jira/browse/FLINK-2314
Project: Flink
Issue Type: Improvement
Components: Streaming
Affects Versions: 0.9
Reporter: Stephan Ewen
Assignee: Sheetal Parade
Labels: easyfix, starter

Streaming file sources should participate in checkpointing. They should track the bytes they read from the file and checkpoint that position. One can look at the sequence-generating source function for an example of a checkpointed source.
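For orientation, a minimal sketch of what such a checkpointed file source could look like against the Checkpointed interface of that era; the class name, file path, and line-by-line reading here are illustrative assumptions, not the contents of PR #997.

import java.io.RandomAccessFile;

import org.apache.flink.streaming.api.checkpoint.Checkpointed;
import org.apache.flink.streaming.api.functions.source.RichSourceFunction;

// Illustrative sketch only: a file source that tracks the byte offset it
// has read and checkpoints that offset so it can resume after a failure.
public class CheckpointedFileSource extends RichSourceFunction<String>
        implements Checkpointed<Long> {

    private final String path; // hypothetical input file
    private volatile boolean isRunning = true;
    private long bytesRead;    // position to resume from after a restore

    public CheckpointedFileSource(String path) {
        this.path = path;
    }

    @Override
    public void run(SourceContext<String> ctx) throws Exception {
        RandomAccessFile file = new RandomAccessFile(path, "r");
        try {
            file.seek(bytesRead); // skip everything emitted before the failure
            String line;
            while (isRunning && (line = file.readLine()) != null) {
                // update the offset atomically with the emission, so a
                // snapshot never falls between "emitted" and "recorded"
                synchronized (ctx.getCheckpointLock()) {
                    ctx.collect(line);
                    bytesRead = file.getFilePointer();
                }
            }
        } finally {
            file.close();
        }
    }

    @Override
    public Long snapshotState(long checkpointId, long checkpointTimestamp) {
        return bytesRead;
    }

    @Override
    public void restoreState(Long state) {
        bytesRead = state;
    }

    @Override
    public void cancel() {
        isRunning = false;
    }
}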
[jira] [Commented] (FLINK-1138) Allow users to specify methods instead of fields in key expressions
[ https://issues.apache.org/jira/browse/FLINK-1138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14660700#comment-14660700 ] Fabian Hueske commented on FLINK-1138:

+1 for won't fix

Allow users to specify methods instead of fields in key expressions
Key: FLINK-1138
URL: https://issues.apache.org/jira/browse/FLINK-1138
Project: Flink
Issue Type: Improvement
Components: Java API
Reporter: Robert Metzger
Priority: Minor

Currently, users can specify grouping fields only on the fields of a POJO. It would be nice to also allow users to name a method (such as getVertexId()) to be called.
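A likely reason for the won't-fix: grouping by a method's return value is already expressible with a KeySelector. A minimal sketch (the Vertex POJO is made up for illustration):

import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.functions.KeySelector;

public class MethodKeyExample {

    // Hypothetical POJO; the key is only reachable through a method.
    public static class Vertex {
        public long id;
        public long getVertexId() { return id; }
    }

    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        DataSet<Vertex> vertices = env.fromElements(new Vertex(), new Vertex());

        // Group by the value returned from getVertexId() instead of a field name.
        vertices.groupBy(new KeySelector<Vertex, Long>() {
            @Override
            public Long getKey(Vertex v) {
                return v.getVertexId();
            }
        }).first(1).print();
    }
}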
[jira] [Created] (FLINK-2491) Operators are not participating in state checkpointing in some cases
Robert Metzger created FLINK-2491:

Summary: Operators are not participating in state checkpointing in some cases
Key: FLINK-2491
URL: https://issues.apache.org/jira/browse/FLINK-2491
Project: Flink
Issue Type: Bug
Components: Streaming
Affects Versions: 0.10
Reporter: Robert Metzger
Priority: Critical

While implementing a test case for the Kafka Consumer, I came across the following bug: Consider the following topology, with the operator parallelism in parentheses: Source (2) -- Sink (1). In this setup, the {{snapshotState()}} method is called on the source, but not on the sink. The sink receives the generated data. Only one of the two sources is generating data. I've implemented a test case for this, you can find it here: https://github.com/rmetzger/flink/tree/para_checkpoint_bug
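A compressed reproduction sketch of the reported setup (the actual test lives in the linked branch; all class names here are made up): both the source and the sink implement Checkpointed, yet with the 2-to-1 parallelism only the sources reportedly receive snapshotState() calls.

import org.apache.flink.streaming.api.checkpoint.Checkpointed;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.sink.SinkFunction;
import org.apache.flink.streaming.api.functions.source.SourceFunction;

public class ParallelismCheckpointRepro {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.enableCheckpointing(500);

        env.addSource(new Generator()).setParallelism(2)
           .addSink(new CountingSink()).setParallelism(1);

        env.execute("Source (2) -- Sink (1) checkpoint repro");
    }

    // Checkpointed source: snapshotState() is reportedly invoked here...
    public static class Generator implements SourceFunction<Long>, Checkpointed<Long> {
        private volatile boolean running = true;
        private long emitted;

        @Override
        public void run(SourceContext<Long> ctx) throws Exception {
            while (running) {
                ctx.collect(emitted++);
            }
        }

        @Override
        public void cancel() { running = false; }

        @Override
        public Long snapshotState(long checkpointId, long timestamp) { return emitted; }

        @Override
        public void restoreState(Long state) { emitted = state; }
    }

    // ...but reportedly not on this checkpointed sink.
    public static class CountingSink implements SinkFunction<Long>, Checkpointed<Long> {
        private long count;

        @Override
        public void invoke(Long value) { count++; }

        @Override
        public Long snapshotState(long checkpointId, long timestamp) { return count; }

        @Override
        public void restoreState(Long state) { count = state; }
    }
}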
[jira] [Commented] (FLINK-2490) Remove unwanted boolean check in function SocketTextStreamFunction.streamFromSocket
[ https://issues.apache.org/jira/browse/FLINK-2490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14659721#comment-14659721 ] ASF GitHub Bot commented on FLINK-2490:

Github user zentol commented on the pull request: https://github.com/apache/flink/pull/992#issuecomment-128301542

If you remove that check, retryForever is unused and can be removed completely.

Remove unwanted boolean check in function SocketTextStreamFunction.streamFromSocket
Key: FLINK-2490
URL: https://issues.apache.org/jira/browse/FLINK-2490
Project: Flink
Issue Type: Bug
Components: Streaming
Affects Versions: 0.10
Reporter: Huang Wei
Priority: Minor
Fix For: 0.10
Original Estimate: 168h
Remaining Estimate: 168h
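The design point behind the comment, as an illustrative fragment (not the actual SocketTextStreamFunction source): if retryForever is defined as maxRetry < 0, the retry condition can test maxRetry directly and the boolean field becomes redundant.

// Illustrative only. With retryForever == (maxRetry < 0), a condition like
//     while (retryForever || retries < maxRetry) { ... }
// collapses into a single test on maxRetry, so the field can be dropped:
static boolean shouldRetry(long maxRetry, long retriesSoFar) {
    return maxRetry < 0              // negative maxRetry already means "retry forever"
        || retriesSoFar < maxRetry;
}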
[GitHub] flink pull request: [FLINK-2432] Custom serializer support
Github user zentol commented on a diff in the pull request: https://github.com/apache/flink/pull/962#discussion_r36395811

--- Diff: flink-staging/flink-language-binding/flink-python/src/test/python/org/apache/flink/languagebinding/api/python/flink/test/test_custom.py ---
@@ -0,0 +1,76 @@
+# ###
--- End diff --

Actually, I'm just gonna move this code into the test_main file, I was going to do that in my next test-centric PR anyway.
[jira] [Commented] (FLINK-2432) [py] Provide support for custom serialization
[ https://issues.apache.org/jira/browse/FLINK-2432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14659730#comment-14659730 ] ASF GitHub Bot commented on FLINK-2432:

Github user zentol commented on a diff in the pull request: https://github.com/apache/flink/pull/962#discussion_r36395811

--- Diff: flink-staging/flink-language-binding/flink-python/src/test/python/org/apache/flink/languagebinding/api/python/flink/test/test_custom.py ---

Actually, I'm just gonna move this code into the test_main file, I was going to do that in my next test-centric PR anyway.

[py] Provide support for custom serialization
Key: FLINK-2432
URL: https://issues.apache.org/jira/browse/FLINK-2432
Project: Flink
Issue Type: New Feature
Components: Python API
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
[GitHub] flink pull request: [FLINK-2490][FIX]Remove the retryForever check...
Github user zentol commented on the pull request: https://github.com/apache/flink/pull/992#issuecomment-128309202

If you think it was necessary, why was your first step to remove its usage...
[jira] [Commented] (FLINK-2490) Remove unwanted boolean check in function SocketTextStreamFunction.streamFromSocket
[ https://issues.apache.org/jira/browse/FLINK-2490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14659747#comment-14659747 ] ASF GitHub Bot commented on FLINK-2490:

Github user zentol commented on the pull request: https://github.com/apache/flink/pull/992#issuecomment-128309202

If you think it was necessary, why was your first step to remove its usage...
[jira] [Commented] (FLINK-2490) Remove unwanted boolean check in function SocketTextStreamFunction.streamFromSocket
[ https://issues.apache.org/jira/browse/FLINK-2490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14659821#comment-14659821 ] ASF GitHub Bot commented on FLINK-2490:

Github user HuangWHWHW commented on the pull request: https://github.com/apache/flink/pull/992#issuecomment-128322151

Hah, sorry, this thought only came to me after this PR.
[GitHub] flink pull request: [FLINK-2490][FIX]Remove the retryForever check...
Github user HuangWHWHW commented on the pull request: https://github.com/apache/flink/pull/992#issuecomment-128322151

Hah, sorry, this thought only came to me after this PR.
[jira] [Created] (FLINK-2494) Fix StreamGraph getJobGraph bug
fangfengbin created FLINK-2494:

Summary: Fix StreamGraph getJobGraph bug
Key: FLINK-2494
URL: https://issues.apache.org/jira/browse/FLINK-2494
Project: Flink
Issue Type: Bug
Components: Streaming
Affects Versions: 0.8.1
Reporter: fangfengbin
[jira] [Commented] (FLINK-2494) Fix StreamGraph getJobGraph bug
[ https://issues.apache.org/jira/browse/FLINK-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14661372#comment-14661372 ] ASF GitHub Bot commented on FLINK-2494:

Github user zentol commented on the pull request: https://github.com/apache/flink/pull/998#issuecomment-128605743

I would assume that forceCheckpoint is supposed to do exactly that: enforce checkpointing regardless of whether it is supported. This change also means that if checkpointing is enabled, but not forced, the job will not hit an UnsupportedOperationException, which doesn't make any sense whatsoever. -1
[GitHub] flink pull request: [FLINK-2494 ]Fix StreamGraph getJobGraph bug
Github user zentol commented on the pull request: https://github.com/apache/flink/pull/998#issuecomment-128605743

I would assume that forceCheckpoint is supposed to do exactly that: enforce checkpointing regardless of whether it is supported. This change also means that if checkpointing is enabled, but not forced, the job will not hit an UnsupportedOperationException, which doesn't make any sense whatsoever. -1
[GitHub] flink pull request: Fix StreamGraph getJobGraph bug
GitHub user ffbin opened a pull request: https://github.com/apache/flink/pull/998

Fix StreamGraph getJobGraph bug

When forceCheckpoint is true, checkpointing will be enabled for iterative jobs. But checkpointing is currently temporarily forbidden for iterative jobs, so if forceCheckpoint is true, an UnsupportedOperationException should be thrown. The old code logic is reversed.

You can merge this pull request into a Git repository by running:
$ git pull https://github.com/ffbin/flink FLINK-2494
Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/998.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #998

commit 226354ccf3060e2d0c2ba4dd607bf83ce02735c1
Author: ffbin 869218...@qq.com
Date: 2015-08-07T02:41:55Z
Fix StreamGraph getJobGraph bug
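For context, the check under dispute has roughly this shape (paraphrased from the discussion, not the exact Flink source; the names are approximations). Under the reading in the -1 comment above, forceCheckpoint is the escape hatch that bypasses the exception, not the trigger for it:

// Paraphrased sketch of the guard in StreamGraph.getJobGraph().
// Checkpointing of iterative jobs is temporarily forbidden unless forced:
void checkIterativeCheckpointing(boolean isIterative, boolean checkpointingEnabled,
                                 boolean checkpointingForced) {
    if (isIterative && checkpointingEnabled && !checkpointingForced) {
        throw new UnsupportedOperationException(
                "Checkpointing is currently not supported for iterative jobs. "
                + "Force-enable checkpointing to override this at your own risk.");
    }
}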
[jira] [Commented] (FLINK-1901) Create sample operator for Dataset
[ https://issues.apache.org/jira/browse/FLINK-1901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14661258#comment-14661258 ] ASF GitHub Bot commented on FLINK-1901:

Github user ChengXiangLi commented on the pull request: https://github.com/apache/flink/pull/949#issuecomment-128580607

Hi @tillrohrmann, the current implementation of sampling with a fixed size randomly draws a fixed-size sample from each partition instead of from the whole dataset; users probably expect the latter most of the time. I'm researching how to randomly sample a fixed number of elements from a distributed data stream, so I think we can pause this PR review until I merge the previous fix.

Create sample operator for Dataset
Key: FLINK-1901
URL: https://issues.apache.org/jira/browse/FLINK-1901
Project: Flink
Issue Type: Improvement
Components: Core
Reporter: Theodore Vasiloudis
Assignee: Chengxiang Li

In order to be able to implement Stochastic Gradient Descent and a number of other machine learning algorithms, we need a way to take a random sample from a Dataset. We need to be able to sample with or without replacement from the Dataset, choose the relative size of the sample, and set a seed for reproducibility.
[GitHub] flink pull request: [FLINK-1901] [core] Create sample operator for...
Github user ChengXiangLi commented on the pull request: https://github.com/apache/flink/pull/949#issuecomment-128580607

Hi @tillrohrmann, the current implementation of sampling with a fixed size randomly draws a fixed-size sample from each partition instead of from the whole dataset; users probably expect the latter most of the time. I'm researching how to randomly sample a fixed number of elements from a distributed data stream, so I think we can pause this PR review until I merge the previous fix.
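One standard way to get a fixed-size uniform sample across partitions, sketched here for illustration (this is not the implementation under review): tag every element with a random key, keep the k largest keys per partition, then merge the per-partition reservoirs by keeping the k largest keys overall. Because each element's survival depends only on its own random key, the merged result is a uniform sample of the whole dataset.

import java.util.Comparator;
import java.util.PriorityQueue;
import java.util.Random;

// Illustrative "bottom-k by random key" reservoir; merging reservoirs from
// different partitions yields a uniform fixed-size sample of their union.
public class WeightedReservoir<T> {

    private static final class Entry<T> {
        final double key;   // random weight drawn once per element
        final T value;
        Entry(double key, T value) { this.key = key; this.value = value; }
    }

    private final int k;
    private final Random rnd = new Random();
    private final PriorityQueue<Entry<T>> heap; // min-heap: root = weakest survivor

    public WeightedReservoir(int k) {
        this.k = k;
        this.heap = new PriorityQueue<Entry<T>>(k, new Comparator<Entry<T>>() {
            @Override
            public int compare(Entry<T> a, Entry<T> b) {
                return Double.compare(a.key, b.key);
            }
        });
    }

    /** Called once per element of the local partition. */
    public void add(T element) {
        offer(new Entry<T>(rnd.nextDouble(), element));
    }

    /** Merging = re-offering the other reservoir's entries with their keys. */
    public void mergeFrom(WeightedReservoir<T> other) {
        for (Entry<T> e : other.heap) {
            offer(e);
        }
    }

    private void offer(Entry<T> e) {
        if (heap.size() < k) {
            heap.add(e);
        } else if (heap.peek().key < e.key) {
            heap.poll();   // evict the smallest key to keep the k largest
            heap.add(e);
        }
    }
}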
[jira] [Commented] (FLINK-2240) Use BloomFilter to minimize probe side records which are spilled to disk in Hybrid-Hash-Join
[ https://issues.apache.org/jira/browse/FLINK-2240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14661203#comment-14661203 ] ASF GitHub Bot commented on FLINK-2240:

Github user ChengXiangLi commented on the pull request: https://github.com/apache/flink/pull/888#issuecomment-128563750

Thanks for the review, @StephanEwen, I'm very interested in this project and I would like to contribute more. @vasia, I think Stephan has already answered the question; the most important reason is that I want to reuse the memory occupied by the hash table buckets. Besides, since this is a performance-sensitive issue, I try to make this bloom filter as simple and efficient as I can. For example, the hash code of the join key is already generated and stored in the hybrid hash join, so I just reuse that hash code instead of generating it again from the join key value inside the bloom filter.

Use BloomFilter to minimize probe side records which are spilled to disk in Hybrid-Hash-Join
Key: FLINK-2240
URL: https://issues.apache.org/jira/browse/FLINK-2240
Project: Flink
Issue Type: Improvement
Components: Core
Reporter: Chengxiang Li
Assignee: Chengxiang Li
Priority: Minor
Fix For: 0.10

In a Hybrid-Hash-Join, when the small table does not fit into memory, part of the small-table data is spilled to disk, and the corresponding partition of the big-table data is spilled to disk in the probe phase as well. If we build a BloomFilter while spilling the small table to disk during the build phase, and use it to filter the big-table records that are about to be spilled, this may greatly reduce the size of the spilled big-table files and save the disk IO cost of writing and later re-reading them.
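The hash-reuse point in the comment can be illustrated like this (a sketch, not Flink's actual BloomFilter class): because the join already stores a 32-bit hash code per record, the filter derives its k bit positions from that int via double hashing instead of touching the key again.

// Illustrative bloom filter fed with precomputed record hash codes.
// Sizing and the choice of numHashFunctions are left to the caller.
public final class HashReusingBloomFilter {

    private final long[] bits;
    private final int numBits;
    private final int numHashFunctions;

    public HashReusingBloomFilter(int numBits, int numHashFunctions) {
        this.bits = new long[(numBits + 63) / 64];
        this.numBits = numBits;
        this.numHashFunctions = numHashFunctions;
    }

    /** Build phase: record the hash code of a spilled build-side record. */
    public void add(int recordHash) {
        for (int i = 1; i <= numHashFunctions; i++) {
            int pos = position(recordHash, i);
            bits[pos >> 6] |= 1L << (pos & 63);
        }
    }

    /** Probe phase: false means "definitely not in the spilled partition". */
    public boolean mightContain(int recordHash) {
        for (int i = 1; i <= numHashFunctions; i++) {
            int pos = position(recordHash, i);
            if ((bits[pos >> 6] & (1L << (pos & 63))) == 0) {
                return false;
            }
        }
        return true;
    }

    // Double hashing: derive the i-th probe from one 32-bit hash, so the
    // join key is never re-hashed. The mask keeps the index non-negative.
    private int position(int recordHash, int i) {
        int h2 = (recordHash >>> 16) | (recordHash << 16); // cheap second hash
        return ((recordHash + i * h2) & 0x7fffffff) % numBits;
    }
}

A probe-side record whose hash fails mightContain() can skip the spill entirely, which is where the IO saving described in the issue comes from.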
[jira] [Commented] (FLINK-2240) Use BloomFilter to minimize probe side records which are spilled to disk in Hybrid-Hash-Join
[ https://issues.apache.org/jira/browse/FLINK-2240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14661204#comment-14661204 ] ASF GitHub Bot commented on FLINK-2240:

Github user ChengXiangLi closed the pull request at: https://github.com/apache/flink/pull/888
[GitHub] flink pull request: [FLINK-2240] Use BloomFilter to filter probe r...
Github user ChengXiangLi closed the pull request at: https://github.com/apache/flink/pull/888
[GitHub] flink pull request: [FLINK-2240] Use BloomFilter to filter probe r...
Github user ChengXiangLi commented on the pull request: https://github.com/apache/flink/pull/888#issuecomment-128563750

Thanks for the review, @StephanEwen, I'm very interested in this project and I would like to contribute more. @vasia, I think Stephan has already answered the question; the most important reason is that I want to reuse the memory occupied by the hash table buckets. Besides, since this is a performance-sensitive issue, I try to make this bloom filter as simple and efficient as I can. For example, the hash code of the join key is already generated and stored in the hybrid hash join, so I just reuse that hash code instead of generating it again from the join key value inside the bloom filter.
[jira] [Commented] (FLINK-2314) Make Streaming File Sources Persistent
[ https://issues.apache.org/jira/browse/FLINK-2314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14660983#comment-14660983 ] ASF GitHub Bot commented on FLINK-2314:

Github user chiwanpark commented on a diff in the pull request: https://github.com/apache/flink/pull/997#discussion_r36478648

--- Diff: flink-staging/flink-streaming/flink-streaming-core/src/main/java/org/apache/flink/streaming/api/functions/source/FileSourceFunction.java ---
@@ -119,13 +124,20 @@ public void run(SourceContext<OUT> ctx) throws Exception {
 while (isRunning) {
 OUT nextElement = serializer.createInstance();
 nextElement = format.nextRecord(nextElement);
- if (nextElement == null && splitIterator.hasNext()) {
+ if (nextElement == null && splitIterator.hasNext() ) {
--- End diff --

unnecessary space
[GitHub] flink pull request: [FLINK-2314]
Github user chiwanpark commented on a diff in the pull request: https://github.com/apache/flink/pull/997#discussion_r36478648

--- Diff: flink-staging/flink-streaming/flink-streaming-core/src/main/java/org/apache/flink/streaming/api/functions/source/FileSourceFunction.java ---
@@ -119,13 +124,20 @@ public void run(SourceContext<OUT> ctx) throws Exception {
 while (isRunning) {
 OUT nextElement = serializer.createInstance();
 nextElement = format.nextRecord(nextElement);
- if (nextElement == null && splitIterator.hasNext()) {
+ if (nextElement == null && splitIterator.hasNext() ) {
--- End diff --

unnecessary space
[jira] [Resolved] (FLINK-2464) BufferSpillerTest sometimes fails
[ https://issues.apache.org/jira/browse/FLINK-2464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen resolved FLINK-2464.

Resolution: Duplicate
Problem is tracked in [FLINK-2466]

BufferSpillerTest sometimes fails
Key: FLINK-2464
URL: https://issues.apache.org/jira/browse/FLINK-2464
Project: Flink
Issue Type: Bug
Components: Streaming
Reporter: Gyula Fora
Assignee: Stephan Ewen
Priority: Minor
Fix For: 0.10

The BufferSpillerTest failed with the following error:

org.apache.flink.streaming.runtime.io.BufferSpillerTest
testSpillWhileReading(org.apache.flink.streaming.runtime.io.BufferSpillerTest) Time elapsed: 3.28 sec  FAILURE!
java.lang.AssertionError: wrong buffer contents expected:<0> but was:<58>
 at org.junit.Assert.fail(Assert.java:88)
 at org.junit.Assert.failNotEquals(Assert.java:743)
 at org.junit.Assert.assertEquals(Assert.java:118)
 at org.junit.Assert.assertEquals(Assert.java:555)
 at org.apache.flink.streaming.runtime.io.BufferSpillerTest.validateBuffer(BufferSpillerTest.java:290)
 at org.apache.flink.streaming.runtime.io.BufferSpillerTest.access$200(BufferSpillerTest.java:42)
[jira] [Closed] (FLINK-2464) BufferSpillerTest sometimes fails
[ https://issues.apache.org/jira/browse/FLINK-2464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen closed FLINK-2464.
[jira] [Commented] (FLINK-2466) Travis build failure
[ https://issues.apache.org/jira/browse/FLINK-2466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14659928#comment-14659928 ] Stephan Ewen commented on FLINK-2466:

If someone could double check whether they see code that can crash the JVM in {{BarrierBufferMassiveRandomTest}}, {{BarrierBuffer}}, and {{BufferSpiller}}, I would appreciate that. Otherwise, I would suggest to go forward with deprecating Java 6 support and close this on strong suspicion that it is a Java 6 issue, for the following reasons:
- There is no code in the tested classes that can actually crash a JVM.
- The problem has occurred more than 50 times for Java 6, but not a single time for Java 7 and 8.

Travis build failure
Key: FLINK-2466
URL: https://issues.apache.org/jira/browse/FLINK-2466
Project: Flink
Issue Type: Bug
Reporter: Sachin Goel
Assignee: Stephan Ewen
Fix For: 0.10

One of my builds failed with "Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.17:test (default-test) on project flink-streaming-core: ExecutionException: java.lang.RuntimeException: The forked VM terminated without properly saying goodbye. VM crash or System.exit called?"

Here's the build: https://travis-ci.org/apache/flink/jobs/73851870

This looks like a travis hiccup though, similar to how tachyon fails sometimes.
[jira] [Comment Edited] (FLINK-2466) Travis build failure
[ https://issues.apache.org/jira/browse/FLINK-2466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14659928#comment-14659928 ] Stephan Ewen edited comment on FLINK-2466 at 8/6/15 12:29 PM:

If someone could double check whether they see code that can crash the JVM in {{BarrierBufferMassiveRandomTest}}, {{BarrierBuffer}}, and {{BufferSpiller}}, I would appreciate that. Otherwise, I would suggest to go forward with deprecating Java 6 support and close this on strong suspicion that it is a Java 6 issue, for the following reasons:
- There is no code in the tested classes that can actually crash a JVM (unless there is a JVM bug)
- The problem has occurred more than 50 times for Java 6, but not a single time for Java 7 and 8.
[jira] [Commented] (FLINK-2457) Integrate Tuple0
[ https://issues.apache.org/jira/browse/FLINK-2457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14659970#comment-14659970 ] ASF GitHub Bot commented on FLINK-2457:

Github user mjsax commented on a diff in the pull request: https://github.com/apache/flink/pull/983#discussion_r36410627

--- Diff: flink-staging/flink-language-binding/flink-language-binding-generic/src/main/java/org/apache/flink/languagebinding/api/java/common/streaming/Receiver.java ---
@@ -346,61 +349,12 @@ public Tuple deserialize() { }
 public static Tuple createTuple(int size) {
- switch (size) {
- case 0: return new Tuple0();
- case 1: return new Tuple1();
- case 2: return new Tuple2();
- case 3: return new Tuple3();
- case 4: return new Tuple4();
- case 5: return new Tuple5();
- case 6: return new Tuple6();
- case 7: return new Tuple7();
- case 8: return new Tuple8();
- case 9: return new Tuple9();
- case 10: return new Tuple10();
- case 11: return new Tuple11();
- case 12: return new Tuple12();
- case 13: return new Tuple13();
- case 14: return new Tuple14();
- case 15: return new Tuple15();
- case 16: return new Tuple16();
- case 17: return new Tuple17();
- case 18: return new Tuple18();
- case 19: return new Tuple19();
- case 20: return new Tuple20();
- case 21: return new Tuple21();
- case 22: return new Tuple22();
- case 23: return new Tuple23();
- case 24: return new Tuple24();
- case 25: return new Tuple25();
- default:
- throw new IllegalArgumentException("Tuple size not supported: " + size);
+ try {
+ return Tuple.getTupleClass(size).newInstance();
--- End diff --

No need for this. `.getTupleClass()` does the check already.

Integrate Tuple0
Key: FLINK-2457
URL: https://issues.apache.org/jira/browse/FLINK-2457
Project: Flink
Issue Type: Improvement
Reporter: Matthias J. Sax
Assignee: Matthias J. Sax
Priority: Minor

Tuple0 is not cleanly integrated:
- missing serialization/deserialization support in the runtime
- Tuple.getTupleClass(int arity) cannot handle arity zero, i.e., it cannot create an instance of Tuple0

Tuple0 is currently only used in the Python API, but will be integrated into the Storm compatibility layer, too.
[jira] [Commented] (FLINK-2466) Travis build failure
[ https://issues.apache.org/jira/browse/FLINK-2466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14659904#comment-14659904 ] Stephan Ewen commented on FLINK-2466:

It is the BarrierBufferMassiveRandomTest again. The "Killed" message means the JVM crashed. Not sure why that happens; we are not using any unsafe stuff in that test. While fixing the BufferSpillerTest, I noticed that there is a possible Java 6 bug when handing FileChannels/DirectByteBuffers over between threads. For a JVM to crash without the user code using Unsafe, I have a hard time imagining what it could be, other than a JVM issue.
[jira] [Commented] (FLINK-2457) Integrate Tuple0
[ https://issues.apache.org/jira/browse/FLINK-2457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14659941#comment-14659941 ] ASF GitHub Bot commented on FLINK-2457:

Github user StephanEwen commented on a diff in the pull request: https://github.com/apache/flink/pull/983#discussion_r36408488

--- Diff: flink-java/src/main/java/org/apache/flink/api/java/typeutils/runtime/Tuple0Serializer.java ---
@@ -0,0 +1,86 @@
+/* ... Apache License 2.0 header ... */
+package org.apache.flink.api.java.typeutils.runtime;
+
+import java.io.IOException;
+
+import org.apache.flink.api.common.typeutils.TypeSerializer;
+import org.apache.flink.api.java.tuple.Tuple0;
+import org.apache.flink.core.memory.DataInputView;
+import org.apache.flink.core.memory.DataOutputView;
+
+public class Tuple0Serializer extends TupleSerializer<Tuple0> {
+ private static final long serialVersionUID = 1278813169022975971L;
+
+ public Tuple0Serializer() {
+ super(Tuple0.class, new TypeSerializer<?>[0]);
+ }
+
+ @Override
+ public Tuple0Serializer duplicate() {
+ return new Tuple0Serializer();
--- End diff --

No need to create a new instance, since this is stateless.
[GitHub] flink pull request: [FLINK-2457] Integrate Tuple0
Github user StephanEwen commented on a diff in the pull request: https://github.com/apache/flink/pull/983#discussion_r36408610

--- Diff: flink-staging/flink-language-binding/flink-language-binding-generic/src/main/java/org/apache/flink/languagebinding/api/java/common/streaming/Receiver.java ---
+ try {
+ return Tuple.getTupleClass(size).newInstance();
--- End diff --

This should have a size check and give a proper error message.
[jira] [Created] (FLINK-2492) Rename remaining runtime classes from match to join
Stephan Ewen created FLINK-2492:

Summary: Rename remaining runtime classes from match to join
Key: FLINK-2492
URL: https://issues.apache.org/jira/browse/FLINK-2492
Project: Flink
Issue Type: Improvement
Components: Local Runtime
Affects Versions: 0.10
Reporter: Stephan Ewen
Assignee: Stephan Ewen
Priority: Minor
Fix For: 0.10

While working with the runtime join classes, I saw that many of them still refer to the join as "match". Since all other parts now consistently refer to "join", we should adjust the runtime classes as well. This makes it easier for new contributors.
[jira] [Commented] (FLINK-2466) Travis build failure
[ https://issues.apache.org/jira/browse/FLINK-2466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14659893#comment-14659893 ] Sachin Goel commented on FLINK-2466:

Most of my builds are still failing too.
[jira] [Commented] (FLINK-2457) Integrate Tuple0
[ https://issues.apache.org/jira/browse/FLINK-2457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14659935#comment-14659935 ] ASF GitHub Bot commented on FLINK-2457:

Github user StephanEwen commented on a diff in the pull request: https://github.com/apache/flink/pull/983#discussion_r36408352

--- Diff: flink-java/src/main/java/org/apache/flink/api/java/tuple/builder/Tuple0Builder.java ---
@@ -0,0 +1,46 @@
+/* ... Apache License 2.0 header ... */
+
+// --
+// THIS IS A GENERATED SOURCE FILE. DO NOT EDIT!
+// GENERATED FROM org.apache.flink.api.java.tuple.TupleGenerator.
+// --
+
+package org.apache.flink.api.java.tuple.builder;
+
+import java.util.LinkedList;
+import java.util.List;
+
+import org.apache.flink.api.java.tuple.Tuple0;
+
+public class Tuple0Builder {
+
+ private List<Tuple0> tuples = new LinkedList<Tuple0>();
--- End diff --

ArrayLists are almost always way superior in performance to LinkedLists.
[GitHub] flink pull request: [FLINK-2457] Integrate Tuple0
Github user StephanEwen commented on a diff in the pull request: https://github.com/apache/flink/pull/983#discussion_r36408368

--- Diff: flink-java/src/main/java/org/apache/flink/api/java/tuple/builder/Tuple0Builder.java ---
+ private List<Tuple0> tuples = new LinkedList<Tuple0>();
--- End diff --

And memory efficiency...
[jira] [Commented] (FLINK-2457) Integrate Tuple0
[ https://issues.apache.org/jira/browse/FLINK-2457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14659936#comment-14659936 ] ASF GitHub Bot commented on FLINK-2457:

Github user StephanEwen commented on a diff in the pull request: https://github.com/apache/flink/pull/983#discussion_r36408368

--- Diff: flink-java/src/main/java/org/apache/flink/api/java/tuple/builder/Tuple0Builder.java ---
+ private List<Tuple0> tuples = new LinkedList<Tuple0>();
--- End diff --

And memory efficiency...
[GitHub] flink pull request: [FLINK-2457] Integrate Tuple0
Github user StephanEwen commented on a diff in the pull request: https://github.com/apache/flink/pull/983#discussion_r36408488

--- Diff: flink-java/src/main/java/org/apache/flink/api/java/typeutils/runtime/Tuple0Serializer.java ---
+ @Override
+ public Tuple0Serializer duplicate() {
+ return new Tuple0Serializer();
--- End diff --

No need to create a new instance, since this is stateless.
[GitHub] flink pull request: [FLINK-2457] Integrate Tuple0
Github user StephanEwen commented on a diff in the pull request: https://github.com/apache/flink/pull/983#discussion_r36411464

--- Diff: flink-staging/flink-language-binding/flink-language-binding-generic/src/main/java/org/apache/flink/languagebinding/api/java/common/streaming/Receiver.java ---
+ try {
+ return Tuple.getTupleClass(size).newInstance();
--- End diff --

Yep, you're right...
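Put together, the shape the review converges on looks roughly like this (a sketch, not necessarily the merged code): Tuple.getTupleClass(size) already validates the arity, so createTuple only has to wrap the reflection exceptions.

// Sketch of createTuple after the review feedback; getTupleClass(size)
// performs the arity check (now including 0), so no extra check is needed.
public static Tuple createTuple(int size) {
    try {
        return Tuple.getTupleClass(size).newInstance();
    } catch (InstantiationException e) {
        throw new RuntimeException("Could not instantiate tuple of size " + size, e);
    } catch (IllegalAccessException e) {
        throw new RuntimeException("Could not instantiate tuple of size " + size, e);
    }
}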
[jira] [Updated] (FLINK-2491) Operators are not participating in state checkpointing in some cases
[ https://issues.apache.org/jira/browse/FLINK-2491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Metzger updated FLINK-2491:

Description: While implementing a test case for the Kafka Consumer, I came across the following bug: Consider the following topology, with the operator parallelism in parentheses: Source (2) -- Sink (1). In this setup, the {{snapshotState()}} method is called on the source, but not on the sink. The sink receives the generated data. Only one of the two sources is generating data. I've implemented a test case for this, you can find it here: https://github.com/rmetzger/flink/blob/para_checkpoint_bug/flink-tests/src/test/java/org/apache/flink/test/checkpointing/ParallelismChangeCheckpoinedITCase.java
[jira] [Commented] (FLINK-2457) Integrate Tuple0
[ https://issues.apache.org/jira/browse/FLINK-2457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14659947#comment-14659947 ] ASF GitHub Bot commented on FLINK-2457:

Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/983#issuecomment-128350696

Some comments inline; other than that, two issues with this pull request:
- A lot of whitespace reformatting. We explicitly ask not to do this. Some IDEs do it automatically, but you can deactivate it. It makes diffs dangerously convoluted.
- Tuple0 can be treated as a singleton, since it has no state. Any reason not to do this?
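The singleton suggestion, sketched against the serializer from the diff discussed above (this illustrates the reviewer's idea, not necessarily the code that was merged): since neither Tuple0 nor its serializer carries state, one shared instance suffices and duplicate() can simply return this.

import org.apache.flink.api.common.typeutils.TypeSerializer;
import org.apache.flink.api.java.tuple.Tuple0;

// Sketch of the singleton variant suggested in the review; illustrative only.
public class Tuple0Serializer extends TupleSerializer<Tuple0> {
    private static final long serialVersionUID = 1L;

    // One shared, stateless instance is enough for all operators and threads.
    public static final Tuple0Serializer INSTANCE = new Tuple0Serializer();

    private Tuple0Serializer() {
        super(Tuple0.class, new TypeSerializer<?>[0]);
    }

    @Override
    public Tuple0Serializer duplicate() {
        return this; // stateless, so no defensive copy is needed
    }
}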
[GitHub] flink pull request: [FLINK-2307] Drop Java 6 support
GitHub user StephanEwen opened a pull request: https://github.com/apache/flink/pull/993

[FLINK-2307] Drop Java 6 support

This pull request updates Travis to use only openjdk7, oraclejdk7, and oraclejdk8 for tests. The root pom and the quickstart poms bump the Java version to 1.7.

You can merge this pull request into a Git repository by running:
$ git pull https://github.com/StephanEwen/incubator-flink java7
Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/993.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #993

commit 5a788ec23d50d36201ebb2fb0ad2b521272d034f
Author: Stephan Ewen se...@apache.org
Date: 2015-08-06T12:40:53Z
[FLINK-2453] [pom] Move Java source and target version to 1.7

commit 249fa2bcdfc72bd6ce134ccdcb3921547af02752
Author: Stephan Ewen se...@apache.org
Date: 2015-08-06T12:52:39Z
[FLINK-2454] [buikd] Update Travis to drop JDK6 for tests
[jira] [Commented] (FLINK-2307) Drop Java 6 support
[ https://issues.apache.org/jira/browse/FLINK-2307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14659963#comment-14659963 ] ASF GitHub Bot commented on FLINK-2307:

GitHub user StephanEwen opened a pull request: https://github.com/apache/flink/pull/993
[FLINK-2307] Drop Java 6 support: the PR updates Travis to use only openjdk7, oraclejdk7, and oraclejdk8 for tests; the root pom and the quickstart poms bump the Java version to 1.7.

Drop Java 6 support
Key: FLINK-2307
URL: https://issues.apache.org/jira/browse/FLINK-2307
Project: Flink
Issue Type: Improvement
Components: Build System
Affects Versions: 0.10
Reporter: Robert Metzger

As per http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Failing-Builds-on-Travis-td5360.html, we need to change the Java version in the poms and adapt the Travis build profiles.
[jira] [Reopened] (FLINK-2464) BufferSpillerTest sometimes fails
[ https://issues.apache.org/jira/browse/FLINK-2464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Márton Balassi reopened FLINK-2464: --- Thanks for the fix, [~StephanEwen]. Unfortunately I still see tests of the `flink-streaming-core` module fail due to a killed JVM post your commit. [1] I also suspect that this is coming from either the `BarrierBufferTest` or the `BufferSpillerTest`, because after disabling them `flink-streaming-core` did not once fail on Java 6 in 20 builds triggered on Travis. [2] (Builds that still failed were due to the failures of the `StreamCheckpointingITCase` or the `PartitionedStateCheckpointingITCase` in `flink-tests`.) [1] https://travis-ci.org/mbalassi/flink/jobs/74391653 [2] See build #1 through #10 here https://travis-ci.org/mbalassi-travis-0/flink/builds BufferSpillerTest sometimes fails - Key: FLINK-2464 URL: https://issues.apache.org/jira/browse/FLINK-2464 Project: Flink Issue Type: Bug Components: Streaming Reporter: Gyula Fora Assignee: Stephan Ewen Priority: Minor Fix For: 0.10 The BufferSpillerTest failed with the following error: org.apache.flink.streaming.runtime.io.BufferSpillerTest testSpillWhileReading(org.apache.flink.streaming.runtime.io.BufferSpillerTest) Time elapsed: 3.28 sec FAILURE! java.lang.AssertionError: wrong buffer contents expected:<0> but was:<58> at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:555) at org.apache.flink.streaming.runtime.io.BufferSpillerTest.validateBuffer(BufferSpillerTest.java:290) at org.apache.flink.streaming.runtime.io.BufferSpillerTest.access$200(BufferSpillerTest.java:42) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-2457] Integrate Tuple0
Github user StephanEwen commented on a diff in the pull request: https://github.com/apache/flink/pull/983#discussion_r36408352 --- Diff: flink-java/src/main/java/org/apache/flink/api/java/tuple/builder/Tuple0Builder.java --- @@ -0,0 +1,46 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + + +// -- +// THIS IS A GENERATED SOURCE FILE. DO NOT EDIT! +// GENERATED FROM org.apache.flink.api.java.tuple.TupleGenerator. +// -- + + +package org.apache.flink.api.java.tuple.builder; + +import java.util.LinkedList; +import java.util.List; + +import org.apache.flink.api.java.tuple.Tuple0; + +public class Tuple0Builder { + + private List<Tuple0> tuples = new LinkedList<Tuple0>(); --- End diff -- ArrayLists are almost always way superior in performance to LinkedLists. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
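A hedged sketch of the change the review asks for — same generated-builder shape, ArrayList backing; the add()/build() pair is assumed from the other generated TupleXBuilder classes:

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.flink.api.java.tuple.Tuple0;

public class Tuple0Builder {

    // ArrayList: contiguous storage, O(1) amortized append, cheap toArray(),
    // and far better cache behavior than a LinkedList for this append-then-drain use
    private List<Tuple0> tuples = new ArrayList<Tuple0>();

    public Tuple0Builder add() {
        tuples.add(new Tuple0());
        return this;
    }

    public Tuple0[] build() {
        return tuples.toArray(new Tuple0[tuples.size()]);
    }
}
```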
[jira] [Commented] (FLINK-2457) Integrate Tuple0
[ https://issues.apache.org/jira/browse/FLINK-2457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14659942#comment-14659942 ] ASF GitHub Bot commented on FLINK-2457: --- Github user StephanEwen commented on a diff in the pull request: https://github.com/apache/flink/pull/983#discussion_r36408610 --- Diff: flink-staging/flink-language-binding/flink-language-binding-generic/src/main/java/org/apache/flink/languagebinding/api/java/common/streaming/Receiver.java --- @@ -346,61 +349,12 @@ public Tuple deserialize() { } public static Tuple createTuple(int size) { - switch (size) { - case 0: - return new Tuple0(); - case 1: - return new Tuple1(); - case 2: - return new Tuple2(); - case 3: - return new Tuple3(); - case 4: - return new Tuple4(); - case 5: - return new Tuple5(); - case 6: - return new Tuple6(); - case 7: - return new Tuple7(); - case 8: - return new Tuple8(); - case 9: - return new Tuple9(); - case 10: - return new Tuple10(); - case 11: - return new Tuple11(); - case 12: - return new Tuple12(); - case 13: - return new Tuple13(); - case 14: - return new Tuple14(); - case 15: - return new Tuple15(); - case 16: - return new Tuple16(); - case 17: - return new Tuple17(); - case 18: - return new Tuple18(); - case 19: - return new Tuple19(); - case 20: - return new Tuple20(); - case 21: - return new Tuple21(); - case 22: - return new Tuple22(); - case 23: - return new Tuple23(); - case 24: - return new Tuple24(); - case 25: - return new Tuple25(); - default: - throw new IllegalArgumentException("Tuple size not supported: " + size); + try { + return Tuple.getTupleClass(size).newInstance(); --- End diff -- This should have a size check and give a proper error message. Integrate Tuple0 Key: FLINK-2457 URL: https://issues.apache.org/jira/browse/FLINK-2457 Project: Flink Issue Type: Improvement Reporter: Matthias J. Sax Assignee: Matthias J. Sax Priority: Minor Tuple0 is not cleanly integrated: - missing serialization/deserialization support in the runtime - Tuple.getTupleClass(int arity) cannot handle arity zero, i.e., it cannot create an instance of Tuple0 Tuple0 is currently only used in the Python API, but will be integrated into the Storm compatibility layer, too. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
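The size check Stephan asks for could look like this — a sketch assuming the `Tuple.MAX_ARITY` constant (25 at the time) and plain reflective instantiation:

```java
import org.apache.flink.api.java.tuple.Tuple;

public class TupleFactorySketch {

    public static Tuple createTuple(int size) {
        // validate up front so callers get a proper message instead of an
        // opaque reflection error out of getTupleClass/newInstance
        if (size < 0 || size > Tuple.MAX_ARITY) {
            throw new IllegalArgumentException("Tuple size not supported: " + size
                    + " (must be between 0 and " + Tuple.MAX_ARITY + ")");
        }
        try {
            return Tuple.getTupleClass(size).newInstance();
        } catch (InstantiationException | IllegalAccessException e) {
            throw new RuntimeException("Could not instantiate tuple of arity " + size, e);
        }
    }
}
```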
[GitHub] flink pull request: [FLINK-2457] Integrate Tuple0
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/983#issuecomment-128350696 Some comments inline, other than that, two issues with this pull request: - A lot of whitespace reformatting. We explicitly ask not to do this. Some IDEs do it automatically, but you can deactivate it. It makes diffs dangerously convoluted. - Tuple0 can be treated as a singleton, since it has no state. Any reason not to do this? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-2307) Drop Java 6 support
[ https://issues.apache.org/jira/browse/FLINK-2307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14659966#comment-14659966 ] ASF GitHub Bot commented on FLINK-2307: --- Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/993#issuecomment-128354239 +1 (given the tests pass) Drop Java 6 support --- Key: FLINK-2307 URL: https://issues.apache.org/jira/browse/FLINK-2307 Project: Flink Issue Type: Improvement Components: Build System Affects Versions: 0.10 Reporter: Robert Metzger As per: http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Failing-Builds-on-Travis-td5360.html We need to change the Java version in the poms and adapt the Travis build profiles. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2457) Integrate Tuple0
[ https://issues.apache.org/jira/browse/FLINK-2457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14659977#comment-14659977 ] ASF GitHub Bot commented on FLINK-2457: --- Github user StephanEwen commented on a diff in the pull request: https://github.com/apache/flink/pull/983#discussion_r36411464 --- Diff: flink-staging/flink-language-binding/flink-language-binding-generic/src/main/java/org/apache/flink/languagebinding/api/java/common/streaming/Receiver.java --- @@ -346,61 +349,12 @@ public Tuple deserialize() { } public static Tuple createTuple(int size) { - switch (size) { - case 0: - return new Tuple0(); - case 1: - return new Tuple1(); - case 2: - return new Tuple2(); - case 3: - return new Tuple3(); - case 4: - return new Tuple4(); - case 5: - return new Tuple5(); - case 6: - return new Tuple6(); - case 7: - return new Tuple7(); - case 8: - return new Tuple8(); - case 9: - return new Tuple9(); - case 10: - return new Tuple10(); - case 11: - return new Tuple11(); - case 12: - return new Tuple12(); - case 13: - return new Tuple13(); - case 14: - return new Tuple14(); - case 15: - return new Tuple15(); - case 16: - return new Tuple16(); - case 17: - return new Tuple17(); - case 18: - return new Tuple18(); - case 19: - return new Tuple19(); - case 20: - return new Tuple20(); - case 21: - return new Tuple21(); - case 22: - return new Tuple22(); - case 23: - return new Tuple23(); - case 24: - return new Tuple24(); - case 25: - return new Tuple25(); - default: - throw new IllegalArgumentException("Tuple size not supported: " + size); + try { + return Tuple.getTupleClass(size).newInstance(); --- End diff -- Yep, you're right... Integrate Tuple0 Key: FLINK-2457 URL: https://issues.apache.org/jira/browse/FLINK-2457 Project: Flink Issue Type: Improvement Reporter: Matthias J. Sax Assignee: Matthias J. Sax Priority: Minor Tuple0 is not cleanly integrated: - missing serialization/deserialization support in the runtime - Tuple.getTupleClass(int arity) cannot handle arity zero, i.e., it cannot create an instance of Tuple0 Tuple0 is currently only used in the Python API, but will be integrated into the Storm compatibility layer, too. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-2240] Use BloomFilter to filter probe r...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/888#issuecomment-128441993 This was a super cool contribution. A pretty sophisticated addition, super testing, high code quality. I am very impressed! I hope you will contribute more to Flink. Already saw that you opened another pull request, for a sampling operator. Happy that this is happening :-) In the future, I can hopefully review and merge the pull requests faster. The past weeks, I did not get to code work as much as I wanted, and the list of critical issues was long, so this pull request got delayed a bit. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-2240) Use BloomFilter to minimize probe side records which are spilled to disk in Hybrid-Hash-Join
[ https://issues.apache.org/jira/browse/FLINK-2240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14660355#comment-14660355 ] ASF GitHub Bot commented on FLINK-2240: --- Github user vasia commented on the pull request: https://github.com/apache/flink/pull/888#issuecomment-128445757 Hi, this looks great indeed! Just out of curiosity, why did you write your own bloom filter implementation and not use a ready one, e.g. from guava? I'm wondering because in #923 we also want to use a bloom filter for an approximate algorithm implementation. Thanks! Use BloomFilter to minimize probe side records which are spilled to disk in Hybrid-Hash-Join Key: FLINK-2240 URL: https://issues.apache.org/jira/browse/FLINK-2240 Project: Flink Issue Type: Improvement Components: Core Reporter: Chengxiang Li Assignee: Chengxiang Li Priority: Minor Fix For: 0.10 In Hybrid-Hash-Join, when the small table does not fit into memory, part of the small table data is spilled to disk, and the counterpart partition of the big table data is spilled to disk in the probe phase as well. If we build a BloomFilter while spilling the small table to disk during the build phase, and use it to filter the big table records that tend to be spilled to disk, this may greatly reduce the spilled big table file size and save the disk IO cost for writing and further reading. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
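For reference, the ready-made alternative [~vasia] mentions — Guava's BloomFilter — looks roughly like this in use (expected size and false-positive rate are illustrative):

```java
import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;

public class GuavaBloomFilterDemo {

    public static void main(String[] args) {
        // expect ~1M integer keys, accept a 3% false-positive rate
        BloomFilter<Integer> filter =
                BloomFilter.create(Funnels.integerFunnel(), 1_000_000, 0.03);

        filter.put(42);

        System.out.println(filter.mightContain(42)); // true
        System.out.println(filter.mightContain(7));  // false, with high probability
    }
}
```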
[jira] [Commented] (FLINK-2444) Add tests for HadoopInputFormats
[ https://issues.apache.org/jira/browse/FLINK-2444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14660371#comment-14660371 ] James Cao commented on FLINK-2444: -- Hi, for a sufficient test, what's the expected strategy? The Hive and Hadoop communities use the Hadoop minicluster to do end-to-end unit tests. I tried running a Flink word count task against the minicluster inside the IDE; it takes about 5s (including provisioning the mini cluster and tearing it down afterwards). Is this an acceptable running time? I guess if we use the minicluster, we can get reasonably sufficient tests for the HadoopInputFormats' wrapped formats for both the mapred and mapreduce style APIs, and it's probably not very easy to set up a mock test that simulates the Hadoop fs environment. The problem with the minicluster is that it's only available in hadoop2, so it's not available in the hadoop1 profile. I think the issue I am working on, [FLINK-1919] HCatOutputFormat, also has a similar problem. Do we want to run the test against a mini Hive server in that case? Add tests for HadoopInputFormats Key: FLINK-2444 URL: https://issues.apache.org/jira/browse/FLINK-2444 Project: Flink Issue Type: Test Components: Hadoop Compatibility, Tests Affects Versions: 0.10, 0.9.0 Reporter: Fabian Hueske Labels: starter The HadoopInputFormats and HadoopInputFormatBase classes are not sufficiently covered by unit tests. We need tests that ensure that the methods of the wrapped Hadoop InputFormats are correctly called. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-981) Support for generated Cloudera Hadoop configuration
[ https://issues.apache.org/jira/browse/FLINK-981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14660492#comment-14660492 ] Robert Metzger commented on FLINK-981: -- I will close this issue for now. No user ever complained about it, and I've used Flink on a Cloudera system 2-3 months ago. Support for generated Cloudera Hadoop configuration Key: FLINK-981 URL: https://issues.apache.org/jira/browse/FLINK-981 Project: Flink Issue Type: Bug Components: Local Runtime, YARN Client Reporter: Robert Metzger Cloudera Hadoop generates configuration files that differ from the vanilla upstream Hadoop configuration files. The HDFS and the YARN component both access configuration values from Hadoop. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-2493) Simplify names of example program JARs
Stephan Ewen created FLINK-2493: --- Summary: Simplify names of example program JARs Key: FLINK-2493 URL: https://issues.apache.org/jira/browse/FLINK-2493 Project: Flink Issue Type: Improvement Components: Examples Affects Versions: 0.10 Reporter: Stephan Ewen Priority: Minor I find the names of the example JARs a bit annoying. Why not name the file {{examples/ConnectedComponents.jar}} rather than {{examples/flink-java-examples-0.10-SNAPSHOT-ConnectedComponents.jar}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-2240] Use BloomFilter to filter probe r...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/888#issuecomment-128440090 Manually merged in 61dcae391cb3b45ba3aff47d4d9163889d2958a4 I added a commit on top to pass the flag to enable/disable bloom filters through the runtime configuration. That is the basis for later allowing it to be enabled on a per-job basis. Also, we want to get rid of the `GlobalConfiguration` singleton pattern. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-2240) Use BloomFilter to minimize probe side records which are spilled to disk in Hybrid-Hash-Join
[ https://issues.apache.org/jira/browse/FLINK-2240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14660322#comment-14660322 ] ASF GitHub Bot commented on FLINK-2240: --- Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/888#issuecomment-128440090 Manually merged in 61dcae391cb3b45ba3aff47d4d9163889d2958a4 I added a commit on top to pass the flag to enable/disable bloom filters through the runtime configuration. That is the basis for later allowing it to be enabled on a per-job basis. Also, we want to get rid of the `GlobalConfiguration` singleton pattern. Use BloomFilter to minimize probe side records which are spilled to disk in Hybrid-Hash-Join Key: FLINK-2240 URL: https://issues.apache.org/jira/browse/FLINK-2240 Project: Flink Issue Type: Improvement Components: Core Reporter: Chengxiang Li Assignee: Chengxiang Li Priority: Minor In Hybrid-Hash-Join, when the small table does not fit into memory, part of the small table data is spilled to disk, and the counterpart partition of the big table data is spilled to disk in the probe phase as well. If we build a BloomFilter while spilling the small table to disk during the build phase, and use it to filter the big table records that tend to be spilled to disk, this may greatly reduce the spilled big table file size and save the disk IO cost for writing and further reading. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
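A sketch of what reading such a flag from the runtime configuration could look like; the key name and default here are assumptions for illustration, not necessarily what the merged commit uses:

```java
import org.apache.flink.configuration.Configuration;

public class HashJoinConfigSketch {

    // hypothetical config key; the real commit may name it differently
    public static final String HASH_JOIN_BLOOM_FILTERS_KEY =
            "taskmanager.runtime.hashjoin-bloom-filters";

    public static boolean bloomFiltersEnabled(Configuration config) {
        // disabled by default; a later change could let jobs override this per-job
        return config.getBoolean(HASH_JOIN_BLOOM_FILTERS_KEY, false);
    }
}
```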
[jira] [Closed] (FLINK-100) Pact API Proposal: Add keyless CoGroup (send all to a single group)
[ https://issues.apache.org/jira/browse/FLINK-100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen closed FLINK-100. -- Pact API Proposal: Add keyless CoGroup (send all to a single group) --- Key: FLINK-100 URL: https://issues.apache.org/jira/browse/FLINK-100 Project: Flink Issue Type: Improvement Reporter: GitHub Import Priority: Minor Labels: github-import I propose to add a keyless version of CoGroup that groups both inputs in a single group, analogous to the keyless Reducer version that was added in https://github.com/dimalabs/ozone/pull/61 ``` CoGroupContract myCoGroup = CoGroupContract.builder(MyUdf.class) .input1(contractA) .input2(contractB) .build(); ``` I have a use case where I need to process the output of two contracts in a single udf and I currently have to use the workaround to add a constant field and use it as the grouping key. Adding a keyless version would reduce the overhead (network traffic, serialization and code-writing) and give the compiler additional knowledge (The compiler knows that there will be only a single group and a single udf call. If setAvgRecordsEmittedPerStubCall is set, it could infer the output cardinality) Furthermore I think this would be consistent, because CoGroup is like Reduce for multiple inputs. Imported from GitHub Url: https://github.com/stratosphere/stratosphere/issues/100 Created by: [andrehacker|https://github.com/andrehacker] Labels: enhancement, Created at: Sat Sep 14 23:15:59 CEST 2013 State: open -- This message was sent by Atlassian JIRA (v6.3.4#6332)
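The constant-key workaround the issue describes can be sketched in the Java DataSet API (rather than the old Pact contract API quoted above); everything here is illustrative:

```java
import org.apache.flink.api.common.functions.CoGroupFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.util.Collector;

public class KeylessCoGroupSketch {

    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        DataSet<String> a = env.fromElements("a1", "a2");
        DataSet<Long> b = env.fromElements(1L, 2L, 3L);

        // constant key on both sides: every element maps to the same key,
        // so both inputs land in a single group and a single UDF call
        a.coGroup(b)
                .where(new KeySelector<String, Integer>() {
                    public Integer getKey(String value) { return 0; }
                })
                .equalTo(new KeySelector<Long, Integer>() {
                    public Integer getKey(Long value) { return 0; }
                })
                .with(new CoGroupFunction<String, Long, String>() {
                    public void coGroup(Iterable<String> first, Iterable<Long> second,
                            Collector<String> out) {
                        out.collect("processed both complete inputs in one group");
                    }
                })
                .print();
    }
}
```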
[jira] [Resolved] (FLINK-202) Workset Iterations: No Match Found Behaviour of Solution Set Join
[ https://issues.apache.org/jira/browse/FLINK-202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen resolved FLINK-202. Resolution: Fixed Fix Version/s: (was: pre-apache) 0.8.0 Solution set returns null on lookups of missing entries. Workset Iterations: No Match Found Behaviour of Solution Set Join --- Key: FLINK-202 URL: https://issues.apache.org/jira/browse/FLINK-202 Project: Flink Issue Type: Improvement Reporter: GitHub Import Labels: github-import Fix For: 0.8.0 If I do a solution set match and there is no corresponding entry in the solution set index, a RuntimeException is thrown and the job fails. Therefore the initial solution set must already contain every element which will be in the final solution set. I'm not sure if this is a real limitation, but I find it inconvenient. When I was playing around with the workset connected components I couldn't just use parts of my test data, because it resulted in a solution set join where records couldn't be matched. I just wanted to think out loud here. Did anybody else find this inconvenient? What alternative should we provide? Imported from GitHub Url: https://github.com/stratosphere/stratosphere/issues/202 Created by: [uce|https://github.com/uce] Labels: core, enhancement, Created at: Wed Oct 23 23:24:29 CEST 2013 State: open -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-655) Add support for both single and set of broadcast values
[ https://issues.apache.org/jira/browse/FLINK-655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14660395#comment-14660395 ] Stephan Ewen commented on FLINK-655: Will this still be done, or should we close it as won't fix? Add support for both single and set of broadcast values --- Key: FLINK-655 URL: https://issues.apache.org/jira/browse/FLINK-655 Project: Flink Issue Type: Improvement Components: Java API Reporter: Ufuk Celebi Assignee: Henry Saputra Labels: breaking-api, github-import, starter Fix For: pre-apache To broadcast a data set you have to do the following: ```java lorem.map(new MyMapper()).withBroadcastSet(toBroadcast, "toBroadcastName") ``` In the operator you call: ```java getRuntimeContext().getBroadcastVariable("toBroadcastName") ``` I propose to have both method names consistent, e.g. - `withBroadcastVariable(DataSet, String)`, or - `getBroadcastSet(String)`. Imported from GitHub Url: https://github.com/stratosphere/stratosphere/issues/655 Created by: [uce|https://github.com/uce] Labels: enhancement, java api, user satisfaction, Created at: Wed Apr 02 16:29:08 CEST 2014 State: open -- This message was sent by Atlassian JIRA (v6.3.4#6332)
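Whichever pair of names is chosen, the broadcast name string must match on both ends; a small sketch (names illustrative) that avoids drift by sharing one constant:

```java
import java.util.List;

import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.api.java.DataSet;

public class BroadcastNamingSketch {

    // one constant used by both the producing and the consuming side
    public static final String BC_NAME = "toBroadcastName";

    public static class MyMapper extends RichMapFunction<String, String> {
        @Override
        public String map(String value) {
            List<Integer> bc = getRuntimeContext().getBroadcastVariable(BC_NAME);
            return value + "/" + bc.size();
        }
    }

    public static DataSet<String> apply(DataSet<String> lorem, DataSet<Integer> toBroadcast) {
        return lorem.map(new MyMapper()).withBroadcastSet(toBroadcast, BC_NAME);
    }
}
```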
[jira] [Resolved] (FLINK-915) Introduce two in one progress bars
[ https://issues.apache.org/jira/browse/FLINK-915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen resolved FLINK-915. Resolution: Invalid Fix Version/s: (was: pre-apache) Outdated and subsumed by new Task State Model and new web frontend. Introduce two in one progress bars -- Key: FLINK-915 URL: https://issues.apache.org/jira/browse/FLINK-915 Project: Flink Issue Type: Improvement Reporter: GitHub Import Priority: Trivial Labels: github-import Attachments: pull-request-915-1281267458081589740.patch The two in one progress bars are approximations which are calculated from the job event information. Additionally: FINISHING tasks are still shown as running tasks. Imported from GitHub Url: https://github.com/stratosphere/stratosphere/pull/915 Created by: [tobwiens|https://github.com/tobwiens] Labels: enhancement, Created at: Fri Jun 06 17:02:53 CEST 2014 State: open -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (FLINK-915) Introduce two in one progress bars
[ https://issues.apache.org/jira/browse/FLINK-915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen closed FLINK-915. -- Introduce two in one progress bars -- Key: FLINK-915 URL: https://issues.apache.org/jira/browse/FLINK-915 Project: Flink Issue Type: Improvement Reporter: GitHub Import Priority: Trivial Labels: github-import Attachments: pull-request-915-1281267458081589740.patch The two in one progress bars are approximations which are calculated from the job event information. Additionally: FINISHING tasks are still shown as running tasks. Imported from GitHub Url: https://github.com/stratosphere/stratosphere/pull/915 Created by: [tobwiens|https://github.com/tobwiens] Labels: enhancement, Created at: Fri Jun 06 17:02:53 CEST 2014 State: open -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (FLINK-1082) WebClient improvements
[ https://issues.apache.org/jira/browse/FLINK-1082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen resolved FLINK-1082. - Resolution: Done Remaining issues will not be fixed. Remainder is subsumed by new web frontend [FLINK-2357] WebClient improvements -- Key: FLINK-1082 URL: https://issues.apache.org/jira/browse/FLINK-1082 Project: Flink Issue Type: Improvement Reporter: Jonathan Hasenburg New Issue to summarize all things that should be done regarding the webclient. * DONE: Setting the ship strategy of a broadcast variable from broadcast to broadcast variable * DONE: Reduce size of nodes by removing some information * DONE: Some nodes can be at the same time next partial solution and termination criterion, or next workset and solution set delta. It currently seems to highlight only one of them. * DONE: Recreate the generation of the JSON strings which are used to create the graph. Change to an object based layout to get away from the string based layout. * NOT POSSIBLE, because this creates a circle and it can happen that the first node is at the right side of the graph - confusing: Feedback arrow from the next partial solution to the partial solution * show full graph at very beginning * fix bug in Chrome where the node boxes are too small * DONE: graph should be redrawn when the window gets resized -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (FLINK-1083) WebInterface improvements
[ https://issues.apache.org/jira/browse/FLINK-1083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen resolved FLINK-1083. - Resolution: Done Remainder is subsumed by new web frontend [FLINK-2357] WebInterface improvements - Key: FLINK-1083 URL: https://issues.apache.org/jira/browse/FLINK-1083 Project: Flink Issue Type: Improvement Components: JobManager Reporter: Jonathan Hasenburg New Issue to summarize all things that should be done regarding the webinterface. * rework dashboard in a way that more than one job can be shown ... . If a job is clicked you get to the details. * DONE: add history to dashboard * Running jobs should get a view like jobs in the history. * DONE: rework the menu and try to add some links to the dashboard (like the taskmanager section) * improve the way the JSONs are sent to the webinterface -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (FLINK-1082) WebClient improvements
[ https://issues.apache.org/jira/browse/FLINK-1082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen closed FLINK-1082. --- WebClient improvements -- Key: FLINK-1082 URL: https://issues.apache.org/jira/browse/FLINK-1082 Project: Flink Issue Type: Improvement Reporter: Jonathan Hasenburg New Issue to summarize all things that should be done regarding the webclient. * DONE: Setting the ship strategy of a broadcast variable from broadcast to broadcast variable * DONE: Reduce size of nodes by removing some information * DONE: Some nodes can be at the same time next partial solution and termination criterion, or next workset and solution set delta. It currently seems to highlight only one of them. * DONE: Recreate the generation of the JSON strings which are used to create the graph. Change to an object based layout to get away from the string based layout. * NOT POSSIBLE, because this creates a circle and it can happen that the first node is at the right side of the graph - confusing: Feedback arrow from the next partial solution to the partial solution * show full graph at very beginning * fix bug in Chrome where the node boxes are too small * DONE: graph should be redrawn when the window gets resized -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-2240] Use BloomFilter to filter probe r...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/888#issuecomment-128442757 Oh, I forgot to add the closing message to the commit, so the ASF bot did not close the pull request. Can you close the pull request manually? (Only you as the owner can do that.) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Updated] (FLINK-7) [GitHub] Enable Range Partitioner
[ https://issues.apache.org/jira/browse/FLINK-7?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen updated FLINK-7: - Issue Type: Sub-task (was: New Feature) Parent: FLINK-598 [GitHub] Enable Range Partitioner - Key: FLINK-7 URL: https://issues.apache.org/jira/browse/FLINK-7 Project: Flink Issue Type: Sub-task Reporter: GitHub Import Labels: github-import Fix For: pre-apache The range partitioner is currently disabled. We need to implement the following aspects: 1) Distribution information, if available, must be propagated back together with the ordering property. 2) A generic bucket lookup structure (currently specific to PactRecord). Tests to re-enable after fixing this issue: - TeraSortITCase - GlobalSortingITCase - GlobalSortingMixedOrderITCase Imported from GitHub Url: https://github.com/stratosphere/stratosphere/issues/7 Created by: [StephanEwen|https://github.com/StephanEwen] Labels: core, enhancement, optimizer, Milestone: Release 0.4 Assignee: [fhueske|https://github.com/fhueske] Created at: Fri Apr 26 13:48:24 CEST 2013 State: open -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-2240] Use BloomFilter to filter probe r...
Github user vasia commented on the pull request: https://github.com/apache/flink/pull/888#issuecomment-128445757 Hi, this looks great indeed! Just out of curiosity, why did you write your own bloom filter implementation and not use a ready one, e.g. from guava? I'm wondering because in #923 we also want to use a bloom filter for an approximate algorithm implementation. Thanks! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Closed] (FLINK-938) Change start-cluster.sh script so that users don't have to configure the JobManager address
[ https://issues.apache.org/jira/browse/FLINK-938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen closed FLINK-938. -- Change start-cluster.sh script so that users don't have to configure the JobManager address --- Key: FLINK-938 URL: https://issues.apache.org/jira/browse/FLINK-938 Project: Flink Issue Type: Improvement Components: Build System Reporter: Robert Metzger Assignee: Mingliang Qi Priority: Minor Fix For: 0.9 To improve the user experience, Flink should not require users to configure the JobManager's address on a cluster. In combination with FLINK-934, this would allow running Flink with decent performance on a cluster without setting a single configuration value. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2492) Rename remaining runtime classes from match to join
[ https://issues.apache.org/jira/browse/FLINK-2492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14660315#comment-14660315 ] ASF GitHub Bot commented on FLINK-2492: --- Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/995#issuecomment-128439329 Manually merged in 685086a3dd9afcec2eec76485298bc7b3f031a3c Rename remaining runtime classes from match to join --- Key: FLINK-2492 URL: https://issues.apache.org/jira/browse/FLINK-2492 Project: Flink Issue Type: Improvement Components: Local Runtime Affects Versions: 0.10 Reporter: Stephan Ewen Assignee: Stephan Ewen Priority: Minor Fix For: 0.10 While working with the runtime join classes, I saw that many of them still refer to the join as match. Since all other parts now consistently refer to join, we should adjust the runtime classes as well. Makes it easier for new contributors. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-2492] [runtime] Rename former 'match' c...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/995#issuecomment-128439329 Manually merged in 685086a3dd9afcec2eec76485298bc7b3f031a3c --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-2240] Use BloomFilter to filter probe r...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/888#issuecomment-128446267 The bloom filters are stored in subregions of Flink's memory segments, not in any additional memory. That is very nice (occupies no extra memory), but requires going against Flink's memory segments, rather than longs or byte arrays. Hence, the custom implementation. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
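The general shape of such a filter — k bit positions derived from one base hash, set and tested inside a pre-allocated region — in a sketch where a plain byte[] stands in for a slice of a Flink MemorySegment (hashing deliberately simplistic):

```java
public class RegionBloomFilterSketch {

    private final byte[] region;   // stands in for a MemorySegment slice
    private final int numBits;
    private final int numHashes;

    public RegionBloomFilterSketch(byte[] region, int numHashes) {
        this.region = region;
        this.numBits = region.length * 8;
        this.numHashes = numHashes;
    }

    public void add(int hash) {
        for (int i = 0; i < numHashes; i++) {
            int bit = bitIndex(hash, i);
            region[bit >>> 3] |= (1 << (bit & 7));
        }
    }

    public boolean mightContain(int hash) {
        for (int i = 0; i < numHashes; i++) {
            int bit = bitIndex(hash, i);
            if ((region[bit >>> 3] & (1 << (bit & 7))) == 0) {
                return false; // definitely not present
            }
        }
        return true; // present, up to the false-positive rate
    }

    // simple double hashing; the mask keeps the index non-negative
    private int bitIndex(int hash, int i) {
        return ((hash + i * 0x9E3779B9) & 0x7FFFFFFF) % numBits;
    }
}
```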
[jira] [Resolved] (FLINK-266) Warn user if cluster did not come up as expected
[ https://issues.apache.org/jira/browse/FLINK-266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen resolved FLINK-266. Resolution: Won't Fix Fix Version/s: (was: pre-apache) The deployment model is different now; the pre-flight phase and optimizer do not connect to the cluster any more to gather the availability of resources. Therefore, this issue is not really an issue any more ;-) Warn user if cluster did not come up as expected Key: FLINK-266 URL: https://issues.apache.org/jira/browse/FLINK-266 Project: Flink Issue Type: Improvement Reporter: GitHub Import Labels: github-import While I did some work on a cluster, I was wondering why my job did not utilize all TaskManagers. It seems that I started my job too early (before all TaskManagers registered with the JobManager) and therefore, the compiler did not consider them. We should either make the `start-cluster.sh` script blocking (with a timeout), or the pact-client.sh should report a warning if fewer TaskManagers than expected (the number in the `slaves` file) are up. Imported from GitHub Url: https://github.com/stratosphere/stratosphere/issues/266 Created by: [rmetzger|https://github.com/rmetzger] Labels: enhancement, runtime, simple-issue, Created at: Mon Nov 11 15:56:56 CET 2013 State: open -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (FLINK-202) Workset Iterations: No Match Found Behaviour of Solution Set Join
[ https://issues.apache.org/jira/browse/FLINK-202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen closed FLINK-202. -- Workset Iterations: No Match Found Behaviour of Solution Set Join --- Key: FLINK-202 URL: https://issues.apache.org/jira/browse/FLINK-202 Project: Flink Issue Type: Improvement Reporter: GitHub Import Labels: github-import Fix For: 0.8.0 If I do a solution set match and there is no corresponding entry in the solution set index, a RuntimeException is thrown and the job fails. Therefore the initial solution set must already contain every element which will be in the final solution set. I'm not sure if this is a real limitation, but I find it inconvenient. When I was playing around with the workset connected components I couldn't just use parts of my test data, because it resulted in a solution set join where records couldn't be matched. I just wanted to think out loud here. Did anybody else find this inconvenient? What alternative should we provide? Imported from GitHub Url: https://github.com/stratosphere/stratosphere/issues/202 Created by: [uce|https://github.com/uce] Labels: core, enhancement, Created at: Wed Oct 23 23:24:29 CEST 2013 State: open -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (FLINK-266) Warn user if cluster did not come up as expected
[ https://issues.apache.org/jira/browse/FLINK-266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen closed FLINK-266. -- Warn user if cluster did not come up as expected Key: FLINK-266 URL: https://issues.apache.org/jira/browse/FLINK-266 Project: Flink Issue Type: Improvement Reporter: GitHub Import Labels: github-import While I did some work on a cluster, I was wondering why my job did not utilize all TaskManagers. It seems that I started my job too early (before all TaskManagers registered with the JobManager) and therefore, the compiler did not consider them. We should either make the `start-cluster.sh` script blocking (with a timeout), or the pact-client.sh should report a warning if fewer TaskManagers than expected (the number in the `slaves` file) are up. Imported from GitHub Url: https://github.com/stratosphere/stratosphere/issues/266 Created by: [rmetzger|https://github.com/rmetzger] Labels: enhancement, runtime, simple-issue, Created at: Mon Nov 11 15:56:56 CET 2013 State: open -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2240) Use BloomFilter to minimize probe side records which are spilled to disk in Hybrid-Hash-Join
[ https://issues.apache.org/jira/browse/FLINK-2240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14660334#comment-14660334 ] ASF GitHub Bot commented on FLINK-2240: --- Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/888#issuecomment-128442757 Oh, I forgot to add the closing message to the commit, so the ASF bot did not close the pull request. Can you close the pull request manually? (Only you as the owner can do that.) Use BloomFilter to minimize probe side records which are spilled to disk in Hybrid-Hash-Join Key: FLINK-2240 URL: https://issues.apache.org/jira/browse/FLINK-2240 Project: Flink Issue Type: Improvement Components: Core Reporter: Chengxiang Li Assignee: Chengxiang Li Priority: Minor In Hybrid-Hash-Join, when the small table does not fit into memory, part of the small table data is spilled to disk, and the counterpart partition of the big table data is spilled to disk in the probe phase as well. If we build a BloomFilter while spilling the small table to disk during the build phase, and use it to filter the big table records that tend to be spilled to disk, this may greatly reduce the spilled big table file size and save the disk IO cost for writing and further reading. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (FLINK-2492) Rename remaining runtime classes from match to join
[ https://issues.apache.org/jira/browse/FLINK-2492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen resolved FLINK-2492. - Resolution: Fixed Fixed via 685086a3dd9afcec2eec76485298bc7b3f031a3c Rename remaining runtime classes from match to join --- Key: FLINK-2492 URL: https://issues.apache.org/jira/browse/FLINK-2492 Project: Flink Issue Type: Improvement Components: Local Runtime Affects Versions: 0.10 Reporter: Stephan Ewen Assignee: Stephan Ewen Priority: Minor Fix For: 0.10 While working with the runtime join classes, I saw that many of them still refer to the join as match. Since all other parts now consistently refer to join, we should adjust the runtime classes as well. Makes it easier for new contributors. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (FLINK-2492) Rename remaining runtime classes from match to join
[ https://issues.apache.org/jira/browse/FLINK-2492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen closed FLINK-2492. --- Rename remaining runtime classes from match to join --- Key: FLINK-2492 URL: https://issues.apache.org/jira/browse/FLINK-2492 Project: Flink Issue Type: Improvement Components: Local Runtime Affects Versions: 0.10 Reporter: Stephan Ewen Assignee: Stephan Ewen Priority: Minor Fix For: 0.10 While working with the runtime join classes, I saw that many of them still refer to the join as match. Since all other parts now consistently refer to join, we should adjust the runtime classes as well. Makes it easier for new contributors. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2240) Use BloomFilter to minimize probe side records which are spilled to disk in Hybrid-Hash-Join
[ https://issues.apache.org/jira/browse/FLINK-2240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14660359#comment-14660359 ] ASF GitHub Bot commented on FLINK-2240: --- Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/888#issuecomment-128446267 The bloom filters are stored in subregions of Flink's memory segments, not in any additional memory. That is very nice (occupies no extra memory), but requires going against Flink's memory segments, rather than longs or byte arrays. Hence, the custom implementation. Use BloomFilter to minimize probe side records which are spilled to disk in Hybrid-Hash-Join Key: FLINK-2240 URL: https://issues.apache.org/jira/browse/FLINK-2240 Project: Flink Issue Type: Improvement Components: Core Reporter: Chengxiang Li Assignee: Chengxiang Li Priority: Minor Fix For: 0.10 In Hybrid-Hash-Join, when the small table does not fit into memory, part of the small table data is spilled to disk, and the counterpart partition of the big table data is spilled to disk in the probe phase as well. If we build a BloomFilter while spilling the small table to disk during the build phase, and use it to filter the big table records that tend to be spilled to disk, this may greatly reduce the spilled big table file size and save the disk IO cost for writing and further reading. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [WIP][FLINK-2386] Add new Kafka Consumers
GitHub user rmetzger opened a pull request: https://github.com/apache/flink/pull/996 [WIP][FLINK-2386] Add new Kafka Consumers I'm opening a WIP pull request (against our rules) to get some feedback on my ongoing work. Please note that I'm on vacation next week (until August 17) **Why this rework?** The current `PersistentKafkaSource` does not always provide exactly-once processing guarantees because we are using the high level Consumer API of Kafka. We've chosen to use that API because it is handling all the corner cases such as leader election, leader failover and other low level stuff. The problem is that the API does not allow us to - commit offsets manually - consistently (across restarts) assign partitions to Flink instances The Kafka community is aware of these issues and actively working on a new Consumer API. See https://cwiki.apache.org/confluence/display/KAFKA/Kafka+0.9+Consumer+Rewrite+Design and https://issues.apache.org/jira/browse/KAFKA-1326 The release of Kafka 0.8.3 is scheduled for October 2015 (see plan: https://cwiki.apache.org/confluence/display/KAFKA/Future+release+plan) Therefore, I decided on the following approach: Copy the code of the unreleased, new Kafka Consumer into the Flink consumer and use it. The new API has all the bells and whistles we need (manual committing, per-partition subscriptions, nice APIs), but it is not completely backwards compatible. We can retrieve topic metadata with the new API from Kafka 0.8.1 and 0.8.2 (and of course 0.8.3); we can retrieve data from Kafka 0.8.2 (and 0.8.3); we can only commit to Kafka 0.8.3. Therefore, this pull request contains three different user facing classes `FlinkKafkaConsumer081`, `FlinkKafkaConsumer082` and `FlinkKafkaConsumer083` for the different possible combinations. For 0.8.1 we are using a hand-crafted implementation against the simple consumer API (https://cwiki.apache.org/confluence/display/KAFKA/0.8.0+SimpleConsumer+Example), so we had to do what we originally wanted to avoid. I tried to make that implementation as robust and efficient as possible. I'm intentionally not handling any broker failures in the code. For these cases, I'm relying on Flink's fault tolerance mechanisms (which effectively means redeploying the Kafka sources against other online brokers) For reviewing the pull request, there are only a few important classes to look at: - FlinkKafkaConsumerBase - IncludedFetcher - LegacyFetcher (the one implementing the SimpleConsumer API) I fixed a little bug in the stream graph generator. It was ignoring the number of execution retries when no checkpointing is enabled. Known issues: - this pull request contains at least one failing test - the KafkaConsumer contains at least one known, yet untested bug - missing documentation I will also open a pull request for using the new Producer API. It provides much better performance and usability. Open questions: - Do we really want to copy 20k+ lines of code into our code base (for now)? If there are concerns about this, I could also manually implement the missing pieces. It's probably 100 lines of code for getting the partition infos for a topic, and we would use the Simple Consumer also for reading from 0.8.2. - Do we want to use the packaging I'm suggesting here (additional maven module for `flink-connector-kafka-083`)? We would need to introduce it anyways when Kafka releases 0.8.3 because the dependencies are not compatible. But it's adding confusion for our users. I will write more documentation for guidance.
You can merge this pull request into a Git repository by running: $ git pull https://github.com/rmetzger/flink flink2386 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/996.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #996 commit 177e0bbc6bc613b67111ba038e0ded4fae8474f1 Author: Robert Metzger rmetz...@apache.org Date: 2015-07-20T19:39:46Z wip commit 70cb8f1ecb7df7a98313796609d2fa0dbade86bf Author: Robert Metzger rmetz...@apache.org Date: 2015-07-21T15:21:45Z [FLINK-2386] Add initial code for the new kafka connector, with everything unreleased copied from the kafka sources commit a4a2847908a8c2f118b8667d7cb66693c065c38d Author: Robert Metzger rmetz...@apache.org Date: 2015-07-21T17:58:13Z wip commit b02cde37c2120ff6f0fcf1c233391a1d8804e594 Author: Robert Metzger rmetz...@apache.org Date: 2015-07-22T15:29:58Z wip commit 54a05c39d150b016e0a089daedb3492d986b93bd Author: Robert Metzger rmetz...@apache.org Date: 2015-07-22T19:56:41Z wip commit 393fd6766a5df4bf14ef0c13864b8a4abdb62bb4 Author: Robert Metzger
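For context, a usage sketch of the consumer classes the PR introduces; the constructor shape, package paths, topic name, and properties here are assumptions based on the PR description:

```java
import java.util.Properties;

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer082;
import org.apache.flink.streaming.util.serialization.SimpleStringSchema;

public class KafkaConsumerUsageSketch {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // for fetching data
        props.setProperty("zookeeper.connect", "localhost:2181"); // for committing offsets
        props.setProperty("group.id", "demo-group");

        // pick the variant matching the broker version: 081, 082, or 083
        env.addSource(new FlinkKafkaConsumer082<String>("myTopic", new SimpleStringSchema(), props))
                .print();

        env.execute("Kafka consumer sketch");
    }
}
```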
[jira] [Resolved] (FLINK-854) Web interface: show runtime of subtasks
[ https://issues.apache.org/jira/browse/FLINK-854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen resolved FLINK-854. Resolution: Won't Fix Fix Version/s: (was: pre-apache) Subsumed by effort for the new web interface [FLINK-2357] Web interface: show runtime of subtasks --- Key: FLINK-854 URL: https://issues.apache.org/jira/browse/FLINK-854 Project: Flink Issue Type: Improvement Components: Webfrontend Reporter: Ufuk Celebi Priority: Minor Labels: github-import When I click on the detailed view of a task, I see all subtasks as in the screenshot below. I would also like to show the runtime per stage, e.g. I want to know how long the yellow subtask was in the RUNNING state. ![screen shot 2014-05-24 at 18 09 36|https://cloud.githubusercontent.com/assets/1756620/3074999/d0ebe1d6-e35d-11e3-9f18-26ac5ce96999.png] Imported from GitHub Url: https://github.com/stratosphere/stratosphere/issues/854 Created by: [uce|https://github.com/uce] Labels: enhancement, gui, user satisfaction, Milestone: Release 0.6 (unplanned) Created at: Sat May 24 18:10:25 CEST 2014 State: open -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (FLINK-854) Web interface: show runtime of subtasks
[ https://issues.apache.org/jira/browse/FLINK-854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen closed FLINK-854. -- Web interface: show runtime of subtasks --- Key: FLINK-854 URL: https://issues.apache.org/jira/browse/FLINK-854 Project: Flink Issue Type: Improvement Components: Webfrontend Reporter: Ufuk Celebi Priority: Minor Labels: github-import When I click on the detailed view of a task, I see all subtasks as in the screenshot below. I would also like to show the runtime per stage, e.g. I want to know how long the yellow subtask was in the RUNNING state. ![screen shot 2014-05-24 at 18 09 36|https://cloud.githubusercontent.com/assets/1756620/3074999/d0ebe1d6-e35d-11e3-9f18-26ac5ce96999.png] Imported from GitHub Url: https://github.com/stratosphere/stratosphere/issues/854 Created by: [uce|https://github.com/uce] Labels: enhancement, gui, user satisfaction, Milestone: Release 0.6 (unplanned) Created at: Sat May 24 18:10:25 CEST 2014 State: open -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (FLINK-871) Create a documentation distribution together with other release artifacts
[ https://issues.apache.org/jira/browse/FLINK-871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen resolved FLINK-871. Resolution: Fixed Fix Version/s: (was: pre-apache) Documentation is part of the source code, and the Apache CI infrastructure builds and updates snapshot documentation on a nightly basis. Create a documentation distribution together with other release artifacts - Key: FLINK-871 URL: https://issues.apache.org/jira/browse/FLINK-871 Project: Flink Issue Type: Improvement Reporter: GitHub Import Priority: Minor Labels: github-import It would be good to have a documentation distribution together with the other release artifacts. We can use markdown files, .md, for documentation and use the Maven markdown plugin for management. The same documentation can be used for the website, etc. Imported from GitHub Url: https://github.com/stratosphere/stratosphere/issues/871 Created by: [danrk|https://github.com/danrk] Labels: documentation, enhancement, Created at: Tue May 27 02:44:53 CEST 2014 State: open -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (FLINK-1013) ArithmeticException: / by zero in MutableHashTable
[ https://issues.apache.org/jira/browse/FLINK-1013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen resolved FLINK-1013. - Resolution: Fixed Assignee: Stephan Ewen Fix Version/s: 0.9 Fixed already a while back by delegating the computation to an existing utility function that is also used to build the initial table. ArithmeticException: / by zero in MutableHashTable -- Key: FLINK-1013 URL: https://issues.apache.org/jira/browse/FLINK-1013 Project: Flink Issue Type: Bug Reporter: Till Rohrmann Assignee: Stephan Ewen Fix For: 0.9 I encountered a division by zero exception in the MutableHashTable. It happened when I joined two datasets of relatively big records (approx. 40-50 MB I think). When joining them the buildTableFromSpilledPartition method of the MutableHashTable is called. In case the number of available buffers is smaller than the needed number of buffers, the mutable hash table will calculate the bucket count {code} bucketCount = (int) (((long) totalBuffersAvailable) * RECORD_TABLE_BYTES / (avgRecordLenPartition + RECORD_OVERHEAD_BYTES)); {code} If the average record length is sufficiently large, then the bucket count will be 0. Initializing the hash table with a 0 bucket count will cause then the division by 0 exception. I don't know whether this problem can be mitigated but it should at least throw a meaningful exception instead of the ArithmeticException. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (FLINK-1013) ArithmeticException: / by zero in MutableHashTable
[ https://issues.apache.org/jira/browse/FLINK-1013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen closed FLINK-1013. --- ArithmeticException: / by zero in MutableHashTable -- Key: FLINK-1013 URL: https://issues.apache.org/jira/browse/FLINK-1013 Project: Flink Issue Type: Bug Reporter: Till Rohrmann Assignee: Stephan Ewen Fix For: 0.9 I encountered a division by zero exception in the MutableHashTable. It happened when I joined two datasets of relatively big records (approx. 40-50 MB I think). When joining them the buildTableFromSpilledPartition method of the MutableHashTable is called. In case the number of available buffers is smaller than the needed number of buffers, the mutable hash table will calculate the bucket count {code} bucketCount = (int) (((long) totalBuffersAvailable) * RECORD_TABLE_BYTES / (avgRecordLenPartition + RECORD_OVERHEAD_BYTES)); {code} If the average record length is sufficiently large, then the bucket count will be 0. Initializing the hash table with a 0 bucket count will cause then the division by 0 exception. I don't know whether this problem can be mitigated but it should at least throw a meaningful exception instead of the ArithmeticException. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
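The "meaningful exception" route the report asks for can be sketched as a guard around the computation; this is illustrative (names assumed from the snippet above), not the merged fix, which delegated to an existing utility function:

```java
public class BucketCountSketch {

    public static int computeBucketCount(long totalBuffersAvailable, long recordTableBytes,
            long avgRecordLenPartition, long recordOverheadBytes) {

        long bucketCount = totalBuffersAvailable * recordTableBytes
                / (avgRecordLenPartition + recordOverheadBytes);

        if (bucketCount <= 0) {
            // very large average records would otherwise yield 0 buckets and
            // trigger the division by zero when the hash table is initialized
            throw new IllegalStateException("Cannot initialize hash table: average record length "
                    + avgRecordLenPartition + " is too large for the available memory.");
        }
        return (int) Math.min(bucketCount, Integer.MAX_VALUE);
    }
}
```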
[jira] [Resolved] (FLINK-509) Move virtual machines building process / hosting to Amazon S3 / EC2
[ https://issues.apache.org/jira/browse/FLINK-509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen resolved FLINK-509. Resolution: Invalid Fix Version/s: (was: pre-apache) No virtual machine images are built by the Flink Apache infrastructure. Docker support has been added. Move virtual machines building process / hosting to Amazon S3 / EC2 --- Key: FLINK-509 URL: https://issues.apache.org/jira/browse/FLINK-509 Project: Flink Issue Type: Improvement Reporter: GitHub Import Priority: Minor Labels: github-import The virtual machine images are currently hosted on a very unreliable server. I'd be happy if someone could come up with a more cost-efficient solution than Amazon. But we need a reliable solution. * The hosting is unreliable * the automated build process stopped for some reason * there is no VM for the 0.4 release * there is no? error reporting Imported from GitHub Url: https://github.com/stratosphere/stratosphere/issues/509 Created by: [rmetzger|https://github.com/rmetzger] Labels: bug, build system, website, Milestone: Release 0.6 (unplanned) Created at: Wed Feb 26 10:49:33 CET 2014 State: open -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (FLINK-509) Move virtual machines building process / hosting to Amazon S3 / EC2
[ https://issues.apache.org/jira/browse/FLINK-509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen closed FLINK-509. -- Move virtual machines building process / hosting to Amazon S3 / EC2 --- Key: FLINK-509 URL: https://issues.apache.org/jira/browse/FLINK-509 Project: Flink Issue Type: Improvement Reporter: GitHub Import Priority: Minor Labels: github-import The virtual machine images are currently hosted on a very unreliable server. I'd be happy if someone could come up with a more cost-efficient solution than Amazon. But we need a reliable solution. * The hosting is unreliable * the automated build process stopped for some reason * there is no VM for the 0.4 release * there is no? error reporting Imported from GitHub Url: https://github.com/stratosphere/stratosphere/issues/509 Created by: [rmetzger|https://github.com/rmetzger] Labels: bug, build system, website, Milestone: Release 0.6 (unplanned) Created at: Wed Feb 26 10:49:33 CET 2014 State: open -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2240) Use BloomFilter to minimize probe side records which are spilled to disk in Hybrid-Hash-Join
[ https://issues.apache.org/jira/browse/FLINK-2240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14660331#comment-14660331 ] ASF GitHub Bot commented on FLINK-2240: --- Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/888#issuecomment-128441993 This was a super cool contribution. A pretty sophisticated addition, super testing, high code quality. I am very impressed! I hope you will contribute more to Flink. Already saw that you opened another pull request, for a sampling operator. Happy that this is happening :-) In the future, I can hopefully review and merge the pull requests faster. The past weeks, I did not get to code work as much as I wanted, and the list of critical issues was long, so this pull request got delayed a bit. Use BloomFilter to minimize probe side records which are spilled to disk in Hybrid-Hash-Join Key: FLINK-2240 URL: https://issues.apache.org/jira/browse/FLINK-2240 Project: Flink Issue Type: Improvement Components: Core Reporter: Chengxiang Li Assignee: Chengxiang Li Priority: Minor In Hybrid-Hash-Join, when the small table does not fit into memory, part of the small table data is spilled to disk, and the counterpart partition of the big table data is spilled to disk in the probe phase as well. If we build a BloomFilter while spilling the small table to disk during the build phase, and use it to filter the big table records that tend to be spilled to disk, this may greatly reduce the spilled big table file size and save the disk IO cost for writing and further reading. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2452) Add a playcount threshold to the MusicProfiles example
[ https://issues.apache.org/jira/browse/FLINK-2452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14660454#comment-14660454 ] ASF GitHub Bot commented on FLINK-2452: --- Github user andralungu commented on the pull request: https://github.com/apache/flink/pull/968#issuecomment-128456801 Nope, no comment. Add a playcount threshold to the MusicProfiles example -- Key: FLINK-2452 URL: https://issues.apache.org/jira/browse/FLINK-2452 Project: Flink Issue Type: Improvement Components: Gelly Affects Versions: 0.10 Reporter: Vasia Kalavri Assignee: Vasia Kalavri Priority: Minor In the MusicProfiles example, when creating the user-user similarity graph, an edge is created between any two users that have listened to the same song (even if only once). Depending on the input data, this might produce a projection graph with many more edges than the original user-song graph. To make this computation more efficient, this issue proposes adding a user-defined parameter that filters out songs that a user has listened to only a few times. Essentially, it is a threshold for playcount, above which a user is considered to like a song. For reference, with a threshold value of 30, the whole Last.fm dataset is analyzed on my laptop in a few minutes, while no threshold results in a runtime of several hours. There are many solutions to this problem, but since this is just an example (not a library method), I think that keeping it simple is important. Thanks to [~andralungu] for spotting the inefficiency! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
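As a rough illustration of what such a threshold filter looks like in the DataSet API (this is a made-up miniature, not the actual MusicProfiles code; the field layout and data are invented):

{code:java}
import org.apache.flink.api.common.functions.FilterFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple3;

public class PlaycountThresholdExample {

    public static void main(String[] args) throws Exception {
        // Threshold above which a user is considered to "like" a song;
        // 30 is the value mentioned in the issue.
        final int playcountThreshold = 30;

        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // (user, song, playcount) triplets, inlined here for illustration
        DataSet<Tuple3<String, String, Integer>> userSongPlays = env.fromElements(
                new Tuple3<String, String, Integer>("alice", "songA", 120),
                new Tuple3<String, String, Integer>("bob", "songA", 2),
                new Tuple3<String, String, Integer>("bob", "songB", 45));

        // Drop the casual listens before building the user-user similarity
        // graph, so two users are only connected via songs they both like.
        DataSet<Tuple3<String, String, Integer>> likedSongs = userSongPlays
                .filter(new FilterFunction<Tuple3<String, String, Integer>>() {
                    @Override
                    public boolean filter(Tuple3<String, String, Integer> t) {
                        return t.f2 > playcountThreshold;
                    }
                });

        likedSongs.print();
    }
}
{code}

Filtering the user-song edges first keeps the projection step cheap, since the number of user-user edges grows roughly quadratically with the number of users sharing a song.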
[jira] [Closed] (FLINK-848) Move combine() from GroupReduceFunction to Interface
[ https://issues.apache.org/jira/browse/FLINK-848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen closed FLINK-848. -- Move combine() from GroupReduceFunction to Interface Key: FLINK-848 URL: https://issues.apache.org/jira/browse/FLINK-848 Project: Flink Issue Type: Sub-task Components: Java API Reporter: Fabian Hueske Assignee: Kostas Tzoumas Labels: breaking-api, github-import Fix For: 0.7.1-incubating Currently, the combine method of the GroupReduceFunction allows returning multiple values through a collector. However, most combiners do not need this because they return only a single value. Furthermore, a combiner that returns a single value can be executed using more efficient hash-based strategies. Hence, we propose to introduce a combine method for GroupReduce that returns only a single value. In order to keep support for the rare cases where more than one value needs to be returned, we want to keep the collector-based combiner as well. To do so, we could remove the combine method from the abstract GroupReduceFunction class and add two Combinable interfaces, one for a single-value and one for a multi-value combiner. This would also make the Combinable annotation obsolete, as the optimizer can check whether a GroupReduceFunction implements one of the Combinable interfaces or not. Imported from GitHub Url: https://github.com/stratosphere/stratosphere/issues/848 Created by: [fhueske|https://github.com/fhueske] Labels: core, enhancement, Milestone: Release 0.6 (unplanned) Created at: Thu May 22 10:23:04 CEST 2014 State: open -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-2452] [Gelly] adds a playcount threshol...
Github user andralungu commented on the pull request: https://github.com/apache/flink/pull/968#issuecomment-128456801 Nope, no comment. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Resolved] (FLINK-848) Move combine() from GroupReduceFunction to Interface
[ https://issues.apache.org/jira/browse/FLINK-848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen resolved FLINK-848. Resolution: Fixed Fix Version/s: (was: pre-apache) 0.7.1-incubating Has been moved to the interfaces {{CombineFunction}} and {{GroupCombineFunction}}. Move combine() from GroupReduceFunction to Interface Key: FLINK-848 URL: https://issues.apache.org/jira/browse/FLINK-848 Project: Flink Issue Type: Sub-task Components: Java API Reporter: Fabian Hueske Assignee: Kostas Tzoumas Labels: breaking-api, github-import Fix For: 0.7.1-incubating Currently, the combine method of the GroupReduceFunction allows returning multiple values through a collector. However, most combiners do not need this because they return only a single value. Furthermore, a combiner that returns a single value can be executed using more efficient hash-based strategies. Hence, we propose to introduce a combine method for GroupReduce that returns only a single value. In order to keep support for the rare cases where more than one value needs to be returned, we want to keep the collector-based combiner as well. To do so, we could remove the combine method from the abstract GroupReduceFunction class and add two Combinable interfaces, one for a single-value and one for a multi-value combiner. This would also make the Combinable annotation obsolete, as the optimizer can check whether a GroupReduceFunction implements one of the Combinable interfaces or not. Imported from GitHub Url: https://github.com/stratosphere/stratosphere/issues/848 Created by: [fhueske|https://github.com/fhueske] Labels: core, enhancement, Milestone: Release 0.6 (unplanned) Created at: Thu May 22 10:23:04 CEST 2014 State: open -- This message was sent by Atlassian JIRA (v6.3.4#6332)
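For reference, the two interfaces named in the resolution have roughly the following shape (reproduced from memory as a sketch, so treat the exact signatures as approximate; in flink-core each interface lives in its own file):

{code:java}
import java.io.Serializable;

import org.apache.flink.api.common.functions.Function;
import org.apache.flink.util.Collector;

// Single-value combiner: returns exactly one combined element per group,
// which is what enables the more efficient hash-based strategies.
public interface CombineFunction<IN, OUT> extends Function, Serializable {
    OUT combine(Iterable<IN> values) throws Exception;
}

// Collector-based combiner: may emit zero or more elements, covering the
// rare multi-value cases the issue wants to keep supporting.
public interface GroupCombineFunction<IN, OUT> extends Function, Serializable {
    void combine(Iterable<IN> values, Collector<OUT> out) throws Exception;
}
{code}

Since a GroupReduceFunction now simply implements one of these interfaces (or neither), the optimizer can detect combinability from the type itself, which is what makes the old Combinable annotation obsolete.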
[jira] [Closed] (FLINK-701) Change new Java API functions to SAMs
[ https://issues.apache.org/jira/browse/FLINK-701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen closed FLINK-701. -- Change new Java API functions to SAMs - Key: FLINK-701 URL: https://issues.apache.org/jira/browse/FLINK-701 Project: Flink Issue Type: Improvement Components: Java API Reporter: GitHub Import Assignee: Kostas Tzoumas Labels: github-import Fix For: 0.6-incubating In order to support a compact syntax with Java 8 lambdas, we would need to change the types of the functions to Single Abstract Method types (SAMs). Only those can be implemented by lambdas. That means that DataSet.map(MapFunction) would accept an interface MapFunction, not the abstract class that we use now. Many UDFs would not inherit from `AbstractFunction` any more. Inheriting from AbstractFunction would become optional, needed only when life cycle methods (open/close) or the runtime context are required. This may also have implications for the type extraction, as the generic parameters would then be in generic superinterfaces rather than in generic superclasses. Imported from GitHub Url: https://github.com/stratosphere/stratosphere/issues/701 Created by: [StephanEwen|https://github.com/StephanEwen] Labels: enhancement, java api, user satisfaction, Milestone: Release 0.6 (unplanned) Created at: Thu Apr 17 13:06:40 CEST 2014 State: open -- This message was sent by Atlassian JIRA (v6.3.4#6332)
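To make the point concrete, here is a small sketch of what the SAM-style API enables, assuming the interface-based MapFunction that Flink eventually shipped. The explicit .returns() hint is shown because type erasure can defeat type extraction for lambdas, exactly the concern raised in the issue:

{code:java}
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;

public class SamLambdaExample {

    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        DataSet<String> words = env.fromElements("flink", "lambda", "sam");

        // MapFunction is a single-abstract-method interface, so a Java 8
        // lambda can stand in for it; no subclass of AbstractFunction needed.
        DataSet<Integer> lengths = words
                .map(word -> word.length())
                // explicit result type, for the cases where the lambda's
                // generic signature is erased and unavailable to the extractor
                .returns(Integer.class);

        lengths.print();
    }
}
{code}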
[jira] [Closed] (FLINK-967) Make intermediate results a first-class citizen in the JobGraph
[ https://issues.apache.org/jira/browse/FLINK-967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen closed FLINK-967. -- Make intermediate results a first-class citizen in the JobGraph --- Key: FLINK-967 URL: https://issues.apache.org/jira/browse/FLINK-967 Project: Flink Issue Type: New Feature Components: JobManager, TaskManager Affects Versions: 0.6-incubating Reporter: Stephan Ewen Assignee: Stephan Ewen Fix For: 0.9 In order to add incremental plan rollout to the system, we need to make intermediate results a first-class citizen in the job graph and scheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-981) Support for generated Cloudera Hadoop configuration
[ https://issues.apache.org/jira/browse/FLINK-981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14660484#comment-14660484 ] Stephan Ewen commented on FLINK-981: Is this still valid, or is this fixed already by properly loading the Hadoop configuration? Support for generated Cloudera Hadoop configuration Key: FLINK-981 URL: https://issues.apache.org/jira/browse/FLINK-981 Project: Flink Issue Type: Bug Components: Local Runtime, YARN Client Reporter: Robert Metzger Cloudera Hadoop generates configuration files that differ from the vanilla upstream Hadoop configuration files. The HDFS and the YARN components both access configuration values from Hadoop. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
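For context on what "properly loading the Hadoop configuration" amounts to, here is a minimal sketch using the plain Hadoop client API. The /etc/hadoop/conf path is an assumption (it is where Cloudera Manager typically deploys generated client configs), not something mandated by Flink:

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

public class LoadGeneratedHadoopConf {

    public static void main(String[] args) {
        // Assumed location of the generated client configuration
        String hadoopConfDir = "/etc/hadoop/conf";

        Configuration conf = new Configuration();
        conf.addResource(new Path(hadoopConfDir, "core-site.xml"));
        conf.addResource(new Path(hadoopConfDir, "hdfs-site.xml"));
        conf.addResource(new Path(hadoopConfDir, "yarn-site.xml"));

        // Values from the generated files now override the vanilla defaults
        System.out.println("fs.defaultFS = " + conf.get("fs.defaultFS"));
    }
}
{code}

Picking up the generated files instead of relying on vanilla defaults is what makes the HDFS and YARN clients see the Cloudera-specific values.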
[jira] [Resolved] (FLINK-1116) Packaged Scala Examples do not work due to missing test data
[ https://issues.apache.org/jira/browse/FLINK-1116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen resolved FLINK-1116. - Resolution: Invalid Scala examples are not packaged any more (redundant with the Java examples) Packaged Scala Examples do not work due to missing test data Key: FLINK-1116 URL: https://issues.apache.org/jira/browse/FLINK-1116 Project: Flink Issue Type: Bug Components: Scala API Reporter: Stephan Ewen Priority: Minor The example data classes are in the Java examples project. The Maven jar plugin cannot include them in the jars of the Scala examples, causing the examples to fail with a ClassNotFoundException when starting the example job. For now, I disabled the Scala example jars from being built, because they do not work anyway. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (FLINK-1116) Packaged Scala Examples do not work due to missing test data
[ https://issues.apache.org/jira/browse/FLINK-1116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen closed FLINK-1116. --- Packaged Scala Examples do not work due to missing test data Key: FLINK-1116 URL: https://issues.apache.org/jira/browse/FLINK-1116 Project: Flink Issue Type: Bug Components: Scala API Reporter: Stephan Ewen Priority: Minor The example data classes are in the Java examples project. The Maven jar plugin cannot include them in the jars of the Scala examples, causing the examples to fail with a ClassNotFoundException when starting the example job. For now, I disabled the Scala example jars from being built, because they do not work anyway. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (FLINK-1083) WebInterface improvements
[ https://issues.apache.org/jira/browse/FLINK-1083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen closed FLINK-1083. --- WebInterface improvements - Key: FLINK-1083 URL: https://issues.apache.org/jira/browse/FLINK-1083 Project: Flink Issue Type: Improvement Components: JobManager Reporter: Jonathan Hasenburg New issue to summarize all the things that should be done regarding the web interface. * rework the dashboard so that more than one job can be shown ... If a job is clicked, you get to the details. * DONE: add history to dashboard * Running jobs should get a view like jobs in the history. * DONE: rework the menu and try to add some links to the dashboard (like the taskmanager section) * improve the way the JSON responses are sent to the web interface -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1135) Blog post with topic Accessing Data Stored in Hive with Flink
[ https://issues.apache.org/jira/browse/FLINK-1135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14660516#comment-14660516 ] Stephan Ewen commented on FLINK-1135: - Is this still going to happen? Blog post with topic Accessing Data Stored in Hive with Flink --- Key: FLINK-1135 URL: https://issues.apache.org/jira/browse/FLINK-1135 Project: Flink Issue Type: Improvement Components: Project Website Reporter: Timo Walther Assignee: Robert Metzger Priority: Minor Attachments: 2014-09-29-querying-hive.md Recently, I implemented a Flink job that accessed Hive. Maybe someone else is going to try this. I created a blog post for the website to share my experience. You'll find the blog post file attached. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (FLINK-2240) Use BloomFilter to minimize probe side records which are spilled to disk in Hybrid-Hash-Join
[ https://issues.apache.org/jira/browse/FLINK-2240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen resolved FLINK-2240. - Resolution: Implemented Fix Version/s: 0.10 Implemented in 61dcae391cb3b45ba3aff47d4d9163889d2958a4 Thank you for the contribution! Use BloomFilter to minimize probe side records which are spilled to disk in Hybrid-Hash-Join Key: FLINK-2240 URL: https://issues.apache.org/jira/browse/FLINK-2240 Project: Flink Issue Type: Improvement Components: Core Reporter: Chengxiang Li Assignee: Chengxiang Li Priority: Minor Fix For: 0.10 In the Hybrid-Hash-Join, when the small table does not fit into memory, part of the small table data is spilled to disk, and the counterpart partitions of the big table are spilled to disk during the probe phase as well. If we build a BloomFilter while spilling the small table to disk during the build phase, and use it to filter the big-table records that would otherwise be spilled to disk, this may greatly reduce the size of the spilled big-table files and save the disk I/O cost of writing and later reading them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (FLINK-2240) Use BloomFilter to minimize probe side records which are spilled to disk in Hybrid-Hash-Join
[ https://issues.apache.org/jira/browse/FLINK-2240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen closed FLINK-2240. --- Use BloomFilter to minimize probe side records which are spilled to disk in Hybrid-Hash-Join Key: FLINK-2240 URL: https://issues.apache.org/jira/browse/FLINK-2240 Project: Flink Issue Type: Improvement Components: Core Reporter: Chengxiang Li Assignee: Chengxiang Li Priority: Minor Fix For: 0.10 In the Hybrid-Hash-Join, when the small table does not fit into memory, part of the small table data is spilled to disk, and the counterpart partitions of the big table are spilled to disk during the probe phase as well. If we build a BloomFilter while spilling the small table to disk during the build phase, and use it to filter the big-table records that would otherwise be spilled to disk, this may greatly reduce the size of the spilled big-table files and save the disk I/O cost of writing and later reading them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)