[jira] [Commented] (FLINK-685) Add support for semi-joins
[ https://issues.apache.org/jira/browse/FLINK-685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17329581#comment-17329581 ]

Flink Jira Bot commented on FLINK-685:
--------------------------------------

This issue is assigned but has not received an update in 7 days so it has been labeled "stale-assigned". If you are still working on the issue, please give an update and remove the label. If you are no longer working on the issue, please unassign so someone else may work on it. In 7 days the issue will be automatically unassigned.

> Add support for semi-joins
> --------------------------
>
> Key: FLINK-685
> URL: https://issues.apache.org/jira/browse/FLINK-685
> Project: Flink
> Issue Type: New Feature
> Components: API / DataSet
> Reporter: GitHub Import
> Assignee: pietro pinoli
> Priority: Minor
> Labels: github-import, stale-assigned, stale-minor
>
> A semi-join is basically a join filter. One input is "filtering" and the
> other one is "filtered".
> A tuple of the "filtered" input is emitted exactly once if the "filtering"
> input has one (or more) tuples with matching join keys. That means that the
> output of a semi-join has the same type as the "filtered" input and the
> "filtering" input is completely discarded.
> In order to support a semi-join, we need to add an additional physical
> execution strategy that ensures that a tuple of the "filtered" input is
> emitted only once if the "filtering" input has more than one tuple with
> matching keys. Furthermore, a couple of optimizations compared to standard
> joins are possible, such as storing only the keys and not the full tuples of
> the "filtering" input in a hash table.
>
> Imported from GitHub
> Url: https://github.com/stratosphere/stratosphere/issues/685
> Created by: [fhueske|https://github.com/fhueske]
> Labels: enhancement, java api, runtime,
> Milestone: Release 0.6 (unplanned)
> Created at: Mon Apr 14 12:05:29 CEST 2014
> State: open

--
This message was sent by Atlassian Jira (v8.3.4#803005)
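The semantics described in the issue can be illustrated with a minimal sketch. It is plain Python rather than Flink's API, and it is not the implementation discussed in this thread: the function `semi_join`, its parameters, and the sample data are all hypothetical names chosen for illustration. It assumes the distinct keys of the "filtering" input fit in memory, so it is not size safe.

```python
from typing import Callable, Hashable, Iterable, Iterator, TypeVar

L = TypeVar("L")
R = TypeVar("R")


def semi_join(
    filtered: Iterable[L],
    filtering: Iterable[R],
    filtered_key: Callable[[L], Hashable],
    filtering_key: Callable[[R], Hashable],
) -> Iterator[L]:
    """Yield each tuple of `filtered` once if `filtering` has a matching key."""
    # Build side: keep only the distinct keys of the "filtering" input,
    # not its full tuples. Duplicate keys collapse in the set, so a
    # "filtered" tuple cannot be emitted more than once.
    keys = {filtering_key(r) for r in filtering}

    # Probe side: stream the "filtered" input and emit matches unchanged,
    # so the output has the same type as the "filtered" input.
    for record in filtered:
        if filtered_key(record) in keys:
            yield record


# Hypothetical sample data: (customer_id, item) and (customer_id, amount).
orders = [(1, "book"), (2, "pen"), (3, "ink"), (1, "lamp")]
payments = [(1, 9.99), (1, 4.50), (3, 2.00)]  # customer 1 appears twice

result = list(semi_join(orders, payments, lambda o: o[0], lambda p: p[0]))
print(result)
```

Customer 1 has two payments, yet each of that customer's orders appears only once in the result. A standard join would emit those orders twice, which is the duplication the proposed execution strategy is meant to avoid.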
[jira] [Commented] (FLINK-685) Add support for semi-joins
[ https://issues.apache.org/jira/browse/FLINK-685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17321788#comment-17321788 ]

Flink Jira Bot commented on FLINK-685:
--------------------------------------

This issue and all of its Sub-Tasks have not been updated for 180 days. So, it has been labeled "stale-minor". If you are still affected by this bug or are still interested in this issue, please give an update and remove the label. In 7 days the issue will be closed automatically.

> Add support for semi-joins
> --------------------------
>
> Key: FLINK-685
> URL: https://issues.apache.org/jira/browse/FLINK-685
> Project: Flink
> Issue Type: New Feature
> Components: API / DataSet
> Reporter: GitHub Import
> Assignee: pietro pinoli
> Priority: Minor
> Labels: github-import, stale-minor
[jira] [Commented] (FLINK-685) Add support for semi-joins
[ https://issues.apache.org/jira/browse/FLINK-685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14636714#comment-14636714 ]

Stephan Ewen commented on FLINK-685:
------------------------------------

So far, we have had the paradigm that any operation in Flink's batch API must be size safe, so it has to scale to groups beyond main memory without an {{OutOfMemoryError}}. This has been a bit of a hurdle for implementing certain operations.

How about we start a class for extended operations, where we add simpler implementations? We would not guarantee that these handle super large groups, but that is probably still good enough for a big part of the use cases.

We could add your implementation to that class of extended operations.

> Add support for semi-joins
> --------------------------
>
> Key: FLINK-685
> URL: https://issues.apache.org/jira/browse/FLINK-685
> Project: Flink
> Issue Type: New Feature
> Reporter: GitHub Import
> Assignee: pietro pinoli
> Priority: Minor
> Labels: github-import
> Fix For: pre-apache

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-685) Add support for semi-joins
[ https://issues.apache.org/jira/browse/FLINK-685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14634358#comment-14634358 ]

pietro pinoli commented on FLINK-685:
-------------------------------------

Hi [~fhueske],

if I implement the dummy version (your first alternative), does it have a chance of getting merged? :)

Thanks again for your time.
PP

> Add support for semi-joins
> --------------------------
>
> Key: FLINK-685
> URL: https://issues.apache.org/jira/browse/FLINK-685
> Project: Flink
> Issue Type: New Feature
> Reporter: GitHub Import
> Priority: Minor
> Labels: github-import
> Fix For: pre-apache
[jira] [Commented] (FLINK-685) Add support for semi-joins
[ https://issues.apache.org/jira/browse/FLINK-685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14630528#comment-14630528 ]

pietro pinoli commented on FLINK-685:
-------------------------------------

Dear [~fhueske],

are you (the community) still interested in this? I am developing a genomics project that relies on Flink, and having such a feature would help us a lot.

I would love to take it on, but I guess I'll need some help from you (or someone else) to get started :)

Let me know.
Thanks,
Pietro.

> Add support for semi-joins
> --------------------------
>
> Key: FLINK-685
> URL: https://issues.apache.org/jira/browse/FLINK-685
> Project: Flink
> Issue Type: New Feature
> Reporter: GitHub Import
> Priority: Minor
> Labels: github-import
> Fix For: pre-apache