[jira] [Commented] (FLINK-685) Add support for semi-joins

2021-04-22 Thread Flink Jira Bot (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17329581#comment-17329581
 ] 

Flink Jira Bot commented on FLINK-685:
--

This issue is assigned but has not received an update in 7 days so it has been 
labeled "stale-assigned". If you are still working on the issue, please give an 
update and remove the label. If you are no longer working on the issue, please 
unassign so someone else may work on it. In 7 days the issue will be 
automatically unassigned.

> Add support for semi-joins
> --
>
> Key: FLINK-685
> URL: https://issues.apache.org/jira/browse/FLINK-685
> Project: Flink
> Issue Type: New Feature
> Components: API / DataSet
> Reporter: GitHub Import
> Assignee: pietro pinoli
> Priority: Minor
> Labels: github-import, stale-assigned, stale-minor
>
> A semi-join is essentially a join used as a filter. One input is the 
> "filtering" input and the other is the "filtered" input.
> A tuple of the "filtered" input is emitted exactly once if the "filtering" 
> input has one or more tuples with matching join keys. Consequently, the 
> output of a semi-join has the same type as the "filtered" input, and the 
> "filtering" input is discarded entirely.
> To support semi-joins, we need to add a physical execution strategy which 
> ensures that a tuple of the "filtered" input is emitted only once even if the 
> "filtering" input has more than one tuple with matching keys. Furthermore, a 
> few optimizations over standard joins are possible, such as storing only the 
> keys, not the full tuples, of the "filtering" input in a hash table.
>  Imported from GitHub 
> Url: https://github.com/stratosphere/stratosphere/issues/685
> Created by: [fhueske|https://github.com/fhueske]
> Labels: enhancement, java api, runtime, 
> Milestone: Release 0.6 (unplanned)
> Created at: Mon Apr 14 12:05:29 CEST 2014
> State: open
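
The semantics described in the issue can be illustrated with a minimal, 
in-memory sketch in plain Java (Java 9+). It is not Flink code and not the 
execution strategy the issue asks for: the class name SemiJoinSketch, the 
method semiJoin and the key-extractor parameters are hypothetical names chosen 
for illustration. The sketch builds a hash set from the keys of the "filtering" 
input only, then probes it once per "filtered" tuple, so duplicate keys on the 
"filtering" side cannot produce duplicate output.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Hypothetical illustration class; not part of Flink.
final class SemiJoinSketch {

    /**
     * Returns the elements of {@code filtered} whose key occurs at least once
     * in {@code filtering}. Each filtered element is emitted at most once,
     * regardless of how many filtering elements share its key.
     */
    public static <L, R, K> List<L> semiJoin(
            List<L> filtered,
            List<R> filtering,
            Function<L, K> filteredKey,
            Function<R, K> filteringKey) {

        // Build side: store only the join keys of the filtering input,
        // not the full tuples. Duplicate keys collapse in the set.
        Set<K> keys = new HashSet<>();
        for (R r : filtering) {
            keys.add(filteringKey.apply(r));
        }

        // Probe side: one lookup per filtered tuple, so each tuple is
        // emitted either zero times or exactly once.
        List<L> result = new ArrayList<>();
        for (L l : filtered) {
            if (keys.contains(filteredKey.apply(l))) {
                result.add(l);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Map.Entry<Integer, String>> filtered = List.of(
                Map.entry(1, "a"), Map.entry(2, "b"), Map.entry(3, "c"));
        // Key 1 appears twice and key 4 has no partner in the filtered input.
        List<Integer> filtering = List.of(1, 1, 3, 4);

        List<Map.Entry<Integer, String>> out =
                semiJoin(filtered, filtering, Map.Entry::getKey, k -> k);
        System.out.println(out); // prints [1=a, 3=c]
    }
}
```

Note that this sketch keeps all distinct keys of the "filtering" input in 
memory, so it only works when that key set fits into main memory.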



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-685) Add support for semi-joins

2021-04-14 Thread Flink Jira Bot (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17321788#comment-17321788
 ] 

Flink Jira Bot commented on FLINK-685:
--

This issue and all of its Sub-Tasks have not been updated for 180 days. So, it 
has been labeled "stale-minor". If you are still affected by this bug or are 
still interested in this issue, please give an update and remove the label. In 
7 days the issue will be closed automatically.

> Add support for semi-joins
> --
>
> Key: FLINK-685
> URL: https://issues.apache.org/jira/browse/FLINK-685
> Project: Flink
> Issue Type: New Feature
> Components: API / DataSet
> Reporter: GitHub Import
> Assignee: pietro pinoli
> Priority: Minor
> Labels: github-import, stale-minor





[jira] [Commented] (FLINK-685) Add support for semi-joins

2015-07-22 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14636714#comment-14636714
 ] 

Stephan Ewen commented on FLINK-685:


So far, the paradigm has been that every operation in Flink's batch API must 
be size safe: it has to scale to groups larger than main memory without 
throwing an {{OutOfMemoryError}}.

This makes certain operations harder to implement.

How about we start a class of extended operations, where we add simpler 
implementations? These would not guarantee that very large groups are handled, 
but that is probably still good enough for a large share of the use cases. 
We could add your implementation to that class of extended operations.

 Add support for semi-joins
 --

 Key: FLINK-685
 URL: https://issues.apache.org/jira/browse/FLINK-685
 Project: Flink
  Issue Type: New Feature
Reporter: GitHub Import
Assignee: pietro pinoli
Priority: Minor
  Labels: github-import
 Fix For: pre-apache





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-685) Add support for semi-joins

2015-07-20 Thread pietro pinoli (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14634358#comment-14634358
 ] 

pietro pinoli commented on FLINK-685:
-

Hi [~fhueske],

if I implement the dummy version (your first alternative), does it have a 
chance of getting merged? :)

Thanks again for your time.
PP

 Add support for semi-joins
 --

 Key: FLINK-685
 URL: https://issues.apache.org/jira/browse/FLINK-685
 Project: Flink
  Issue Type: New Feature
Reporter: GitHub Import
Priority: Minor
  Labels: github-import
 Fix For: pre-apache







[jira] [Commented] (FLINK-685) Add support for semi-joins

2015-07-16 Thread pietro pinoli (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14630528#comment-14630528
 ] 

pietro pinoli commented on FLINK-685:
-

Dear [~fhueske], are you (and the community) still interested in this?

I am developing a genomics project that relies on Flink, and having such a 
feature would help us a lot.

I would love to take it, but I guess I'll need some help from you (or someone 
else) to get started :)

Let me know.
Thanks,
Pietro.



 Add support for semi-joins
 --

 Key: FLINK-685
 URL: https://issues.apache.org/jira/browse/FLINK-685
 Project: Flink
  Issue Type: New Feature
Reporter: GitHub Import
Priority: Minor
  Labels: github-import
 Fix For: pre-apache




