[ 
https://issues.apache.org/jira/browse/SPARK-19731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15884904#comment-15884904
 ] 

Shawn Lavelle edited comment on SPARK-19731 at 2/26/17 8:47 PM:
----------------------------------------------------------------

I recognize the possibility of an ambiguity problem, but I can't see why it 
wouldn't be manageable for the case of "col/val<of type X> in 
array<of type X>". 

The following seem like they should work:
select * from data where key in array(1,2,3);
select * from data where key in udf_returns_array();

Certainly, array_contains does work as expected:
select * from data where array_contains(array(1,2,3), key);

I might just be feeling the effects of 
[SPARK-19730|https://issues.apache.org/jira/browse/SPARK-19730] though. The 
data store implementation I have rejects queries if primary keys aren't 
provided, so in this situation the query is rejected because neither 
array_contains nor the predicate subquery provides the filter on "key" to the 
outer query.
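
To make the ambiguity point above concrete, here is a rough sketch of a 
type-based rule that would keep the two readings of IN apart. It is plain 
Python, not Spark code; the names resolve_in and array_of and the type strings 
are hypothetical, and this is only one way the rule might be stated, not how 
the analyzer is or would be implemented.

```python
from typing import List


def array_of(elem_type: str) -> str:
    """Spell the array type whose elements have type elem_type (hypothetical helper)."""
    return f"array<{elem_type}>"


def resolve_in(value_type: str, operand_types: List[str]) -> str:
    """Classify `value IN (operands...)` under the proposed rule (illustrative only)."""
    # Ordinary IN: every operand has exactly the value's type.
    if operand_types and all(t == value_type for t in operand_types):
        return "equality"
    # Proposed extension: a single operand that is an array of the value's type.
    if len(operand_types) == 1 and operand_types[0] == array_of(value_type):
        return "membership"
    # Anything else stays a type mismatch, as it is today.
    return "type_mismatch"


# key (int) IN array(1,2,3) would be read as membership.
print(resolve_in("int", ["array<int>"]))
# An array compared against a list of arrays keeps its current meaning.
print(resolve_in("array<int>", ["array<int>", "array<int>"]))
```

Under this rule the two cases never overlap, because an operand type cannot be 
both X and array<X> at the same time.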



> IN Operator should support arrays
> ---------------------------------
>
>                 Key: SPARK-19731
>                 URL: https://issues.apache.org/jira/browse/SPARK-19731
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 1.6.2, 2.0.0, 2.1.0
>            Reporter: Shawn Lavelle
>            Priority: Minor
>
> When the column type and array member type match, the IN operator should 
> still operate on the array. This is useful for UDFs and Predicate SubQueries 
> that return arrays.  
> (This isn't necessarily extensible to all collections, but certainly applies 
> to arrays.)
> Example:
> select 5 in array(1,2,3) should return false instead of throwing a 
> ParseException, since the array's element type and the column's type match.
> create table test (val int);
> insert into test values (1);
> select * from test;
> +------+--+
> | val  |
> +------+--+
> | 1    |
> +------+--+
> *select val from test where array_contains(array(1,2,3), val);*
> +------+--+
> | val  |
> +------+--+
> | 1    |
> +------+--+
> {panel}
> *select val from test where val in (array(1,2,3));*
> Error: org.apache.spark.sql.AnalysisException: cannot resolve '(test.`val` IN 
> (array(1, 2, 3)))' due to data type mismatch: Arguments must be same type; 
> line 1 pos 31;
> 'Project ['val]
> +- 'Filter val#433 IN (array(1, 2, 3))
>    +- MetastoreRelation test (state=,code=0)
> {panel}
> {panel}
> *select val from test where val in (select array(1,2,3));*
> Error: org.apache.spark.sql.AnalysisException: cannot resolve '(test.`val` = 
> `array(1, 2, 3)`)' due to data type mismatch: differing types in '(test.`val` 
> = `array(1, 2, 3)`)' (int and array<int>).;;
> 'Project ['val]
> +- 'Filter predicate-subquery#434 [(val#435 = array(1, 2, 3)#436)]
>    :  +- Project [array(1, 2, 3) AS array(1, 2, 3)#436]
>    :     +- OneRowRelation$
>    +- MetastoreRelation test (state=,code=0)
> {panel}
> {panel}
> *select val from test where val in (select explode(array(1,2,3)));*
> +------+--+
> | val  |
> +------+--+
> | 1    |
> +------+--+
> Note: See [SPARK-19730|https://issues.apache.org/jira/browse/SPARK-19730] for 
> how a predicate subquery breaks when applied to the DataSourceAPI
> {panel}


