[ https://issues.apache.org/jira/browse/SPARK-19731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15884904#comment-15884904 ]
Shawn Lavelle edited comment on SPARK-19731 at 2/26/17 8:47 PM:
----------------------------------------------------------------

I recognize the possibility of an ambiguity problem, but I'm not convinced it is unmanageable for the case of "col/val<of type X> in array<of type X>". The following seem like they should work:

select * from data where key in array(1,2,3);
select * from data where key in udf_returns_array();

Certainly, array_contains does work as expected:

select * from data where array_contains(array(1,2,3), key);

I might just be feeling the effects of [SPARK-19730|https://issues.apache.org/jira/browse/SPARK-19730] though. The data store implementation I have rejects queries if primary keys aren't provided, so in this situation the query is rejected because neither array_contains nor the predicate subquery provides the filter on "key" to the outer query.
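The semantics being requested can be sketched in plain Python (this is an illustrative model, not Spark code; `in_array` is a hypothetical helper name): when the element type of the array matches the type of the left-hand value, `key IN array(...)` would reduce to the same membership test that `array_contains(array(...), key)` already performs.

```python
# Illustrative sketch of the membership semantics SPARK-19731 asks the
# IN operator to support when the right-hand side is an array whose
# element type matches the left-hand value. `in_array` is a hypothetical
# name, not a Spark function.
def in_array(value, arr):
    """Return True when `value` equals some element of `arr`."""
    return any(value == elem for elem in arr)

# `key in array(1,2,3)` would match for key = 1, and for key = 5 it
# would simply evaluate to False rather than raising an analysis error.
print(in_array(1, [1, 2, 3]))
print(in_array(5, [1, 2, 3]))
```

Under this reading, `select 5 in array(1,2,3)` returns false instead of failing analysis, exactly as the issue description below requests.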
> IN Operator should support arrays
> ---------------------------------
>
>                 Key: SPARK-19731
>                 URL: https://issues.apache.org/jira/browse/SPARK-19731
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 1.6.2, 2.0.0, 2.1.0
>            Reporter: Shawn Lavelle
>            Priority: Minor
>
> When the column type and array member type match, the IN operator should
> still operate on the array. This is useful for UDFs and Predicate SubQueries
> that return arrays.
> (This isn't necessarily extensible to all collections, but certainly applies
> to arrays.)
> Example:
> select 5 in array(1,2,3) should return false instead of a ParseException, since
> the type of the array and the type of the column match.
>
> create table test (val int);
> insert into test values (1);
> select * from test;
> +------+--+
> | val  |
> +------+--+
> | 1    |
> +------+--+
>
> {panel}
> *select val from test where array_contains(array(1,2,3), val);*
> +------+--+
> | val  |
> +------+--+
> | 1    |
> +------+--+
> {panel}
> {panel}
> *select val from test where val in (array(1,2,3));*
> Error: org.apache.spark.sql.AnalysisException: cannot resolve '(test.`val` IN
> (array(1, 2, 3)))' due to data type mismatch: Arguments must be same type;
> line 1 pos 31;
> 'Project ['val]
> +- 'Filter val#433 IN (array(1, 2, 3))
>    +- MetastoreRelation test (state=,code=0)
> {panel}
> {panel}
> *select val from test where val in (select array(1,2,3));*
> Error: org.apache.spark.sql.AnalysisException: cannot resolve '(test.`val` =
> `array(1, 2, 3)`)' due to data type mismatch: differing types in '(test.`val`
> = `array(1, 2, 3)`)' (int and array<int>).;;
> 'Project ['val]
> +- 'Filter predicate-subquery#434 [(val#435 = array(1, 2, 3)#436)]
>    :  +- Project [array(1, 2, 3) AS array(1, 2, 3)#436]
>    :     +- OneRowRelation$
>    +- MetastoreRelation test (state=,code=0)
> {panel}
> {panel}
> *select val from test where val in (select explode(array(1,2,3)));*
> +------+--+
> | val  |
> +------+--+
> | 1    |
> +------+--+
> Note: See
[SPARK-19730|https://issues.apache.org/jira/browse/SPARK-19730] for
> how a predicate subquery breaks when applied to the DataSource API
> {panel}

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
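The difference between the failing subquery form and the working `explode` workaround quoted above can be modeled in plain Python (a hedged sketch, not Spark internals): `val in (select array(1,2,3))` compares an int against a whole array value, while `explode` flattens the array into one int per row before the comparison.

```python
# Sketch of why the two subquery forms above behave differently.
val = 1

# `val in (select array(1,2,3))`: the subquery yields one row whose
# single value is the entire array, so the comparison is int vs
# array<int> -- the type mismatch Spark reports as an AnalysisException.
subquery_rows = [[1, 2, 3]]          # one row holding array(1,2,3)
mismatched = val in subquery_rows    # 1 is compared to [1, 2, 3]

# `val in (select explode(array(1,2,3)))`: explode flattens the array
# into one row per element, so the comparison is int vs int and works.
exploded_rows = [elem for row in subquery_rows for elem in row]
matched = val in exploded_rows

print(mismatched)   # the whole-array row never equals the int
print(matched)      # the exploded rows contain the int
```

This is why `array_contains` and the `explode` rewrite succeed today, while the ticket asks for the direct `IN array(...)` form to be accepted as well.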