https://github.com/apache/spark/pull/9055
This JIRA explains how to convert IN to Joins.

Thanks,

Xiao Li

2015-12-04 11:27 GMT-08:00 Michael Armbrust <mich...@databricks.com>:

> The best way to run this today is probably to manually convert the query
> into a join. I.e. create a dataframe that has all the numbers in it, and
> join/outer join it with the other table. This way you avoid parsing a
> gigantic string.
>
> On Fri, Dec 4, 2015 at 10:36 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>
>> Have you seen this JIRA ?
>>
>> [SPARK-8077] [SQL] Optimization for TreeNodes with large numbers of
>> children
>>
>> From the numbers Michael published, 1 million numbers would still need
>> 250 seconds to parse.
>>
>> On Fri, Dec 4, 2015 at 10:14 AM, Madabhattula Rajesh Kumar
>> <mrajaf...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> How to use/best practices "IN" clause in Spark SQL.
>>>
>>> Use Case :- Read the table based on number. I have a List of numbers.
>>> For example, 1million.
>>>
>>> Regards,
>>> Rajesh
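
For reference, here is a minimal PySpark sketch of the approach Michael describes in the quoted reply: put the numbers in a DataFrame and join it with the table instead of building a huge IN (...) string. The names filter_by_id_list, key_col and demo are made up for illustration and appear nowhere in the thread. The sketch uses a left semi join rather than the plain or outer join mentioned above, on the assumption that IN-style filtering (keep matching rows, add no columns, no duplicates) is what is wanted. The demo assumes Spark 2.0 or later for SparkSession; on 1.x a SQLContext would play the same role.

```python
def filter_by_id_list(spark, table_df, key_col, values):
    """Keep the rows of table_df whose key_col value appears in values.

    Hypothetical helper, not part of Spark. Roughly equivalent to
    `WHERE key_col IN (...)`, but the list never goes through the SQL parser.
    """
    # One single-column row per distinct value; deduplicating keeps the
    # lookup DataFrame as small as possible.
    ids_df = spark.createDataFrame([(v,) for v in set(values)], [key_col])
    # A left semi join returns only the left side's columns and emits each
    # matching left row once, which mirrors IN semantics.
    return table_df.join(ids_df, on=key_col, how="leftsemi")


def demo():
    # Imported here so the helper above can be loaded without pyspark.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .master("local[1]")
             .appName("in-as-join")
             .getOrCreate())
    table_df = spark.createDataFrame(
        [(1, "a"), (2, "b"), (3, "c")], ["id", "val"])
    # 99 is in the list but not in the table, so it simply matches nothing.
    filter_by_id_list(spark, table_df, "id", [1, 3, 99]).show()
    spark.stop()
```

Calling demo() on a machine with pyspark installed runs the join against a three-row toy table. If the list of numbers is small relative to the table, Spark may choose to broadcast it; whether that happens depends on the configured broadcast threshold, so it is worth checking the physical plan with explain().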