https://github.com/apache/spark/pull/9055

This JIRA explains how to convert IN to joins.

Thanks,

Xiao Li



2015-12-04 11:27 GMT-08:00 Michael Armbrust <mich...@databricks.com>:

> The best way to run this today is probably to manually convert the query
> into a join, i.e. create a DataFrame that has all the numbers in it and
> join (or outer join) it with the other table. This way you avoid parsing a
> gigantic string.
>
> On Fri, Dec 4, 2015 at 10:36 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>
>> Have you seen this JIRA?
>>
>> [SPARK-8077] [SQL] Optimization for TreeNodes with large numbers of
>> children
>>
>> From the numbers Michael published, 1 million numbers would still need
>> 250 seconds to parse.
>>
>> On Fri, Dec 4, 2015 at 10:14 AM, Madabhattula Rajesh Kumar <
>> mrajaf...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> What is the best practice for using the "IN" clause in Spark SQL?
>>>
>>> Use case: read the table filtered by number. I have a list of numbers,
>>> for example 1 million of them.
>>>
>>> Regards,
>>> Rajesh
>>>
>>
>>
>
