Just out of curiosity, what would happen if you put your 10K values into a
temp table and then did a join against it?

> On Apr 5, 2017, at 4:30 PM, Maciej Bryński <[email protected]> wrote:
> 
> Hi,
> I'm trying to run queries with many values in an IN operator.
> 
> The result is that with more than 10K values the IN operator gets slower.
> 
> For example, this code takes about 20 seconds to run:
> 
> df = spark.range(0,100000,1,1)
> df.where('id in ({})'.format(','.join(map(str,range(100000))))).count()
> 
> Any ideas how to improve this?
> Is it a bug?
> -- 
> Maciek Bryński
> 
> ---------------------------------------------------------------------
> To unsubscribe e-mail: [email protected]
> 

