One more question about this big filter: given my server has 4 cores, will Spark (standalone mode) split the RDD into 4 partitions automatically?
Thanks

On Sun, Jan 2, 2022 at 6:30 AM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

> Create a list of values that you don't want and filter on those
>
> >>> DF = spark.range(10)
> >>> DF
> DataFrame[id: bigint]
> >>>
> >>> array = [1, 2, 3, 8]  # don't want these
> >>> DF.filter(DF.id.isin(array) == False).show()
> +---+
> | id|
> +---+
> |  0|
> |  4|
> |  5|
> |  6|
> |  7|
> |  9|
> +---+
>
> or use the binary NOT operator:
>
> >>> DF.filter(~DF.id.isin(array)).show()
> +---+
> | id|
> +---+
> |  0|
> |  4|
> |  5|
> |  6|
> |  7|
> |  9|
> +---+
>
> HTH
>
> view my Linkedin profile
> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
> On Sat, 1 Jan 2022 at 20:59, Bitfox <bit...@bitfox.top> wrote:
>
>> Using the dataframe API I need to implement a batch filter:
>>
>> DF.select(..).where(col(..) != 'a' and col(..) != 'b' and ...)
>>
>> There are a lot of keywords that should be filtered for the same column
>> in the where statement.
>>
>> How can I make it smarter? UDF or others?
>>
>> Thanks & Happy New Year!
>> Bitfox
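For what it's worth, the `~DF.id.isin(array)` pattern in Mich's reply is logically a single set-membership test, which is why it replaces a long chain of `col(..) != value` conditions. A plain-Python sketch of the same semantics (not Spark code, just the idea, using made-up variable names):

```python
# Instead of chaining many `!= value` comparisons, collect the unwanted
# values in one container and test membership once per row. This mirrors
# DF.filter(~DF.id.isin(array)) from the reply above.
unwanted = {1, 2, 3, 8}        # values to exclude (same as `array` above)

rows = list(range(10))         # stands in for the id column of spark.range(10)

# Keep only rows whose value is NOT in the unwanted set:
kept = [r for r in rows if r not in unwanted]

print(kept)  # -> [0, 4, 5, 6, 7, 9]
```

In Spark the same benefit applies: one `isin` call over a list (or a broadcast of a large list) is easier to maintain than dozens of hand-written `!=` clauses, and the optimizer can treat it as a single predicate.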