One more question: for this big filter, given that my server has 4 cores, will
Spark (standalone mode) split the RDD into 4 partitions automatically?
Thanks
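For what it's worth, one way to check is to ask Spark directly. This is just a
sketch, assuming an active SparkSession bound to the name spark, as in the
example further down the thread:

>>> DF = spark.range(10)
>>> DF.rdd.getNumPartitions()              # how many partitions Spark actually created
>>> spark.sparkContext.defaultParallelism  # the default; normally the total core count in standalone mode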
That’s great, thanks.
On Sun, Jan 2, 2022 at 6:30 AM Mich Talebzadeh wrote:
Create a list of values that you don't want and filter on those:
>>> DF = spark.range(10)
>>> DF
DataFrame[id: bigint]
>>>
>>> array = [1, 2, 3, 8] # don't want these
>>> DF.filter(DF.id.isin(array) == False).show()
+---+
| id|
+---+
| 0|
| 4|
| 5|
| 6|
| 7|
| 9|
+---+
Or use the NOT operator (~) to negate the condition, as sketched below.
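That is, instead of comparing the isin() result to False, negate it with ~. This
sketch reuses the DF and array defined above and produces the same output:

>>> DF.filter(~DF.id.isin(array)).show()  # ~ negates the isin condition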
Using the DataFrame API I need to implement a batch filter:
DF.select(..).where(col(..) != 'a' and col(..) != 'b' and ...)
There are a lot of keywords that need to be filtered out for the same column in the
where statement.
How can I make this smarter? A UDF, or something else?
Thanks & Happy New Year!
Bitfox
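A side note on the chained form in the question: PySpark Column expressions have
to be combined with &, not Python's and. As a minimal sketch of building such a
predicate programmatically (the column name col1 and the keyword list are only
placeholders for illustration):

>>> from functools import reduce
>>> from pyspark.sql.functions import col
>>> keywords = ['a', 'b', 'c']  # values to exclude (placeholder list)
>>> cond = reduce(lambda x, y: x & y, [col('col1') != k for k in keywords])
>>> # DF.where(cond) is equivalent to DF.where(~col('col1').isin(keywords))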