Re: How to make batch filter

2022-01-01 Thread Bitfox
One more question: for this big filter, given my server has 4 cores, will Spark (standalone mode) split the RDD into 4 partitions automatically? Thanks

On Sun, Jan 2, 2022 at 6:30 AM Mich Talebzadeh wrote:
> Create a list of the values that you don't want and filter on those
>
> >>> DF =

Re: How to make batch filter

2022-01-01 Thread Bitfox
That's great, thanks.

On Sun, Jan 2, 2022 at 6:30 AM Mich Talebzadeh wrote:
> Create a list of the values that you don't want and filter on those
>
> >>> DF = spark.range(10)
> >>> DF
> DataFrame[id: bigint]
> >>>
> >>> array = [1, 2, 3, 8]  # don't want these
> >>> DF.filter(DF.id.isin(array) ==

Re: How to make batch filter

2022-01-01 Thread Mich Talebzadeh
Create a list of the values that you don't want and filter on those:

>>> DF = spark.range(10)
>>> DF
DataFrame[id: bigint]
>>>
>>> array = [1, 2, 3, 8]  # don't want these
>>> DF.filter(DF.id.isin(array) == False).show()
+---+
| id|
+---+
|  0|
|  4|
|  5|
|  6|
|  7|
|  9|
+---+

or use binary NOT

How to make batch filter

2022-01-01 Thread Bitfox
Using the DataFrame API I need to implement a batch filter:

DF.select(..).where(col(..) != 'a' and col(..) != 'b' and ...)

There are a lot of keywords that should be filtered out on the same column in the where clause. How can I make it smarter? A UDF, or something else?

Thanks & Happy New Year!
Bitfox