Hi Philippe,
Broadcast variables allow the programmer to keep a read-only variable
cached on each machine rather than shipping a copy of it with tasks. They
can be used, for example, to give every node a copy of a large input
dataset in an efficient manner. Spark also attempts to distribute
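The quoted passage can be illustrated with a plain-Python sketch of the broadcast idea — this is not the Spark API itself (in PySpark you would call `sc.broadcast(...)` and read `.value` inside tasks), and the reference numbers are assumed examples:

```python
# Shared, read-only reference data: built once and reused by every
# task, rather than shipped as a copy alongside each task's input.
reference = frozenset({"+331222", "+331333"})  # assumed reference numbers

def task(partition, ref):
    # Each task only reads the shared set; it never mutates it.
    return [tel for tel in partition if tel in ref]

partitions = [["+331222", "+331444"], ["+331333", "+331555"]]
results = [task(p, reference) for p in partitions]
print(results)  # [['+331222'], ['+331333']]
```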
Hi Mich,
What exactly do you mean by « if you prefer to broadcast the reference data »?
Philippe
> On 2 Apr 2023, at 18:16, Mich Talebzadeh wrote:
>
> Hi Philippe,
>
> These are my thoughts, in addition to Sean's comments
>
> Just to clarify, you receive a CSV file periodically and you already
Wow, you guys, Anastasios, Bjørn and Mich, are stars!
Thank you very much for your suggestions. I’m going to print them and study
them closely.
> On 2 Apr 2023, at 20:05, Anastasios Zouzias wrote:
>
> Hi Philippe,
>
> I would like to draw your attention to this great library that saved my
Hi Philippe,
I would like to draw your attention to this great library that saved my day
in the past when parsing phone numbers in Spark:
https://github.com/google/libphonenumber
If you combine it with Bjørn's suggestions you will have a good start on
your linkage task.
Best regards,
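As a taste of the kind of normalization libphonenumber (or its Python port, `phonenumbers`) automates, here is a deliberately crude, stdlib-only stand-in — the real library handles country metadata, validation, and formatting properly, and the `+33` default here is an assumed example:

```python
import re

def normalize(raw, default_prefix="+33"):
    # Crude stand-in for real phone-number parsing: strip separators
    # and coerce the number into +<digits> form.
    digits = re.sub(r"[^\d+]", "", raw)
    if digits.startswith("00"):
        return "+" + digits[2:]
    if not digits.startswith("+"):
        return default_prefix + digits.lstrip("0")
    return digits

print(normalize("00 33 1 222"))  # +331222
print(normalize("01 222"))       # +331222
```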
dataset.csv
id,tel_in_dataset
1,+33
2,+331222
3,+331333
4,+331222
5,+331222
6,+331444
7,+331222
8,+331555
telephone_numbers.csv
tel
+331222
+331222
+331222
+331222
Start Spark with all of your CPUs and RAM:
import os
import multiprocessing
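The snippet above breaks off after the imports; a hedged guess at its continuation is to size the local master string from the machine's core count (the SparkSession lines are commented out because they need a Spark installation, and the memory figure is an assumed example):

```python
import multiprocessing

cores = multiprocessing.cpu_count()
master = f"local[{cores}]"  # one Spark task slot per CPU core
print(master)

# With pyspark installed, the session would then be built roughly as:
# from pyspark.sql import SparkSession
# spark = (SparkSession.builder
#          .master(master)
#          .config("spark.driver.memory", "8g")  # assumed size
#          .getOrCreate())
```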
Hi Philippe,
These are my thoughts, in addition to Sean's comments
Just to clarify: you receive a CSV file periodically, and you already have a
file that contains valid patterns for phone numbers (the reference data)
In pseudo-code, you can probe your CSV DataFrame against the reference DataFrame:
// load your
That won't work, you can't use Spark within Spark like that.
If it were exact matches, the best solution would be to load both datasets
and join on telephone number.
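Using the sample files posted earlier in the thread, the exact-match lookup can be sketched in plain Python — in Spark you would load each CSV as a DataFrame and join on the telephone column instead; the files are inlined here so the sketch is self-contained:

```python
import csv
import io

# The two sample files from the thread, inlined as strings.
dataset_csv = """id,tel_in_dataset
1,+33
2,+331222
3,+331333
4,+331222
5,+331222
6,+331444
7,+331222
8,+331555
"""
telephone_csv = """tel
+331222
+331222
+331222
+331222
"""

# Distinct reference numbers, then keep the ids whose number matches.
wanted = {row["tel"] for row in csv.DictReader(io.StringIO(telephone_csv))}
matches = [row["id"] for row in csv.DictReader(io.StringIO(dataset_csv))
           if row["tel_in_dataset"] in wanted]
print(matches)  # ['2', '4', '5', '7']
```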
For this case, I think your best bet is a UDF that contains the telephone
numbers as a list and decides whether a given number
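The message breaks off above, but the shape of such a predicate can be sketched in plain Python — the numbers are hypothetical, and in PySpark you would wrap the function with `pyspark.sql.functions.udf` (ideally broadcasting the set first) before applying it to the telephone column:

```python
# Hypothetical reference numbers; a Spark UDF would close over this
# set (ideally via a broadcast variable) rather than rebuild it per row.
known_numbers = {"+331222", "+331333", "+331444"}

def is_known(tel):
    # True when the number appears in the reference list.
    return tel in known_numbers

print([is_known(t) for t in ["+331222", "+331555"]])  # [True, False]
```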
Many thanks, Mich.
Is « foreach » the best construct to look up items in a dataset such as the
« telephonedirectory » dataset below?
val telrdd = spark.sparkContext.parallelize(Seq("tel1", "tel2", "tel3" …)) // the telephone sequence
// was read from a CSV file
val ds =