RE: Nested UDFs

2016-11-17 Thread Mendelson, Assaf
)) The second option is more correct and should provide better performance.
From: Perttu Ranta-aho [mailto:ranta...@iki.fi]
Sent: Thursday, November 17, 2016 1:50 PM
To: user@spark.apache.org
Subject: Re: Nested UDFs
Hi, my example was a little bogus; my real use case is to do multiple regexp

Re: Nested UDFs

2016-11-17 Thread Perttu Ranta-aho
Hi, my example was a little bogus; my real use case is to do multiple regexp replacements, so something like:

def my_f(data):
    for match, repl in regexp_list:
        data = regexp_replace(data, match, repl)
    return data

I could achieve my goal with multiple .select(regexp_replace()) lines, but

RE: Nested UDFs

2016-11-16 Thread Mendelson, Assaf
regexp_replace is supposed to receive a column; you don’t need to write a UDF for it. Instead try:

test_data.select(regexp_replace(test_data.name, 'a', 'X'))

You would need a UDF if you wanted to do something on the string value of a single row (e.g. return data + "bla").

Assaf.