Hi,

My example was a little bogus; my real use case is to do multiple regexp
replacements, so something like:

def my_f(data):
    for match, repl in regexp_list:
        data = regexp_replace(data, match, repl)
    return data

I could achieve my goal with multiple .select(regexp_replace()) calls, but
one UDF would be nicer.
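Since regexp_replace builds a Column expression, the loop can be folded over the column directly with functools.reduce, no UDF needed. A minimal sketch (the `regexp_list` contents and column name are assumed for illustration); the same fold is shown with plain re.sub so it runs locally:

```python
from functools import reduce
import re

# On a Spark DataFrame the same fold would look like:
#   from pyspark.sql.functions import regexp_replace
#   result = test_data.select(
#       reduce(lambda col, mr: regexp_replace(col, mr[0], mr[1]),
#              regexp_list, test_data.name).alias('name'))
#
# Local demonstration of the chaining pattern with re.sub:
regexp_list = [('a', 'X'), ('b', 'Y')]  # hypothetical (pattern, replacement) pairs

def apply_all(data):
    # Apply each (match, repl) pair to the string, left to right.
    return reduce(lambda s, mr: re.sub(mr[0], mr[1], s), regexp_list, data)

print(apply_all('abc'))  # -> 'XYc'
```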

-Perttu

On Thu, 17 Nov 2016 at 9:42, Mendelson, Assaf <assaf.mendel...@rsa.com>
wrote:

> regexp_replace is supposed to receive a column; you don't need to write a
> UDF for it.
>
> Instead try:
>
> test_data.select(regexp_replace(test_data.name, 'a', 'X'))
>
>
>
> You would need a UDF if you wanted to do something to the string
> value of a single row (e.g. return data + "bla")
>
>
>
> Assaf.
>
>
>
> *From:* Perttu Ranta-aho [mailto:ranta...@iki.fi]
> *Sent:* Thursday, November 17, 2016 9:15 AM
> *To:* user@spark.apache.org
> *Subject:* Nested UDFs
>
>
>
> Hi,
>
>
>
> Shouldn't this work?
>
>
>
> from pyspark.sql.functions import regexp_replace, udf
>
>
>
> def my_f(data):
>
>     return regexp_replace(data, 'a', 'X')
>
> my_udf = udf(my_f)
>
>
>
> test_data = sqlContext.createDataFrame([('a',), ('b',), ('c',)], ('name',))
>
> test_data.select(my_udf(test_data.name)).show()
>
>
>
> But instead of 'a' being replaced with 'X' I get exception:
>
>   File
> ".../spark-2.0.2-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/sql/functions.py",
> line 1471, in regexp_replace
>
>     jc = sc._jvm.functions.regexp_replace(_to_java_column(str), pattern,
> replacement)
>
> AttributeError: 'NoneType' object has no attribute '_jvm'
>
>
>
> ???
>
>
>
> -Perttu
>
>
>
