Hi,

My example was a little bogus; my real use case is to do multiple regexp replacements, so something like:
def my_f(data):
    for match, repl in regexp_list:
        data = regexp_replace(data, match, repl)
    return data

I could achieve my goal with multiple .select(regexp_replace()) lines, but one UDF would be nicer.

-Perttu

On Thu, 17 Nov 2016 at 9:42, Mendelson, Assaf <assaf.mendel...@rsa.com> wrote:

> regexp_replace is supposed to receive a column; you don't need to write a
> UDF for it.
>
> Instead try:
>
> test_data.select(regexp_replace(test_data.name, 'a', 'X'))
>
> You would need a UDF if you wanted to do something on the string value of
> a single row (e.g. return data + "bla").
>
> Assaf.
>
> From: Perttu Ranta-aho [mailto:ranta...@iki.fi]
> Sent: Thursday, November 17, 2016 9:15 AM
> To: user@spark.apache.org
> Subject: Nested UDFs
>
> Hi,
>
> Shouldn't this work?
>
> from pyspark.sql.functions import regexp_replace, udf
>
> def my_f(data):
>     return regexp_replace(data, 'a', 'X')
>
> my_udf = udf(my_f)
>
> test_data = sqlContext.createDataFrame([('a',), ('b',), ('c',)], ('name',))
> test_data.select(my_udf(test_data.name)).show()
>
> But instead of 'a' being replaced with 'X' I get an exception:
>
>   File ".../spark-2.0.2-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/sql/functions.py",
>     line 1471, in regexp_replace
>     jc = sc._jvm.functions.regexp_replace(_to_java_column(str), pattern, replacement)
> AttributeError: 'NoneType' object has no attribute '_jvm'
>
> ???
>
> -Perttu
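[Editor's note] For completeness, a sketch of the two working approaches for the multi-replacement use case. The cause of the AttributeError is that pyspark.sql.functions like regexp_replace build driver-side column expressions and cannot run inside a UDF on an executor; inside a UDF, plain Python's re module must be used instead. The regexp_list contents, column name, and DataFrame names below are assumptions taken from the thread; the Spark-specific lines are an untested sketch:

```python
import re

# Example (pattern, replacement) pairs -- hypothetical contents.
regexp_list = [('a', 'X'), ('b', 'Y')]

def my_f(data):
    # Pure-Python replacement loop: safe inside a UDF because re.sub
    # runs on the executor without needing the driver's SparkContext.
    for match, repl in regexp_list:
        data = re.sub(match, repl, data)
    return data

# In PySpark this would be wrapped as a UDF (untested sketch):
#   from pyspark.sql.functions import udf
#   my_udf = udf(my_f)
#   test_data.select(my_udf(test_data.name)).show()
#
# Alternatively, stay with native column expressions by folding
# regexp_replace over the list -- typically faster than a Python UDF:
#   from functools import reduce
#   from pyspark.sql.functions import regexp_replace
#   col = reduce(lambda c, pr: regexp_replace(c, pr[0], pr[1]),
#                regexp_list, test_data.name)
#   test_data.select(col).show()
```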