Re: how to create List in pyspark
Why not use the SQL functions explode and split? They would perform better and be more stable than a UDF.

From: Yanbo Liang
Sent: Thursday, April 27, 2017 7:34:54 AM
To: Selvam Raman
Cc: user
Subject: Re: how to create List in pyspark

You can try a UDF, as in the following code snippet:

    from pyspark.sql.functions import udf
    from pyspark.sql.types import ArrayType, StringType

    df = spark.read.text("./README.md")
    split_func = udf(lambda text: text.split(" "), ArrayType(StringType()))
    df.withColumn("split_value", split_func("value")).show()

Thanks
Yanbo

On Tue, Apr 25, 2017 at 12:27 AM, Selvam Raman <sel...@gmail.com> wrote:

> documentDF = spark.createDataFrame([
>     ("Hi I heard about Spark".split(" "), ),
>     ("I wish Java could use case classes".split(" "), ),
>     ("Logistic regression models are neat".split(" "), )
> ], ["text"])
>
> How can I achieve the same df while I am reading from a source?
>
>     doc = spark.read.text("/Users/rs/Desktop/nohup.out")
>
> How can I create an array-type "sentences" column from doc (a DataFrame)?
> The line below creates more than one column.
>
>     rdd.map(lambda rdd: rdd[0]).map(lambda row: row.split(" "))
>
> --
> Selvam Raman
> "லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"