Re: pyspark split pair rdd to multiple

2016-04-20 Thread Gourav Sengupta
Hi,

You do not need to do anything with the RDD at all. Just follow the instructions in this repository: https://github.com/databricks/spark-csv, and everything will be fast and smooth. Remember that if the data is large, converting an RDD to a DataFrame can take a very long time.

Re: pyspark split pair rdd to multiple

2016-04-20 Thread Wei Chen
Let's assume K is String and V is Integer:

    from pyspark.sql.types import (StructType, StructField,
                                   StringType, IntegerType, ArrayType)
    from pyspark.sql.functions import udf

    schema = StructType([StructField("K", StringType(), True),
                         StructField("V", IntegerType(), True)])
    df = sqlContext.createDataFrame(rdd, schema=schema)

    udf1 = udf(lambda x: [x], ArrayType(IntegerType()))
    df1 = df.select("K", udf1("V").alias("arrayV"))

Re: pyspark split pair rdd to multiple

2016-04-20 Thread patcharee
I can also use a DataFrame. Any suggestions?

Best,
Patcharee

On 20 April 2016 10:43, Gourav Sengupta wrote:
> Is there any reason why you are not using data frames?
>
> Regards,
> Gourav

Re: pyspark split pair rdd to multiple

2016-04-20 Thread Gourav Sengupta
Is there any reason why you are not using data frames?

Regards,
Gourav

On Tue, Apr 19, 2016 at 8:51 PM, pth001 wrote:
> Hi,
>
> How can I split a pair RDD [K, V] to a map [K, Array(V)] efficiently in
> PySpark?
>
> Best,
> Patcharee

pyspark split pair rdd to multiple

2016-04-19 Thread pth001
Hi,

How can I split a pair RDD [K, V] to a map [K, Array(V)] efficiently in PySpark?

Best,
Patcharee