With commit 200f01c8fb15680b5630fbd122d44f9b1d096e02 using Scala 2.11:

Using Python version 2.7.9 (default, Apr 29 2016 10:48:06)
SparkSession available as 'spark'.
>>> from pyspark.sql import SparkSession
>>> from pyspark.sql.types import IntegerType, StructField, StructType
>>> from pyspark.sql.functions import udf
>>> from pyspark.sql.types import Row
>>> spark = SparkSession.builder.master('local[4]').appName('2.0 DF').getOrCreate()
>>> add_one = udf(lambda x: x + 1, IntegerType())
>>> schema = StructType([StructField('a', IntegerType(), False)])
>>> df = spark.createDataFrame([Row(a=1),Row(a=2)], schema)
>>> df.select(add_one(df.a).alias('incremented')).collect()
[Row(incremented=2), Row(incremented=3)]

Let me build with Scala 2.10 and try again.

On Tue, Jun 7, 2016 at 2:47 PM, Franklyn D'souza <franklyn.dso...@shopify.com> wrote:

> I've built spark-2.0-preview (8f5a04b) with scala-2.10 using the following
>
>> ./dev/change-version-to-2.10.sh
>> ./dev/make-distribution.sh -DskipTests -Dzookeeper.version=3.4.5 -Dcurator.version=2.4.0 -Dscala-2.10 -Phadoop-2.6 -Pyarn -Phive
>
> and then ran the following code in a pyspark shell
>
>> from pyspark.sql import SparkSession
>> from pyspark.sql.types import IntegerType, StructField, StructType
>> from pyspark.sql.functions import udf
>> from pyspark.sql.types import Row
>> spark = SparkSession.builder.master('local[4]').appName('2.0 DF').getOrCreate()
>> add_one = udf(lambda x: x + 1, IntegerType())
>> schema = StructType([StructField('a', IntegerType(), False)])
>> df = spark.createDataFrame([Row(a=1),Row(a=2)], schema)
>> df.select(add_one(df.a).alias('incremented')).collect()
>
> This never returns with a result.