I think in 1.3 and above, you'd need to do .sql(...).javaRDD().map(..)
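A minimal sketch of that workaround against the Movie example from the mail below — `javaRDD()` hands back a `JavaRDD<Row>`, whose `map` takes the Java-friendly `org.apache.spark.api.java.function.Function`, so a Java 8 lambda type-checks without a `ClassTag`. Class and method names other than the Spark API are made up for illustration, and this assumes a Spark 1.3.x dependency on the classpath:

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

import java.io.Serializable;
import java.util.Arrays;
import java.util.List;

public class DataFrameMapWorkaround {

    // Same JavaBean shape as the Movie class in the thread; must be
    // public static so Spark's bean-based schema inference can see it.
    public static class Movie implements Serializable {
        private int id;
        private String name;
        public Movie() {}
        public Movie(int id, String name) { this.id = id; this.name = name; }
        public int getId() { return id; }
        public void setId(int id) { this.id = id; }
        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
    }

    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("My Application").setMaster("local");
        JavaSparkContext sc = new JavaSparkContext(conf);
        SQLContext sqlContext = new SQLContext(sc);

        JavaRDD<Movie> movies = sc.parallelize(Arrays.asList(new Movie(1, "Indiana Jones")));
        DataFrame frame = sqlContext.createDataFrame(movies, Movie.class);
        frame.registerTempTable("movies");

        // .javaRDD() crosses back into the Java API, so the lambda below is
        // an org.apache.spark.api.java.function.Function, not a scala.Function1.
        List<String> names = sqlContext.sql("select name from movies")
                .javaRDD()
                .map(row -> row.getString(0))
                .collect();
        System.out.println(names);
        sc.stop();
    }
}
```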
On Fri, Apr 17, 2015 at 9:22 AM, Olivier Girardot <
o.girar...@lateral-thoughts.com> wrote:

> Yes, thanks!
>
> On Fri, Apr 17, 2015 at 4:20 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>
> > The image didn't go through.
> >
> > I think you were referring to:
> > override def map[R: ClassTag](f: Row => R): RDD[R] = rdd.map(f)
> >
> > Cheers
> >
> > On Fri, Apr 17, 2015 at 6:07 AM, Olivier Girardot <
> > o.girar...@lateral-thoughts.com> wrote:
> >
> > > Hi everyone,
> > > I had an issue trying to use Spark SQL from Java (8 or 7). I tried to
> > > reproduce it in a small test case close to the actual documentation
> > > <https://spark.apache.org/docs/latest/sql-programming-guide.html#inferring-the-schema-using-reflection>,
> > > so sorry for the long mail, but this is "Java":
> > >
> > > import org.apache.spark.SparkConf;
> > > import org.apache.spark.api.java.JavaRDD;
> > > import org.apache.spark.api.java.JavaSparkContext;
> > > import org.apache.spark.sql.DataFrame;
> > > import org.apache.spark.sql.SQLContext;
> > >
> > > import java.io.Serializable;
> > > import java.util.ArrayList;
> > >
> > > class Movie implements Serializable {
> > >     private int id;
> > >     private String name;
> > >
> > >     public Movie(int id, String name) {
> > >         this.id = id;
> > >         this.name = name;
> > >     }
> > >
> > >     public int getId() {
> > >         return id;
> > >     }
> > >
> > >     public void setId(int id) {
> > >         this.id = id;
> > >     }
> > >
> > >     public String getName() {
> > >         return name;
> > >     }
> > >
> > >     public void setName(String name) {
> > >         this.name = name;
> > >     }
> > > }
> > >
> > > public class SparkSQLTest {
> > >     public static void main(String[] args) {
> > >         SparkConf conf = new SparkConf();
> > >         conf.setAppName("My Application");
> > >         conf.setMaster("local");
> > >         JavaSparkContext sc = new JavaSparkContext(conf);
> > >
> > >         ArrayList<Movie> movieArrayList = new ArrayList<Movie>();
> > >         movieArrayList.add(new Movie(1, "Indiana Jones"));
> > >
> > >         JavaRDD<Movie> movies = sc.parallelize(movieArrayList);
> > >
> > >         SQLContext sqlContext = new SQLContext(sc);
> > >         DataFrame frame = sqlContext.applySchema(movies, Movie.class);
> > >         frame.registerTempTable("movies");
> > >
> > >         sqlContext.sql("select name from movies")
> > >             .map(row -> row.getString(0)) // this is what I would expect to work
> > >             .collect();
> > >     }
> > > }
> > >
> > > But this does not compile; here is the compilation error:
> > >
> > > [ERROR] /Users/ogirardot/Documents/spark/java-project/src/main/java/org/apache/spark/MainSQL.java:[37,47]
> > > method map in class org.apache.spark.sql.DataFrame cannot be applied to given types;
> > > [ERROR]   required: scala.Function1<org.apache.spark.sql.Row,R>,scala.reflect.ClassTag<R>
> > > [ERROR]   found: (row)->"Na[...]ng(0)
> > > [ERROR]   reason: cannot infer type-variable(s) R
> > > [ERROR]     (actual and formal argument lists differ in length)
> > > [ERROR] /Users/ogirardot/Documents/spark/java-project/src/main/java/org/apache/spark/SampleSHit.java:[56,17]
> > > method map in class org.apache.spark.sql.DataFrame cannot be applied to given types;
> > > [ERROR]   required: scala.Function1<org.apache.spark.sql.Row,R>,scala.reflect.ClassTag<R>
> > > [ERROR]   found: (row)->row[...]ng(0)
> > > [ERROR]   reason: cannot infer type-variable(s) R
> > > [ERROR]     (actual and formal argument lists differ in length)
> > > [ERROR] -> [Help 1]
> > >
> > > This is because in DataFrame the *map* method is defined as:
> > >
> > > [inline image: the map signature, as quoted by Ted above]
> > >
> > > and once this is compiled to bytecode, the actual Java signature takes a
> > > Function1 plus an added ClassTag parameter.
> > > I can try to go around this and use scala.reflect.ClassTag$ like that:
> > >
> > > ClassTag$.MODULE$.apply(String.class)
> > >
> > > to get the second ClassTag parameter right, but then instantiating a
> > > java.util.function.Function or using Java 8 lambdas fails to work, and
> > > if I try to instantiate a proper Scala Function1... well, this is a
> > > world of pain.
> > >
> > > This is a regression introduced by the 1.3.x DataFrame: JavaSchemaRDD
> > > used to be a JavaRDDLike, but DataFrames are not (and are not callable
> > > with JFunctions). I can open a JIRA if you want?
> > >
> > > Regards,
> > >
> > > --
> > > *Olivier Girardot* | Partner
> > > o.girar...@lateral-thoughts.com
> > > +33 6 24 09 17 94
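For the record, the "world of pain" route the mail describes can be sketched as follows — calling the Scala-signature `DataFrame.map` directly from Java by hand-building a `scala.Function1` (via `scala.runtime.AbstractFunction1`) and passing the `ClassTag` explicitly. The class and method names here are hypothetical, and this assumes Spark 1.3.x plus the matching scala-library on the classpath:

```java
import org.apache.spark.rdd.RDD;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.Row;
import scala.reflect.ClassTag;
import scala.reflect.ClassTag$;
import scala.runtime.AbstractFunction1;

import java.io.Serializable;

public class ScalaMapFromJava {

    // AbstractFunction1 supplies compose/andThen so we only implement apply;
    // it is not Serializable by itself, and Spark needs the closure to be,
    // hence the extra interface.
    static class GetName extends AbstractFunction1<Row, String> implements Serializable {
        @Override
        public String apply(Row row) {
            return row.getString(0);
        }
    }

    static RDD<String> names(DataFrame frame) {
        // The ClassTag that the [R: ClassTag] context bound becomes in bytecode.
        ClassTag<String> tag = ClassTag$.MODULE$.apply(String.class);
        // Note this yields a Scala RDD<String>, not a JavaRDD<String>.
        return frame.map(new GetName(), tag);
    }
}
```

This compiles, but it trades one line of Java 8 against a serializable helper class, an explicit `ClassTag`, and a Scala `RDD` return type — which is the mail's point about `javaRDD()` being the saner bridge.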