The image didn't go through. I think you were referring to:

    override def map[R: ClassTag](f: Row => R): RDD[R] = rdd.map(f)
Cheers

On Fri, Apr 17, 2015 at 6:07 AM, Olivier Girardot <o.girar...@lateral-thoughts.com> wrote:

> Hi everyone,
> I had an issue trying to use Spark SQL from Java (8 or 7). I tried to
> reproduce it in a small test case close to the actual documentation
> <https://spark.apache.org/docs/latest/sql-programming-guide.html#inferring-the-schema-using-reflection>,
> so sorry for the long mail, but this is "Java":
>
> import org.apache.spark.SparkConf;
> import org.apache.spark.api.java.JavaRDD;
> import org.apache.spark.api.java.JavaSparkContext;
> import org.apache.spark.sql.DataFrame;
> import org.apache.spark.sql.SQLContext;
>
> import java.io.Serializable;
> import java.util.ArrayList;
>
> class Movie implements Serializable {
>     private int id;
>     private String name;
>
>     public Movie(int id, String name) {
>         this.id = id;
>         this.name = name;
>     }
>
>     public int getId() {
>         return id;
>     }
>
>     public void setId(int id) {
>         this.id = id;
>     }
>
>     public String getName() {
>         return name;
>     }
>
>     public void setName(String name) {
>         this.name = name;
>     }
> }
>
> public class SparkSQLTest {
>     public static void main(String[] args) {
>         SparkConf conf = new SparkConf();
>         conf.setAppName("My Application");
>         conf.setMaster("local");
>         JavaSparkContext sc = new JavaSparkContext(conf);
>
>         ArrayList<Movie> movieArrayList = new ArrayList<Movie>();
>         movieArrayList.add(new Movie(1, "Indiana Jones"));
>
>         JavaRDD<Movie> movies = sc.parallelize(movieArrayList);
>
>         SQLContext sqlContext = new SQLContext(sc);
>         DataFrame frame = sqlContext.applySchema(movies, Movie.class);
>         frame.registerTempTable("movies");
>
>         sqlContext.sql("select name from movies")
>                   .map(row -> row.getString(0)) // this is what I would expect to work
>                   .collect();
>     }
> }
>
> But this does not compile; here's the compilation error:
>
> [ERROR] /Users/ogirardot/Documents/spark/java-project/src/main/java/org/apache/spark/MainSQL.java:[37,47]
> method map in class org.apache.spark.sql.DataFrame cannot be applied to given types;
> [ERROR]   required: scala.Function1<org.apache.spark.sql.Row,R>,scala.reflect.ClassTag<R>
> [ERROR]   found: (row)->"Na[...]ng(0)
> [ERROR]   reason: cannot infer type-variable(s) R
> [ERROR]     (actual and formal argument lists differ in length)
> [ERROR] /Users/ogirardot/Documents/spark/java-project/src/main/java/org/apache/spark/SampleSHit.java:[56,17]
> method map in class org.apache.spark.sql.DataFrame cannot be applied to given types;
> [ERROR]   required: scala.Function1<org.apache.spark.sql.Row,R>,scala.reflect.ClassTag<R>
> [ERROR]   found: (row)->row[...]ng(0)
> [ERROR]   reason: cannot infer type-variable(s) R
> [ERROR]     (actual and formal argument lists differ in length)
> [ERROR] -> [Help 1]
>
> This happens because in DataFrame the map method is defined as:
>
> [image: inline image 1]
>
> Once this is translated to bytecode, the actual Java signature takes a
> Function1 plus an added ClassTag parameter. I can try to work around this
> with scala.reflect.ClassTag$, like so:
>
> ClassTag$.MODULE$.apply(String.class)
>
> That gets the second ClassTag parameter right, but then instantiating a
> java.util.function.Function or using Java 8 lambdas fails to work, and if
> I try to instantiate a proper Scala Function1... well, that is a world of
> pain.
>
> This is a regression introduced by the 1.3.x DataFrame: JavaSchemaRDD used
> to be a JavaRDDLike, but DataFrames are not (and are not callable with
> Java Functions). Shall I open a JIRA for this?
>
> Regards,
>
> --
> Olivier Girardot | Associé
> o.girar...@lateral-thoughts.com
> +33 6 24 09 17 94
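For anyone hitting the same wall: one way around it, in the 1.3.x API, is to drop back down to a `JavaRDD<Row>` via `DataFrame.javaRDD()` (or `toJavaRDD()`), which is a JavaRDDLike and therefore accepts plain Java 8 lambdas with no ClassTag. A minimal sketch, assuming Spark 1.3.x on the classpath and reusing the `Movie` bean from the mail above (the `DataFrameMapWorkaround` class name is mine):

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

import java.io.Serializable;
import java.util.Collections;
import java.util.List;

public class DataFrameMapWorkaround {

    // Same JavaBean as in the original mail, compacted.
    public static class Movie implements Serializable {
        private int id;
        private String name;
        public Movie(int id, String name) { this.id = id; this.name = name; }
        public int getId() { return id; }
        public void setId(int id) { this.id = id; }
        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
    }

    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("workaround").setMaster("local");
        JavaSparkContext sc = new JavaSparkContext(conf);
        SQLContext sqlContext = new SQLContext(sc);

        DataFrame frame = sqlContext.applySchema(
                sc.parallelize(Collections.singletonList(new Movie(1, "Indiana Jones"))),
                Movie.class);
        frame.registerTempTable("movies");

        // javaRDD() returns a JavaRDD<Row>; its map takes an
        // org.apache.spark.api.java.function.Function, so a Java 8
        // lambda compiles directly, with no ClassTag juggling.
        List<String> names = sqlContext.sql("select name from movies")
                .javaRDD()
                .map(row -> row.getString(0))
                .collect();

        System.out.println(names);
        sc.stop();
    }
}
```

The round trip through `javaRDD()` is a sketch of a workaround, not a fix for the underlying API regression discussed in the thread; the Scala-side `map` on DataFrame itself still needs a `Function1` plus a `ClassTag` when called from Java.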