The image didn't go through. I think you were referring to:

    override def map[R: ClassTag](f: Row => R): RDD[R] = rdd.map(f)
Cheers

On Fri, Apr 17, 2015 at 6:07 AM, Olivier Girardot <o.girar...@lateral-thoughts.com> wrote:

> Hi everyone,
> I had an issue trying to use Spark SQL from Java (8 or 7). I tried to
> reproduce it in a small test case close to the actual documentation
> <https://spark.apache.org/docs/latest/sql-programming-guide.html#inferring-the-schema-using-reflection>,
> so sorry for the long mail, but this is "Java":
>
> import org.apache.spark.SparkConf;
> import org.apache.spark.api.java.JavaRDD;
> import org.apache.spark.api.java.JavaSparkContext;
> import org.apache.spark.sql.DataFrame;
> import org.apache.spark.sql.SQLContext;
>
> import java.io.Serializable;
> import java.util.ArrayList;
>
> class Movie implements Serializable {
>     private int id;
>     private String name;
>
>     public Movie(int id, String name) {
>         this.id = id;
>         this.name = name;
>     }
>
>     public int getId() {
>         return id;
>     }
>
>     public void setId(int id) {
>         this.id = id;
>     }
>
>     public String getName() {
>         return name;
>     }
>
>     public void setName(String name) {
>         this.name = name;
>     }
> }
>
> public class SparkSQLTest {
>     public static void main(String[] args) {
>         SparkConf conf = new SparkConf();
>         conf.setAppName("My Application");
>         conf.setMaster("local");
>         JavaSparkContext sc = new JavaSparkContext(conf);
>
>         ArrayList<Movie> movieArrayList = new ArrayList<Movie>();
>         movieArrayList.add(new Movie(1, "Indiana Jones"));
>
>         JavaRDD<Movie> movies = sc.parallelize(movieArrayList);
>
>         SQLContext sqlContext = new SQLContext(sc);
>         DataFrame frame = sqlContext.applySchema(movies, Movie.class);
>         frame.registerTempTable("movies");
>
>         sqlContext.sql("select name from movies")
>                   .map(row -> row.getString(0)) // this is what I would expect to work
>                   .collect();
>     }
> }
>
> But this does not compile; here's the compilation error:
>
> [ERROR] /Users/ogirardot/Documents/spark/java-project/src/main/java/org/apache/spark/MainSQL.java:[37,47]
> method map in class org.apache.spark.sql.DataFrame cannot be applied to given types;
> [ERROR]   required: scala.Function1<org.apache.spark.sql.Row,R>,scala.reflect.ClassTag<R>
> [ERROR]   found: (row)->"Na[...]ng(0)
> [ERROR]   reason: cannot infer type-variable(s) R
> [ERROR]     (actual and formal argument lists differ in length)
> [ERROR] /Users/ogirardot/Documents/spark/java-project/src/main/java/org/apache/spark/SampleSHit.java:[56,17]
> method map in class org.apache.spark.sql.DataFrame cannot be applied to given types;
> [ERROR]   required: scala.Function1<org.apache.spark.sql.Row,R>,scala.reflect.ClassTag<R>
> [ERROR]   found: (row)->row[...]ng(0)
> [ERROR]   reason: cannot infer type-variable(s) R
> [ERROR]     (actual and formal argument lists differ in length)
> [ERROR] -> [Help 1]
>
> This happens because in DataFrame the map method is defined as:
>
> [image: inline image 1]
>
> Once this is translated to bytecode, the actual Java signature takes a
> Function1 plus an added ClassTag parameter. I can try to work around this
> with scala.reflect.ClassTag$, like so:
>
> ClassTag$.MODULE$.apply(String.class)
>
> That gets the second ClassTag parameter right, but then instantiating a
> java.util.function.Function or using Java 8 lambdas fails to work, and if
> I try to instantiate a proper Scala Function1... well, that is a world of
> pain.
>
> This is a regression introduced by the 1.3.x DataFrame: JavaSchemaRDD used
> to be a JavaRDDLike, but DataFrames are not (and are not callable with
> Java Functions). Shall I open a JIRA for this?
>
> Regards,
>
> --
> Olivier Girardot | Associé
> o.girar...@lateral-thoughts.com
> +33 6 24 09 17 94
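For anyone hitting the same wall: one way around it, in the 1.3.x API, is to drop back down to a `JavaRDD<Row>` via `DataFrame.javaRDD()` (or `toJavaRDD()`), which is a JavaRDDLike and therefore accepts plain Java 8 lambdas with no ClassTag. A minimal sketch, assuming Spark 1.3.x on the classpath and reusing the `Movie` bean from the mail above (the `DataFrameMapWorkaround` class name is mine):

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

import java.io.Serializable;
import java.util.Collections;
import java.util.List;

public class DataFrameMapWorkaround {

    // Same JavaBean as in the original mail, compacted.
    public static class Movie implements Serializable {
        private int id;
        private String name;
        public Movie(int id, String name) { this.id = id; this.name = name; }
        public int getId() { return id; }
        public void setId(int id) { this.id = id; }
        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
    }

    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("workaround").setMaster("local");
        JavaSparkContext sc = new JavaSparkContext(conf);
        SQLContext sqlContext = new SQLContext(sc);

        DataFrame frame = sqlContext.applySchema(
                sc.parallelize(Collections.singletonList(new Movie(1, "Indiana Jones"))),
                Movie.class);
        frame.registerTempTable("movies");

        // javaRDD() returns a JavaRDD<Row>; its map takes an
        // org.apache.spark.api.java.function.Function, so a Java 8
        // lambda compiles directly, with no ClassTag juggling.
        List<String> names = sqlContext.sql("select name from movies")
                .javaRDD()
                .map(row -> row.getString(0))
                .collect();

        System.out.println(names);
        sc.stop();
    }
}
```

The round trip through `javaRDD()` is a sketch of a workaround, not a fix for the underlying API regression discussed in the thread; the Scala-side `map` on DataFrame itself still needs a `Function1` plus a `ClassTag` when called from Java.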