Re: compile error: No classtag available while calling RDD.zip()

2017-09-13 Thread bluejoe
Thanks for your reply!

Actually, it is OK when I use RDD.zip() like this:

def zipDatasets(m: Dataset[String], n: Dataset[Int]) = {
  m.sparkSession.createDataset(m.rdd.zip(n.rdd))
}


But in my project, the element types of the Datasets are specified by the caller, so I
introduce type parameters X and Y:


def zipDatasets[X: Encoder, Y: Encoder](m: Dataset[X], n: Dataset[Y]) = {
  m.sparkSession.createDataset(m.rdd.zip(n.rdd))
}


This reports an error because the compiler needs ClassTag information for Y, which is
not available here.
Now I have no idea how to fix it.
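
For reference, a minimal sketch of one possible direction, assuming a ClassTag context
bound can be required on Y alongside the Encoder, and that the pair encoder can be built
explicitly with Encoders.tuple (the signature below is illustrative, not tested):

import scala.reflect.ClassTag
import org.apache.spark.sql.{Dataset, Encoder, Encoders}

// Sketch: RDD.zip() needs a ClassTag[Y]; createDataset() needs an Encoder[(X, Y)].
// Requiring both bounds on Y and supplying the tuple encoder explicitly
// should satisfy both call sites.
def zipDatasets[X: Encoder, Y: Encoder : ClassTag](m: Dataset[X], n: Dataset[Y]): Dataset[(X, Y)] = {
  implicit val pairEncoder: Encoder[(X, Y)] =
    Encoders.tuple(implicitly[Encoder[X]], implicitly[Encoder[Y]])
  m.sparkSession.createDataset(m.rdd.zip(n.rdd))
}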

Regards,
bluejoe

From:  Anastasios Zouzias
Reply-To:  <zouz...@gmail.com>
Date:  Thursday, September 14, 2017, 2:10 AM
To:  bluejoe
Cc:  user
Subject:  Re: compile error: No classtag available while calling RDD.zip()

Hi there,

If it is OK with you to work with DataFrames, you can do

https://gist.github.com/zouzias/44723de11222535223fe59b4b0bc228c

import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StructField, StructType, IntegerType, LongType}

val df = sc.parallelize(Seq(
  (1.0, 2.0), (0.0, -1.0),
  (3.0, 4.0), (6.0, -2.3))).toDF("x", "y")

// Append "rowid" column of type Long
val schema = df.schema
val newSchema = StructType(df.schema.fields ++ Array(StructField("rowid", LongType, false)))

// Zip on RDD level
val rddWithId = df.rdd.zipWithIndex

// Convert back to DataFrame
val dfZippedWithId = spark.createDataFrame(
  rddWithId.map { case (row, index) => Row.fromSeq(row.toSeq ++ Array(index)) },
  newSchema)

// Show results
dfZippedWithId.show
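
If the end goal is to zip two DataFrames row by row, one possible follow-up (a sketch that
is not part of the gist; df1, df2 and the withRowId helper are placeholder names, and it
reuses the Row/StructType/StructField/LongType imports above) is to index both sides with
zipWithIndex and join them on the generated rowid:

// Sketch: index both DataFrames the same way, then pair rows by joining on rowid.
def withRowId(df: org.apache.spark.sql.DataFrame) =
  spark.createDataFrame(
    df.rdd.zipWithIndex.map { case (row, idx) => Row.fromSeq(row.toSeq :+ idx) },
    StructType(df.schema.fields :+ StructField("rowid", LongType, false)))

val zipped = withRowId(df1).join(withRowId(df2), Seq("rowid")).drop("rowid")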

Best,
Anastasios



On Wed, Sep 13, 2017 at 5:07 PM, 沈志宏 <blue...@cnic.cn> wrote:
Hello, since Dataset has no zip(…) method, I wrote the following code to zip
two datasets:

1   def zipDatasets[X: Encoder, Y: Encoder](spark: SparkSession, m: Dataset[X], n: Dataset[Y]) = {
2     val rdd = m.rdd.zip(n.rdd);
3     import spark.implicits._
4     spark.createDataset(rdd);
5   }

However, at the m.rdd.zip(…) call, a compile error is reported: No ClassTag
available for Y

I know this error can be fixed if I declare Y with a ClassTag context bound like this:

1   def foo[X: Encoder, Y: ClassTag](spark: SparkSession, …

But this will make line 5 report a new error:
Unable to find encoder for type stored in a Dataset.

Now I have no idea how to solve this problem. How can I declare Y with both an Encoder
and a ClassTag?

Many thanks!

Best regards,
bluejoe




-- 
-- Anastasios Zouzias



Re: compile error: No classtag available while calling RDD.zip()

2017-09-13 Thread Anastasios Zouzias
Hi there,

If it is OK with you to work with DataFrames, you can do

https://gist.github.com/zouzias/44723de11222535223fe59b4b0bc228c

import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StructField, StructType, IntegerType, LongType}

val df = sc.parallelize(Seq(
  (1.0, 2.0), (0.0, -1.0),
  (3.0, 4.0), (6.0, -2.3))).toDF("x", "y")

// Append "rowid" column of type Long
val schema = df.schema
val newSchema = StructType(df.schema.fields ++ Array(StructField("rowid", LongType, false)))

// Zip on RDD level
val rddWithId = df.rdd.zipWithIndex

// Convert back to DataFrame
val dfZippedWithId = spark.createDataFrame(
  rddWithId.map { case (row, index) => Row.fromSeq(row.toSeq ++ Array(index)) },
  newSchema)

// Show results
dfZippedWithId.show

Best,
Anastasios



On Wed, Sep 13, 2017 at 5:07 PM, 沈志宏  wrote:

> Hello, since Dataset has no zip(…) method, I wrote the following code to
> zip two datasets:
>
> 1   def zipDatasets[X: Encoder, Y: Encoder](spark: SparkSession, m: Dataset[X], n: Dataset[Y]) = {
> 2     val rdd = m.rdd.zip(n.rdd);
> 3     import spark.implicits._
> 4     spark.createDataset(rdd);
> 5   }
>
> However, at the m.rdd.zip(…) call, a compile error is reported: No
> ClassTag available for Y
>
> I know this error can be fixed if I declare Y with a ClassTag context
> bound like this:
>
> 1   def foo[X: Encoder, Y: ClassTag](spark: SparkSession, …
>
> But this will make line 5 report a new error:
> Unable to find encoder for type stored in a Dataset.
>
> Now I have no idea how to solve this problem. How can I declare Y with
> both an Encoder and a ClassTag?
>
> Many thanks!
>
> Best regards,
> bluejoe
>
>


-- 
-- Anastasios Zouzias



compile error: No classtag available while calling RDD.zip()

2017-09-13 Thread 沈志宏
Hello, since Dataset has no zip(…) method, I wrote the following code to zip
two datasets:

1   def zipDatasets[X: Encoder, Y: Encoder](spark: SparkSession, m: Dataset[X], n: Dataset[Y]) = {
2     val rdd = m.rdd.zip(n.rdd);
3     import spark.implicits._
4     spark.createDataset(rdd);
5   }

However, at the m.rdd.zip(…) call, a compile error is reported: No ClassTag
available for Y

I know this error can be fixed if I declare Y with a ClassTag context bound like this:

1   def foo[X: Encoder, Y: ClassTag](spark: SparkSession, …

But this will make line 5 report a new error:
Unable to find encoder for type stored in a Dataset.

Now I have no idea how to solve this problem. How can I declare Y with both an
Encoder and a ClassTag?
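
For reference, a minimal sketch of one possible answer, assuming the ClassTag can be
derived from the Encoder through its clsTag member and the pair encoder built with
Encoders.tuple (the code below is illustrative only, not tested against this project):

import scala.reflect.ClassTag
import org.apache.spark.sql.{Dataset, Encoder, Encoders, SparkSession}

// Sketch: keep only the Encoder bounds, derive the ClassTag that RDD.zip()
// needs from Encoder.clsTag, and supply the pair encoder explicitly.
def zipDatasets[X: Encoder, Y: Encoder](spark: SparkSession, m: Dataset[X], n: Dataset[Y]): Dataset[(X, Y)] = {
  implicit val yTag: ClassTag[Y] = implicitly[Encoder[Y]].clsTag
  implicit val pairEncoder: Encoder[(X, Y)] =
    Encoders.tuple(implicitly[Encoder[X]], implicitly[Encoder[Y]])
  spark.createDataset(m.rdd.zip(n.rdd))
}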

Many thanks!

Best regards,
bluejoe