And you can use jsonRDD(json: RDD[String], schema: StructType) to specify 
your schema explicitly. For numbers larger than Long, you can use DecimalType.
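
A sketch of what such an explicit schema might look like, assuming the 
Spark 1.2-era API (SQLContext.jsonRDD taking a StructType) and guessing 
field types from the sample records further down the thread; `rdd` is the 
same RDD[String] built in the repro below:

```scala
import org.apache.spark.sql._

val sqlc = new SQLContext(sc)
// DisplayURL holds values too large for Long, so use DecimalType
val schema = StructType(Seq(
  StructField("Click", StringType, nullable = true),
  StructField("Impression", LongType, nullable = true),
  StructField("DisplayURL", DecimalType.Unlimited, nullable = true),
  StructField("AdId", LongType, nullable = true)))

// No inference (and no sampling) happens when the schema is given explicitly
val json = sqlc.jsonRDD(rdd, schema)
json.registerTempTable("test")
```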

Thanks,
Daoyuan

From: Wang, Daoyuan [mailto:daoyuan.w...@intel.com]
Sent: Friday, January 16, 2015 5:14 PM
To: Tobias Pfeiffer
Cc: user
Subject: RE: MatchError in JsonRDD.toLong

The second parameter of jsonRDD is the sampling ratio when we infer schema.

Thanks,
Daoyuan

From: Tobias Pfeiffer [mailto:t...@preferred.jp]
Sent: Friday, January 16, 2015 5:11 PM
To: Wang, Daoyuan
Cc: user
Subject: Re: MatchError in JsonRDD.toLong

Hi,

On Fri, Jan 16, 2015 at 5:55 PM, Wang, Daoyuan <daoyuan.w...@intel.com> wrote:
Can you provide how you create the JsonRDD?

This should be reproducible in the Spark shell:

---------------------------------------------------------
import org.apache.spark.sql._
val sqlc = new SQLContext(sc)
val rdd = sc.parallelize("""{"Click":"nonclicked", "Impression":1, 
"DisplayURL":4401798909506983219, "AdId":21215341}""" ::
                         """{"Click":"nonclicked", "Impression":1, 
"DisplayURL":14452800566866169008, "AdId":10587781}""" :: Nil)

// works fine
val json = sqlc.jsonRDD(rdd)
json.registerTempTable("test")
sqlc.sql("SELECT * FROM test").collect

// -> MatchError
val json2 = sqlc.jsonRDD(rdd, 0.1)
json2.registerTempTable("test2")
sqlc.sql("SELECT * FROM test2").collect
---------------------------------------------------------

I guess the issue in the latter case is that, with sampling, the column is 
inferred as Long even though some rows hold values too big for Long...
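
For what it's worth, the second DisplayURL value really does exceed 
Long.MaxValue, which would explain why a Long-typed conversion fails on it:

```scala
// 14452800566866169008 does not fit in a signed 64-bit Long
// (Long.MaxValue is 9223372036854775807, about 9.22e18)
val big = BigDecimal("14452800566866169008")
println(big > BigDecimal(Long.MaxValue))  // true
```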

Thanks
Tobias
