Re: scala.MatchError while doing BinaryClassificationMetrics

2016-11-15 Thread Bhaarat Sharma
Thanks, Nick.

This worked for me.

val evaluator = new BinaryClassificationEvaluator().
setLabelCol("label").
setRawPredictionCol("ModelProbability").
setMetricName("areaUnderROC")

val auROC = evaluator.evaluate(testResults)

On Mon, Nov 14, 2016 at 4:00 PM, Nick Pentreath 
wrote:

> Typically you pass in the result of a model transform to the evaluator.
>
> So:
> val model = estimator.fit(data)
> val auc = evaluator.evaluate(model.transform(testData))
>
> Check Scala API docs for some details: http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
>
> On Mon, 14 Nov 2016 at 20:02 Bhaarat Sharma  wrote:
>
> Can you please suggest how I can use BinaryClassificationEvaluator? I
> tried:
>
> scala> import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
> import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
>
> scala>  val evaluator = new BinaryClassificationEvaluator()
> evaluator: org.apache.spark.ml.evaluation.BinaryClassificationEvaluator =
> binEval_0d57372b7579
>
> Try 1:
>
> scala> evaluator.evaluate(testScoreAndLabel.rdd)
> <console>:105: error: type mismatch;
>  found   : org.apache.spark.rdd.RDD[(Double, Double)]
>  required: org.apache.spark.sql.Dataset[_]
>evaluator.evaluate(testScoreAndLabel.rdd)
>
> Try 2:
>
> scala> evaluator.evaluate(testScoreAndLabel)
> java.lang.IllegalArgumentException: Field "rawPrediction" does not exist.
>   at org.apache.spark.sql.types.StructType$$anonfun$apply$1.
> apply(StructType.scala:228)
>
> Try 3:
>
> scala> evaluator.evaluate(testScoreAndLabel.select("
> Label","ModelProbability"))
> org.apache.spark.sql.AnalysisException: cannot resolve '`Label`' given
> input columns: [_1, _2];
>   at org.apache.spark.sql.catalyst.analysis.package$
> AnalysisErrorAt.failAnalysis(package.scala:42)
>
>
> On Mon, Nov 14, 2016 at 1:44 PM, Nick Pentreath 
> wrote:
>
> DataFrame.rdd returns an RDD[Row]. You'll need to use map to extract the
> doubles from the test score and label DF.
>
> But you may prefer to just use spark.ml evaluators, which work with
> DataFrames. Try BinaryClassificationEvaluator.
>
> On Mon, 14 Nov 2016 at 19:30, Bhaarat Sharma  wrote:
>
> I am getting scala.MatchError in the code below. I'm not able to see why
> this would be happening. I am using Spark 2.0.1
>
> scala> testResults.columns
> res538: Array[String] = Array(TopicVector, subject_id, hadm_id, isElective, 
> isNewborn, isUrgent, isEmergency, isMale, isFemale, oasis_score, 
> sapsii_score, sofa_score, age, hosp_death, test, ModelFeatures, Label, 
> rawPrediction, ModelProbability, ModelPrediction)
>
> scala> testResults.select("Label","ModelProbability").take(1)
> res542: Array[org.apache.spark.sql.Row] = 
> Array([0.0,[0.737304818744076,0.262695181255924]])
>
> scala> val testScoreAndLabel = testResults.
>  | select("Label","ModelProbability").
>  | map { case Row(l:Double, p:Vector) => (p(1), l) }
> testScoreAndLabel: org.apache.spark.sql.Dataset[(Double, Double)] = [_1: 
> double, _2: double]
>
> scala> testScoreAndLabel
> res539: org.apache.spark.sql.Dataset[(Double, Double)] = [_1: double, _2: 
> double]
>
> scala> testScoreAndLabel.columns
> res540: Array[String] = Array(_1, _2)
>
> scala> val testMetrics = new 
> BinaryClassificationMetrics(testScoreAndLabel.rdd)
> testMetrics: org.apache.spark.mllib.evaluation.BinaryClassificationMetrics = 
> org.apache.spark.mllib.evaluation.BinaryClassificationMetrics@36e780d1
>
> The code below gives the error
>
> val auROC = testMetrics.areaUnderROC() //this line gives the error
>
> Caused by: scala.MatchError: [0.0,[0.7316583497453766,0.2683416502546234]] 
> (of class org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema)
>
>
>


Re: scala.MatchError while doing BinaryClassificationMetrics

2016-11-14 Thread Nick Pentreath
Typically you pass in the result of a model transform to the evaluator.

So:
val model = estimator.fit(data)
val auc = evaluator.evaluate(model.transform(testData))

Check Scala API docs for some details:
http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
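Putting those two lines together, here is a minimal end-to-end sketch. It assumes a DataFrame named data with "label" and "features" columns, and it uses LogisticRegression purely as a placeholder estimator, not necessarily the one used in this thread:

import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator

// Split the data, fit the (placeholder) estimator, then score the held-out set.
val Array(train, test) = data.randomSplit(Array(0.8, 0.2))

val model = new LogisticRegression()
  .setLabelCol("label")
  .setFeaturesCol("features")
  .fit(train)

// model.transform(test) adds the "rawPrediction" column the evaluator expects.
val evaluator = new BinaryClassificationEvaluator()
  .setLabelCol("label")
  .setRawPredictionCol("rawPrediction")
  .setMetricName("areaUnderROC")

val auc = evaluator.evaluate(model.transform(test))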

On Mon, 14 Nov 2016 at 20:02 Bhaarat Sharma  wrote:

Can you please suggest how I can use BinaryClassificationEvaluator? I tried:

scala> import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator

scala>  val evaluator = new BinaryClassificationEvaluator()
evaluator: org.apache.spark.ml.evaluation.BinaryClassificationEvaluator =
binEval_0d57372b7579

Try 1:

scala> evaluator.evaluate(testScoreAndLabel.rdd)
<console>:105: error: type mismatch;
 found   : org.apache.spark.rdd.RDD[(Double, Double)]
 required: org.apache.spark.sql.Dataset[_]
   evaluator.evaluate(testScoreAndLabel.rdd)

Try 2:

scala> evaluator.evaluate(testScoreAndLabel)
java.lang.IllegalArgumentException: Field "rawPrediction" does not exist.
  at
org.apache.spark.sql.types.StructType$$anonfun$apply$1.apply(StructType.scala:228)

Try 3:

scala>
evaluator.evaluate(testScoreAndLabel.select("Label","ModelProbability"))
org.apache.spark.sql.AnalysisException: cannot resolve '`Label`' given
input columns: [_1, _2];
  at
org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)


On Mon, Nov 14, 2016 at 1:44 PM, Nick Pentreath 
wrote:

DataFrame.rdd returns an RDD[Row]. You'll need to use map to extract the
doubles from the test score and label DF.

But you may prefer to just use spark.ml evaluators, which work with
DataFrames. Try BinaryClassificationEvaluator.

On Mon, 14 Nov 2016 at 19:30, Bhaarat Sharma  wrote:

I am getting scala.MatchError in the code below. I'm not able to see why
this would be happening. I am using Spark 2.0.1

scala> testResults.columns
res538: Array[String] = Array(TopicVector, subject_id, hadm_id,
isElective, isNewborn, isUrgent, isEmergency, isMale, isFemale,
oasis_score, sapsii_score, sofa_score, age, hosp_death, test,
ModelFeatures, Label, rawPrediction, ModelProbability,
ModelPrediction)

scala> testResults.select("Label","ModelProbability").take(1)
res542: Array[org.apache.spark.sql.Row] =
Array([0.0,[0.737304818744076,0.262695181255924]])

scala> val testScoreAndLabel = testResults.
 | select("Label","ModelProbability").
 | map { case Row(l:Double, p:Vector) => (p(1), l) }
testScoreAndLabel: org.apache.spark.sql.Dataset[(Double, Double)] =
[_1: double, _2: double]

scala> testScoreAndLabel
res539: org.apache.spark.sql.Dataset[(Double, Double)] = [_1: double,
_2: double]

scala> testScoreAndLabel.columns
res540: Array[String] = Array(_1, _2)

scala> val testMetrics = new BinaryClassificationMetrics(testScoreAndLabel.rdd)
testMetrics: org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
= org.apache.spark.mllib.evaluation.BinaryClassificationMetrics@36e780d1

The code below gives the error

val auROC = testMetrics.areaUnderROC() //this line gives the error

Caused by: scala.MatchError:
[0.0,[0.7316583497453766,0.2683416502546234]] (of class
org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema)


Re: scala.MatchError while doing BinaryClassificationMetrics

2016-11-14 Thread Bhaarat Sharma
Can you please suggest how I can use BinaryClassificationEvaluator? I tried:

scala> import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator

scala>  val evaluator = new BinaryClassificationEvaluator()
evaluator: org.apache.spark.ml.evaluation.BinaryClassificationEvaluator =
binEval_0d57372b7579

Try 1:

scala> evaluator.evaluate(testScoreAndLabel.rdd)
<console>:105: error: type mismatch;
 found   : org.apache.spark.rdd.RDD[(Double, Double)]
 required: org.apache.spark.sql.Dataset[_]
   evaluator.evaluate(testScoreAndLabel.rdd)

Try 2:

scala> evaluator.evaluate(testScoreAndLabel)
java.lang.IllegalArgumentException: Field "rawPrediction" does not exist.
  at
org.apache.spark.sql.types.StructType$$anonfun$apply$1.apply(StructType.scala:228)

Try 3:

scala>
evaluator.evaluate(testScoreAndLabel.select("Label","ModelProbability"))
org.apache.spark.sql.AnalysisException: cannot resolve '`Label`' given
input columns: [_1, _2];
  at
org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)


On Mon, Nov 14, 2016 at 1:44 PM, Nick Pentreath 
wrote:

> DataFrame.rdd returns an RDD[Row]. You'll need to use map to extract the
> doubles from the test score and label DF.
>
> But you may prefer to just use spark.ml evaluators, which work with
> DataFrames. Try BinaryClassificationEvaluator.
>
> On Mon, 14 Nov 2016 at 19:30, Bhaarat Sharma  wrote:
>
>> I am getting scala.MatchError in the code below. I'm not able to see why
>> this would be happening. I am using Spark 2.0.1
>>
>> scala> testResults.columns
>> res538: Array[String] = Array(TopicVector, subject_id, hadm_id, isElective, 
>> isNewborn, isUrgent, isEmergency, isMale, isFemale, oasis_score, 
>> sapsii_score, sofa_score, age, hosp_death, test, ModelFeatures, Label, 
>> rawPrediction, ModelProbability, ModelPrediction)
>>
>> scala> testResults.select("Label","ModelProbability").take(1)
>> res542: Array[org.apache.spark.sql.Row] = 
>> Array([0.0,[0.737304818744076,0.262695181255924]])
>>
>> scala> val testScoreAndLabel = testResults.
>>  | select("Label","ModelProbability").
>>  | map { case Row(l:Double, p:Vector) => (p(1), l) }
>> testScoreAndLabel: org.apache.spark.sql.Dataset[(Double, Double)] = [_1: 
>> double, _2: double]
>>
>> scala> testScoreAndLabel
>> res539: org.apache.spark.sql.Dataset[(Double, Double)] = [_1: double, _2: 
>> double]
>>
>> scala> testScoreAndLabel.columns
>> res540: Array[String] = Array(_1, _2)
>>
>> scala> val testMetrics = new 
>> BinaryClassificationMetrics(testScoreAndLabel.rdd)
>> testMetrics: org.apache.spark.mllib.evaluation.BinaryClassificationMetrics = 
>> org.apache.spark.mllib.evaluation.BinaryClassificationMetrics@36e780d1
>>
>> The code below gives the error
>>
>> val auROC = testMetrics.areaUnderROC() //this line gives the error
>>
>> Caused by: scala.MatchError: [0.0,[0.7316583497453766,0.2683416502546234]] 
>> (of class org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema)
>>
>>


Re: scala.MatchError while doing BinaryClassificationMetrics

2016-11-14 Thread Nick Pentreath
DataFrame.rdd returns an RDD[Row]. You'll need to use map to extract the
doubles from the test score and label DF.

But you may prefer to just use spark.ml evaluators, which work with
DataFrames. Try BinaryClassificationEvaluator.
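If you do want the RDD-based metrics, here is a minimal sketch of that map step (assuming, as in the quoted mail below, a DataFrame testResults with a Double "Label" column and a probability Vector column "ModelProbability"):

import org.apache.spark.ml.linalg.Vector
import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
import org.apache.spark.sql.Row

// Pull (score, label) doubles out of each Row. In Spark 2.x the probability
// column holds an org.apache.spark.ml.linalg.Vector; matching on the old
// org.apache.spark.mllib.linalg.Vector is a common cause of the MatchError
// reported below.
val scoreAndLabels = testResults
  .select("ModelProbability", "Label")
  .rdd
  .map { case Row(p: Vector, l: Double) => (p(1), l) }

val testMetrics = new BinaryClassificationMetrics(scoreAndLabels)
val auROC = testMetrics.areaUnderROC()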

On Mon, 14 Nov 2016 at 19:30, Bhaarat Sharma  wrote:

> I am getting scala.MatchError in the code below. I'm not able to see why
> this would be happening. I am using Spark 2.0.1
>
> scala> testResults.columns
> res538: Array[String] = Array(TopicVector, subject_id, hadm_id, isElective, 
> isNewborn, isUrgent, isEmergency, isMale, isFemale, oasis_score, 
> sapsii_score, sofa_score, age, hosp_death, test, ModelFeatures, Label, 
> rawPrediction, ModelProbability, ModelPrediction)
>
> scala> testResults.select("Label","ModelProbability").take(1)
> res542: Array[org.apache.spark.sql.Row] = 
> Array([0.0,[0.737304818744076,0.262695181255924]])
>
> scala> val testScoreAndLabel = testResults.
>  | select("Label","ModelProbability").
>  | map { case Row(l:Double, p:Vector) => (p(1), l) }
> testScoreAndLabel: org.apache.spark.sql.Dataset[(Double, Double)] = [_1: 
> double, _2: double]
>
> scala> testScoreAndLabel
> res539: org.apache.spark.sql.Dataset[(Double, Double)] = [_1: double, _2: 
> double]
>
> scala> testScoreAndLabel.columns
> res540: Array[String] = Array(_1, _2)
>
> scala> val testMetrics = new 
> BinaryClassificationMetrics(testScoreAndLabel.rdd)
> testMetrics: org.apache.spark.mllib.evaluation.BinaryClassificationMetrics = 
> org.apache.spark.mllib.evaluation.BinaryClassificationMetrics@36e780d1
>
> The code below gives the error
>
> val auROC = testMetrics.areaUnderROC() //this line gives the error
>
> Caused by: scala.MatchError: [0.0,[0.7316583497453766,0.2683416502546234]] 
> (of class org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema)
>
>


Re: scala.MatchError on stand-alone cluster mode

2016-07-17 Thread Mekal Zheng
Hi, Rishabh Bhardwaj, Saisai Shao,

Thanks for your help. I have found that the key reason is that I forgot to
upload the jar package to all of the nodes in the cluster, so after the master
distributed the job and selected one node as the driver, the driver could not
find the jar package and threw an exception.

-- 
Mekal Zheng
Sent with Airmail

From: Rishabh Bhardwaj <rbnex...@gmail.com>
Reply-To: Rishabh Bhardwaj <rbnex...@gmail.com>
Date: July 15, 2016 at 17:28:43
To: Saisai Shao <sai.sai.s...@gmail.com>
Cc: Mekal Zheng <mekal.zh...@gmail.com>, spark users <user@spark.apache.org>
Subject: Re: scala.MatchError on stand-alone cluster mode

Hi Mekal,
It may be a Scala version mismatch error. Kindly check whether you are
running both (your streaming app and the Spark cluster) on Scala 2.10 or 2.11.

Thanks,
Rishabh.

On Fri, Jul 15, 2016 at 1:38 PM, Saisai Shao <sai.sai.s...@gmail.com> wrote:

> The error stack trace is thrown from your code:
>
> Caused by: scala.MatchError: [Ljava.lang.String;@68d279ec (of class
> [Ljava.lang.String;)
> at com.jd.deeplog.LogAggregator$.main(LogAggregator.scala:29)
> at com.jd.deeplog.LogAggregator.main(LogAggregator.scala)
>
> I think you should debug the code yourself; it may not be a problem with
> Spark.
>
> On Fri, Jul 15, 2016 at 3:17 PM, Mekal Zheng <mekal.zh...@gmail.com>
> wrote:
>
>> Hi,
>>
>> I have a Spark Streaming job written in Scala that runs well in local and
>> client mode, but when I submit it in cluster mode, the driver reports the
>> error shown below.
>> Does anyone know what is wrong here?
>> Please help me!
>>
>> The job code follows the log below.
>>
>> 16/07/14 17:28:21 DEBUG ByteBufUtil:
>> -Dio.netty.threadLocalDirectBufferSize: 65536
>> 16/07/14 17:28:21 DEBUG NetUtil: Loopback interface: lo (lo,
>> 0:0:0:0:0:0:0:1%lo)
>> 16/07/14 17:28:21 DEBUG NetUtil: /proc/sys/net/core/somaxconn: 32768
>> 16/07/14 17:28:21 DEBUG TransportServer: Shuffle server started on port
>> :43492
>> 16/07/14 17:28:21 INFO Utils: Successfully started service 'Driver' on
>> port 43492.
>> 16/07/14 17:28:21 INFO WorkerWatcher: Connecting to worker spark://
>> Worker@172.20.130.98:23933
>> 16/07/14 17:28:21 DEBUG TransportClientFactory: Creating new connection
>> to /172.20.130.98:23933
>> Exception in thread "main" java.lang.reflect.InvocationTargetException
>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> at
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>> at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> at java.lang.reflect.Method.invoke(Method.java:497)
>> at
>> org.apache.spark.deploy.worker.DriverWrapper$.main(DriverWrapper.scala:58)
>> at
>> org.apache.spark.deploy.worker.DriverWrapper.main(DriverWrapper.scala)
>> Caused by: scala.MatchError: [Ljava.lang.String;@68d279ec (of class
>> [Ljava.lang.String;)
>> at com.jd.deeplog.LogAggregator$.main(LogAggregator.scala:29)
>> at com.jd.deeplog.LogAggregator.main(LogAggregator.scala)
>> ... 6 more
>>
>> ==
>> Job CODE:
>>
>> object LogAggregator {
>>
>>   val batchDuration = Seconds(5)
>>
>>   def main(args:Array[String]) {
>>
>> val usage =
>>   """Usage: LogAggregator 
>> 
>> |  logFormat: fieldName:fieldRole[,fieldName:fieldRole] each field 
>> must have both name and role
>> |  logFormat.role: can be key|avg|enum|sum|ignore
>>   """.stripMargin
>>
>> if (args.length < 9) {
>>   System.err.println(usage)
>>   System.exit(1)
>> }
>>
>> val Array(zkQuorum, group, topics, numThreads, logFormat, logSeparator, 
>> batchDuration, destType, destPath) = args
>>
>> println("Start streaming calculation...")
>>
>> val conf = new SparkConf().setAppName("LBHaproxy-LogAggregator")
>> val ssc = new StreamingContext(conf, Seconds(batchDuration.toInt))
>>
>> val topicMap = topics.split(",").map((_, numThreads.toInt)).toMap
>>
>> val lines = KafkaUtils.createStream(ssc, zkQuorum, group, 
>> topicMap).map(_._2)
>>
>> val logFields = logFormat.split(",").map(field => {
>>   val fld = field.split(":")
>>   if (fld.s

Re: scala.MatchError on stand-alone cluster mode

2016-07-15 Thread Saisai Shao
The error stack trace is thrown from your code:

Caused by: scala.MatchError: [Ljava.lang.String;@68d279ec (of class
[Ljava.lang.String;)
at com.jd.deeplog.LogAggregator$.main(LogAggregator.scala:29)
at com.jd.deeplog.LogAggregator.main(LogAggregator.scala)

I think you should debug the code yourself; it may not be a problem with
Spark.
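A small self-contained sketch (hypothetical object and argument names, not the original job) of guarding that kind of Array pattern match, so a wrong argument count prints usage instead of throwing scala.MatchError at the destructuring line:

object LogAggregatorArgs {
  def main(args: Array[String]): Unit = {
    args match {
      // Exactly nine arguments: bind them and run the job.
      case Array(zkQuorum, group, topics, numThreads, logFormat,
                 logSeparator, batchDuration, destType, destPath) =>
        println(s"Consuming $topics from $zkQuorum with $numThreads threads")
      // Anything else: report usage instead of crashing with scala.MatchError.
      case _ =>
        System.err.println("Usage: LogAggregator <zkQuorum> <group> <topics> " +
          "<numThreads> <logFormat> <logSeparator> <batchDuration> <destType> <destPath>")
        sys.exit(1)
    }
  }
}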

On Fri, Jul 15, 2016 at 3:17 PM, Mekal Zheng  wrote:

> Hi,
>
> I have a Spark Streaming job written in Scala that runs well in local and
> client mode, but when I submit it in cluster mode, the driver reports the
> error shown below.
> Does anyone know what is wrong here?
> Please help me!
>
> The job code follows the log below.
>
> 16/07/14 17:28:21 DEBUG ByteBufUtil:
> -Dio.netty.threadLocalDirectBufferSize: 65536
> 16/07/14 17:28:21 DEBUG NetUtil: Loopback interface: lo (lo,
> 0:0:0:0:0:0:0:1%lo)
> 16/07/14 17:28:21 DEBUG NetUtil: /proc/sys/net/core/somaxconn: 32768
> 16/07/14 17:28:21 DEBUG TransportServer: Shuffle server started on port
> :43492
> 16/07/14 17:28:21 INFO Utils: Successfully started service 'Driver' on
> port 43492.
> 16/07/14 17:28:21 INFO WorkerWatcher: Connecting to worker spark://
> Worker@172.20.130.98:23933
> 16/07/14 17:28:21 DEBUG TransportClientFactory: Creating new connection to
> /172.20.130.98:23933
> Exception in thread "main" java.lang.reflect.InvocationTargetException
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:497)
> at
> org.apache.spark.deploy.worker.DriverWrapper$.main(DriverWrapper.scala:58)
> at
> org.apache.spark.deploy.worker.DriverWrapper.main(DriverWrapper.scala)
> Caused by: scala.MatchError: [Ljava.lang.String;@68d279ec (of class
> [Ljava.lang.String;)
> at com.jd.deeplog.LogAggregator$.main(LogAggregator.scala:29)
> at com.jd.deeplog.LogAggregator.main(LogAggregator.scala)
> ... 6 more
>
> ==
> Job CODE:
>
> object LogAggregator {
>
>   val batchDuration = Seconds(5)
>
>   def main(args:Array[String]) {
>
> val usage =
>   """Usage: LogAggregator 
> 
> |  logFormat: fieldName:fieldRole[,fieldName:fieldRole] each field 
> must have both name and role
> |  logFormat.role: can be key|avg|enum|sum|ignore
>   """.stripMargin
>
> if (args.length < 9) {
>   System.err.println(usage)
>   System.exit(1)
> }
>
> val Array(zkQuorum, group, topics, numThreads, logFormat, logSeparator, 
> batchDuration, destType, destPath) = args
>
> println("Start streaming calculation...")
>
> val conf = new SparkConf().setAppName("LBHaproxy-LogAggregator")
> val ssc = new StreamingContext(conf, Seconds(batchDuration.toInt))
>
> val topicMap = topics.split(",").map((_, numThreads.toInt)).toMap
>
> val lines = KafkaUtils.createStream(ssc, zkQuorum, group, 
> topicMap).map(_._2)
>
> val logFields = logFormat.split(",").map(field => {
>   val fld = field.split(":")
>   if (fld.size != 2) {
> System.err.println("Wrong parameters for logFormat!\n")
> System.err.println(usage)
> System.exit(1)
>   }
>   // TODO: ensure the field has both 'name' and 'role'
>   new LogField(fld(0), fld(1))
> })
>
> val keyFields = logFields.filter(logFieldName => {
>   logFieldName.role == "key"
> })
> val keys = keyFields.map(key => {
>   key.name
> })
>
> val logsByKey = lines.map(line => {
>   val log = new Log(logFields, line, logSeparator)
>   log.toMap
> }).filter(log => log.nonEmpty).map(log => {
>   val keys = keyFields.map(logField => {
> log(logField.name).value
>   })
>
>   val key = keys.reduce((key1, key2) => {
> key1.asInstanceOf[String] + key2.asInstanceOf[String]
>   })
>
>   val fullLog = log + ("count" -> new LogSegment("sum", 1))
>
>   (key, fullLog)
> })
>
>
> val aggResults = logsByKey.reduceByKey((log_a, log_b) => {
>
>   log_a.map(logField => {
> val logFieldName = logField._1
> val logSegment_a = logField._2
> val logSegment_b = log_b(logFieldName)
>
> val segValue = logSegment_a.role match {
>   case "avg" => {
> logSegment_a.value.toString.toInt + 
> logSegment_b.value.toString.toInt
>   }
>   case "sum" => {
> logSegment_a.value.toString.toInt + 
> logSegment_b.value.toString.toInt
>   }
>   case "enum" => {
> val list_a = logSegment_a.value.asInstanceOf[List[(String, Int)]]
> val list_b = logSegment_b.value.asInstanceOf[List[(String, Int)]]
> list_a ++ list_b
>   }
>   case _ => logSegment_a.value
> }
> 

Re: scala.MatchError: class org.apache.avro.Schema (of class java.lang.Class)

2015-04-07 Thread Yamini Maddirala
For more details on my question
http://apache-spark-user-list.1001560.n3.nabble.com/How-to-generate-Java-bean-class-for-avro-files-using-spark-avro-project-tp22413.html

Thanks,
Yamini

On Tue, Apr 7, 2015 at 2:23 PM, Yamini Maddirala yamini.m...@gmail.com
wrote:

 Hi Michael,

 Yes, I did try the spark-avro 0.2.0 Databricks project. I am using CDH 5.3,
 which is based on Spark 1.2. Hence I'm bound to use spark-avro 0.2.0
 instead of the latest.

 I'm not sure how the spark-avro project can help me in this scenario.

 1. I have a JavaDStream of Avro generic records: JavaDStream<GenericRecord>
 [This is the data being read from Kafka topics]
 2. I'm able to get a JavaSchemaRDD using the avro file like this:
 final JavaSchemaRDD schemaRDD2 = AvroUtils.avroFile(sqlContext,
 "/xyz-Project/trunk/src/main/resources/xyz.avro");
 3. I don't know how I can apply the schema in step 2 to the data in step 1.
 I chose to do something like this:
 JavaSchemaRDD schemaRDD = sqlContext.applySchema(genericRecordJavaRDD,
 xyz.class);

 I used the avro maven plugin to generate the xyz class in Java. But this is
 not good because the avro maven plugin creates a field SCHEMA$ which is not
 supported in the applySchema method.

 Please let me know how to deal with this.

 Appreciate your help

 Thanks,
 Yamini












 On Tue, Apr 7, 2015 at 1:57 PM, Michael Armbrust mich...@databricks.com
 wrote:

 Have you looked at spark-avro?

 https://github.com/databricks/spark-avro

 On Tue, Apr 7, 2015 at 3:57 AM, Yamini yamini.m...@gmail.com wrote:

 Using Spark (1.2) streaming to read Avro schema based topics flowing in
 Kafka and then using the Spark SQL context to register the data as a temp
 table. The Avro maven plugin (version 1.7.7) generates the Java bean class
 for the avro file but includes a field named SCHEMA$ of type
 org.apache.avro.Schema, which is not supported in the JavaSQLContext class
 [method: applySchema].
 How can I auto-generate a Java bean class for the avro file and overcome
 the above mentioned problem?

 Thanks.




 -
 Thanks,
 Yamini
 --
 View this message in context:
 http://apache-spark-user-list.1001560.n3.nabble.com/scala-MatchError-class-org-apache-avro-Schema-of-class-java-lang-Class-tp22402.html
 Sent from the Apache Spark User List mailing list archive at Nabble.com.

 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
 For additional commands, e-mail: user-h...@spark.apache.org






Re: scala.MatchError: class org.apache.avro.Schema (of class java.lang.Class)

2015-04-07 Thread Michael Armbrust
Have you looked at spark-avro?

https://github.com/databricks/spark-avro

On Tue, Apr 7, 2015 at 3:57 AM, Yamini yamini.m...@gmail.com wrote:

 Using Spark (1.2) streaming to read Avro schema based topics flowing in
 Kafka and then using the Spark SQL context to register the data as a temp
 table. The Avro maven plugin (version 1.7.7) generates the Java bean class
 for the avro file but includes a field named SCHEMA$ of type
 org.apache.avro.Schema, which is not supported in the JavaSQLContext class
 [method: applySchema].
 How can I auto-generate a Java bean class for the avro file and overcome
 the above mentioned problem?

 Thanks.




 -
 Thanks,
 Yamini
 --
 View this message in context:
 http://apache-spark-user-list.1001560.n3.nabble.com/scala-MatchError-class-org-apache-avro-Schema-of-class-java-lang-Class-tp22402.html
 Sent from the Apache Spark User List mailing list archive at Nabble.com.

 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
 For additional commands, e-mail: user-h...@spark.apache.org




Re: scala.MatchError: class org.apache.avro.Schema (of class java.lang.Class)

2015-04-07 Thread Yamini Maddirala
Hi Michael,

Yes, I did try the spark-avro 0.2.0 Databricks project. I am using CDH 5.3,
which is based on Spark 1.2. Hence I'm bound to use spark-avro 0.2.0 instead
of the latest.

I'm not sure how the spark-avro project can help me in this scenario.

1. I have a JavaDStream of Avro generic records: JavaDStream<GenericRecord>
[This is the data being read from Kafka topics]
2. I'm able to get a JavaSchemaRDD using the avro file like this:
final JavaSchemaRDD schemaRDD2 = AvroUtils.avroFile(sqlContext,
"/xyz-Project/trunk/src/main/resources/xyz.avro");
3. I don't know how I can apply the schema in step 2 to the data in step 1.
I chose to do something like this:
   JavaSchemaRDD schemaRDD = sqlContext.applySchema(genericRecordJavaRDD,
xyz.class);

   I used the avro maven plugin to generate the xyz class in Java. But this
is not good because the avro maven plugin creates a field SCHEMA$ which is
not supported in the applySchema method.

Please let me know how to deal with this.

Appreciate your help

Thanks,
Yamini












On Tue, Apr 7, 2015 at 1:57 PM, Michael Armbrust mich...@databricks.com
wrote:

 Have you looked at spark-avro?

 https://github.com/databricks/spark-avro

 On Tue, Apr 7, 2015 at 3:57 AM, Yamini yamini.m...@gmail.com wrote:

 Using Spark (1.2) streaming to read Avro schema based topics flowing in
 Kafka and then using the Spark SQL context to register the data as a temp
 table. The Avro maven plugin (version 1.7.7) generates the Java bean class
 for the avro file but includes a field named SCHEMA$ of type
 org.apache.avro.Schema, which is not supported in the JavaSQLContext class
 [method: applySchema].
 How can I auto-generate a Java bean class for the avro file and overcome
 the above mentioned problem?

 Thanks.




 -
 Thanks,
 Yamini
 --
 View this message in context:
 http://apache-spark-user-list.1001560.n3.nabble.com/scala-MatchError-class-org-apache-avro-Schema-of-class-java-lang-Class-tp22402.html
 Sent from the Apache Spark User List mailing list archive at Nabble.com.

 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
 For additional commands, e-mail: user-h...@spark.apache.org





Re: scala.MatchError on SparkSQL when creating ArrayType of StructType

2014-12-05 Thread Michael Armbrust
All values in Hive are always nullable, though you should still not be
seeing this error.

It should be addressed by this patch:
https://github.com/apache/spark/pull/3150
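As a stop-gap before that patch, here is a minimal sketch of the workaround mentioned in the quoted mail below (same names as that mail; the only change is declaring the nested fields nullable):

import org.apache.spark.sql._   // 1.1.x: brings StructType, StructField, etc. into scope

val scm = StructType(inputRDD.schema.fields.init :+
  StructField("list",
    ArrayType(
      StructType(Seq(
        StructField("date", StringType, nullable = true),
        StructField("nbPurchase", IntegerType, nullable = true)))),
    nullable = true))

// Apply the relaxed schema and register the table exactly as before.
val schemaRDD = hiveContext.applySchema(purchaseRDD, scm)
schemaRDD.registerTempTable("t_purchase")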

On Fri, Dec 5, 2014 at 2:36 AM, Hao Ren inv...@gmail.com wrote:

 Hi,

 I am using SparkSQL on 1.1.0 branch.

 The following code leads to a scala.MatchError
 at

 org.apache.spark.sql.catalyst.expressions.Cast.cast$lzycompute(Cast.scala:247)

 val scm = StructType(inputRDD.schema.fields.init :+
   StructField("list",
     ArrayType(
       StructType(Seq(
         StructField("date", StringType, nullable = false),
         StructField("nbPurchase", IntegerType, nullable = false)))),
     nullable = false))

 // purchaseRDD is an RDD[sql.Row] whose schema corresponds to scm. It is
 // transformed from inputRDD
 val schemaRDD = hiveContext.applySchema(purchaseRDD, scm)
 schemaRDD.registerTempTable("t_purchase")

 Here's the stackTrace:
 scala.MatchError: ArrayType(StructType(List(StructField(date,StringType,
 true ), StructField(n_reachat,IntegerType, true ))),true) (of class
 org.apache.spark.sql.catalyst.types.ArrayType)
 at

 org.apache.spark.sql.catalyst.expressions.Cast.cast$lzycompute(Cast.scala:247)
 at
 org.apache.spark.sql.catalyst.expressions.Cast.cast(Cast.scala:247)
 at
 org.apache.spark.sql.catalyst.expressions.Cast.eval(Cast.scala:263)
 at

 org.apache.spark.sql.catalyst.expressions.Alias.eval(namedExpressions.scala:84)
 at

 org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection.apply(Projection.scala:66)
 at

 org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection.apply(Projection.scala:50)
 at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
 at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
 at
 org.apache.spark.sql.hive.execution.InsertIntoHiveTable.org
 $apache$spark$sql$hive$execution$InsertIntoHiveTable$$writeToFile$1(InsertIntoHiveTable.scala:149)
 at

 org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$saveAsHiveFile$1.apply(InsertIntoHiveTable.scala:158)
 at

 org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$saveAsHiveFile$1.apply(InsertIntoHiveTable.scala:158)
 at
 org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
 at org.apache.spark.scheduler.Task.run(Task.scala:54)
 at
 org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
 at

 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at

 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:744)

 The strange thing is that nullable for the date and nbPurchase fields is set
 to true while it was false in the code. If I set both to true, it works. But,
 in fact, they should not be nullable.

 Here's what I find at Cast.scala:247 on 1.1.0 branch

   private[this] lazy val cast: Any => Any = dataType match {
     case StringType => castToString
     case BinaryType => castToBinary
     case DecimalType => castToDecimal
     case TimestampType => castToTimestamp
     case BooleanType => castToBoolean
     case ByteType => castToByte
     case ShortType => castToShort
     case IntegerType => castToInt
     case FloatType => castToFloat
     case LongType => castToLong
     case DoubleType => castToDouble
   }

 Any idea? Thank you.

 Hao



 --
 View this message in context:
 http://apache-spark-user-list.1001560.n3.nabble.com/scala-MatchError-on-SparkSQL-when-creating-ArrayType-of-StructType-tp20459.html
 Sent from the Apache Spark User List mailing list archive at Nabble.com.

 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
 For additional commands, e-mail: user-h...@spark.apache.org




RE: scala.MatchError

2014-11-12 Thread Naveen Kumar Pokala
Hi,

Do you mean with java, I shouldn’t have Issue class as a property (attribute) 
in Instrument Class?

Ex:

class Issue {
    int a;
}

class Instrument {
    Issue issue;
}

How about Scala? Does it support such user-defined datatypes in classes,
e.g. a case class Issue?

case class Issue(a: Int = 0)

case class Instrument(issue: Issue = null)




-Naveen

From: Michael Armbrust [mailto:mich...@databricks.com]
Sent: Wednesday, November 12, 2014 12:09 AM
To: Xiangrui Meng
Cc: Naveen Kumar Pokala; user@spark.apache.org
Subject: Re: scala.MatchError

Xiangrui is correct that it must be a Java bean; also, nested classes are not
yet supported in Java.

On Tue, Nov 11, 2014 at 10:11 AM, Xiangrui Meng 
men...@gmail.com wrote:
I think you need a Java bean class instead of a normal class. See
example here: http://spark.apache.org/docs/1.1.0/sql-programming-guide.html
(switch to the java tab). -Xiangrui

On Tue, Nov 11, 2014 at 7:18 AM, Naveen Kumar Pokala
npok...@spcapitaliq.com wrote:
 Hi,



 This is my Instrument java constructor.



 public Instrument(Issue issue, Issuer issuer, Issuing issuing) {

 super();

 this.issue = issue;

 this.issuer = issuer;

 this.issuing = issuing;

 }





 I am trying to create javaschemaRDD



 JavaSchemaRDD schemaInstruments = sqlCtx.applySchema(distData,
 Instrument.class);



 Remarks:

 



 Instrument, Issue, Issuer, Issuing all are java classes



 distData is holding a List<Instrument>





 I am getting the following error.







 Exception in thread "Driver" java.lang.reflect.InvocationTargetException

 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

 at
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)

 at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

 at java.lang.reflect.Method.invoke(Method.java:483)

 at
 org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:162)

 Caused by: scala.MatchError: class sample.spark.test.Issue (of class
 java.lang.Class)

 at
 org.apache.spark.sql.api.java.JavaSQLContext$$anonfun$getSchema$1.apply(JavaSQLContext.scala:189)

 at
 org.apache.spark.sql.api.java.JavaSQLContext$$anonfun$getSchema$1.apply(JavaSQLContext.scala:188)

 at
 scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)

 at
 scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)

 at
 scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)

 at
 scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)

 at
 scala.collection.TraversableLike$class.map(TraversableLike.scala:244)

 at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108)

 at
 org.apache.spark.sql.api.java.JavaSQLContext.getSchema(JavaSQLContext.scala:188)

 at
 org.apache.spark.sql.api.java.JavaSQLContext.applySchema(JavaSQLContext.scala:90)

 at sample.spark.test.SparkJob.main(SparkJob.java:33)

 ... 5 more



 Please help me.



 Regards,

 Naveen.
-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: scala.MatchError

2014-11-11 Thread Xiangrui Meng
I think you need a Java bean class instead of a normal class. See
example here: http://spark.apache.org/docs/1.1.0/sql-programming-guide.html
(switch to the java tab). -Xiangrui

On Tue, Nov 11, 2014 at 7:18 AM, Naveen Kumar Pokala
npok...@spcapitaliq.com wrote:
 Hi,



 This is my Instrument java constructor.



 public Instrument(Issue issue, Issuer issuer, Issuing issuing) {

 super();

 this.issue = issue;

 this.issuer = issuer;

 this.issuing = issuing;

 }





 I am trying to create javaschemaRDD



 JavaSchemaRDD schemaInstruments = sqlCtx.applySchema(distData,
 Instrument.class);



 Remarks:

 



 Instrument, Issue, Issuer, Issuing all are java classes



 distData is holding a List<Instrument>





 I am getting the following error.







  Exception in thread "Driver" java.lang.reflect.InvocationTargetException

 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

 at
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)

 at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

 at java.lang.reflect.Method.invoke(Method.java:483)

 at
 org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:162)

 Caused by: scala.MatchError: class sample.spark.test.Issue (of class
 java.lang.Class)

 at
 org.apache.spark.sql.api.java.JavaSQLContext$$anonfun$getSchema$1.apply(JavaSQLContext.scala:189)

 at
 org.apache.spark.sql.api.java.JavaSQLContext$$anonfun$getSchema$1.apply(JavaSQLContext.scala:188)

 at
 scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)

 at
 scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)

 at
 scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)

 at
 scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)

 at
 scala.collection.TraversableLike$class.map(TraversableLike.scala:244)

 at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108)

 at
 org.apache.spark.sql.api.java.JavaSQLContext.getSchema(JavaSQLContext.scala:188)

 at
 org.apache.spark.sql.api.java.JavaSQLContext.applySchema(JavaSQLContext.scala:90)

 at sample.spark.test.SparkJob.main(SparkJob.java:33)

 ... 5 more



 Please help me.



 Regards,

 Naveen.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: scala.MatchError

2014-11-11 Thread Michael Armbrust
Xiangrui is correct that it must be a Java bean; also, nested classes are
not yet supported in Java.

On Tue, Nov 11, 2014 at 10:11 AM, Xiangrui Meng men...@gmail.com wrote:

 I think you need a Java bean class instead of a normal class. See
 example here:
 http://spark.apache.org/docs/1.1.0/sql-programming-guide.html
 (switch to the java tab). -Xiangrui

 On Tue, Nov 11, 2014 at 7:18 AM, Naveen Kumar Pokala
 npok...@spcapitaliq.com wrote:
  Hi,
 
 
 
  This is my Instrument java constructor.
 
 
 
  public Instrument(Issue issue, Issuer issuer, Issuing issuing) {
 
  super();
 
  this.issue = issue;
 
  this.issuer = issuer;
 
  this.issuing = issuing;
 
  }
 
 
 
 
 
  I am trying to create javaschemaRDD
 
 
 
  JavaSchemaRDD schemaInstruments = sqlCtx.applySchema(distData,
  Instrument.class);
 
 
 
  Remarks:
 
  
 
 
 
  Instrument, Issue, Issuer, Issuing all are java classes
 
 
 
   distData is holding a List<Instrument>
 
 
 
 
 
  I am getting the following error.
 
 
 
 
 
 
 
   Exception in thread "Driver" java.lang.reflect.InvocationTargetException
 
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 
  at
 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 
  at
 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 
  at java.lang.reflect.Method.invoke(Method.java:483)
 
  at
 
 org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:162)
 
  Caused by: scala.MatchError: class sample.spark.test.Issue (of class
  java.lang.Class)
 
  at
 
 org.apache.spark.sql.api.java.JavaSQLContext$$anonfun$getSchema$1.apply(JavaSQLContext.scala:189)
 
  at
 
 org.apache.spark.sql.api.java.JavaSQLContext$$anonfun$getSchema$1.apply(JavaSQLContext.scala:188)
 
  at
 
 scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
 
  at
 
 scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
 
  at
 
 scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
 
  at
  scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
 
  at
  scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
 
  at
 scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108)
 
  at
 
 org.apache.spark.sql.api.java.JavaSQLContext.getSchema(JavaSQLContext.scala:188)
 
  at
 
 org.apache.spark.sql.api.java.JavaSQLContext.applySchema(JavaSQLContext.scala:90)
 
  at sample.spark.test.SparkJob.main(SparkJob.java:33)
 
  ... 5 more
 
 
 
  Please help me.
 
 
 
  Regards,
 
  Naveen.

 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
 For additional commands, e-mail: user-h...@spark.apache.org




RE: scala.MatchError: class java.sql.Timestamp

2014-10-19 Thread Wang, Daoyuan
Can you provide the exception stack?

Thanks,
Daoyuan

From: Ge, Yao (Y.) [mailto:y...@ford.com]
Sent: Sunday, October 19, 2014 10:17 PM
To: user@spark.apache.org
Subject: scala.MatchError: class java.sql.Timestamp

I am working with Spark 1.1.0 and I believe Timestamp is a supported data type 
for Spark SQL. However I keep getting this MatchError for java.sql.Timestamp 
when I try to use reflection to register a Java Bean with Timestamp field. 
Anything wrong with my code below?

public static class Event implements Serializable {
    private String name;
    private Timestamp time;
    public String getName() {
        return name;
    }
    public void setName(String name) {
        this.name = name;
    }
    public Timestamp getTime() {
        return time;
    }
    public void setTime(Timestamp time) {
        this.time = time;
    }
}

@Test
public void testTimeStamp() {
    JavaSparkContext sc = new JavaSparkContext("local", "timestamp");
    String[] data = {"1,2014-01-01", "2,2014-02-01"};
    JavaRDD<String> input = sc.parallelize(Arrays.asList(data));
    JavaRDD<Event> events = input.map(new Function<String, Event>() {
        public Event call(String arg0) throws Exception {
            String[] c = arg0.split(",");
            Event e = new Event();
            e.setName(c[0]);
            DateFormat fmt = new SimpleDateFormat("yyyy-MM-dd");
            e.setTime(new Timestamp(fmt.parse(c[1]).getTime()));
            return e;
        }
    });

    JavaSQLContext sqlCtx = new JavaSQLContext(sc);
    JavaSchemaRDD schemaEvent = sqlCtx.applySchema(events, Event.class);
    schemaEvent.registerTempTable("event");

    sc.stop();
}


RE: scala.MatchError: class java.sql.Timestamp

2014-10-19 Thread Ge, Yao (Y.)
scala.MatchError: class java.sql.Timestamp (of class java.lang.Class)
at 
org.apache.spark.sql.api.java.JavaSQLContext$$anonfun$getSchema$1.apply(JavaSQLContext.scala:189)
at 
org.apache.spark.sql.api.java.JavaSQLContext$$anonfun$getSchema$1.apply(JavaSQLContext.scala:188)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at 
scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at 
scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
at 
scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
at 
scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108)
at 
org.apache.spark.sql.api.java.JavaSQLContext.getSchema(JavaSQLContext.scala:188)
at 
org.apache.spark.sql.api.java.JavaSQLContext.applySchema(JavaSQLContext.scala:90)
at 
com.ford.dtc.ff.SessionStats.testTimeStamp(SessionStats.java:111)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown 
Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:68)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:47)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231)
at 
org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60)
at 
org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229)
at 
org.junit.runners.ParentRunner.access$000(ParentRunner.java:50)
at 
org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222)
at org.junit.runners.ParentRunner.run(ParentRunner.java:300)
at 
org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:50)
at 
org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
at 
org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467)
at 
org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683)
at 
org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390)
at 
org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197)


From: Wang, Daoyuan [mailto:daoyuan.w...@intel.com]
Sent: Sunday, October 19, 2014 10:31 AM
To: Ge, Yao (Y.); user@spark.apache.org
Subject: RE: scala.MatchError: class java.sql.Timestamp

Can you provide the exception stack?

Thanks,
Daoyuan

From: Ge, Yao (Y.) [mailto:y...@ford.com]
Sent: Sunday, October 19, 2014 10:17 PM
To: user@spark.apache.org
Subject: scala.MatchError: class java.sql.Timestamp

I am working with Spark 1.1.0 and I believe Timestamp is a supported data type 
for Spark SQL. However I keep getting this MatchError for java.sql.Timestamp 
when I try to use reflection to register a Java Bean with Timestamp field. 
Anything wrong with my code below?

public static class Event implements Serializable {
private String name;
private Timestamp time;
public String getName() {
return name;
}
public void setName(String name) {
this.name = name;
}
public Timestamp getTime() {
return time;
}
public void setTime(Timestamp time) {
this.time = time;
}
}

@Test
public void testTimeStamp() {
JavaSparkContext sc = new

RE: scala.MatchError: class java.sql.Timestamp

2014-10-19 Thread Cheng, Hao
Seems like a bug in JavaSQLContext.getSchema(), which doesn't enumerate all of
the data types supported by Catalyst.
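Until that is fixed, one possible workaround (an assumption on my part, not something verified in this thread) is to build the schema through the Scala SQLContext with a case class, since the Scala reflection path knows TimestampType:

import java.sql.Timestamp
import org.apache.spark.sql.SQLContext

// Case class instead of a Java bean; assumes an existing SparkContext `sc`.
case class Event(name: String, time: Timestamp)

val sqlContext = new SQLContext(sc)
import sqlContext.createSchemaRDD   // implicit RDD[Product] -> SchemaRDD

val events = sc.parallelize(Seq(
  Event("1", Timestamp.valueOf("2014-01-01 00:00:00")),
  Event("2", Timestamp.valueOf("2014-02-01 00:00:00"))))

events.registerTempTable("event")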

From: Ge, Yao (Y.) [mailto:y...@ford.com]
Sent: Sunday, October 19, 2014 11:44 PM
To: Wang, Daoyuan; user@spark.apache.org
Subject: RE: scala.MatchError: class java.sql.Timestamp

scala.MatchError: class java.sql.Timestamp (of class java.lang.Class)
at 
org.apache.spark.sql.api.java.JavaSQLContext$$anonfun$getSchema$1.apply(JavaSQLContext.scala:189)
at 
org.apache.spark.sql.api.java.JavaSQLContext$$anonfun$getSchema$1.apply(JavaSQLContext.scala:188)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at 
scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at 
scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
at 
scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
at 
scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108)
at 
org.apache.spark.sql.api.java.JavaSQLContext.getSchema(JavaSQLContext.scala:188)
at 
org.apache.spark.sql.api.java.JavaSQLContext.applySchema(JavaSQLContext.scala:90)
at 
com.ford.dtc.ff.SessionStats.testTimeStamp(SessionStats.java:111)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown 
Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:68)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:47)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231)
at 
org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60)
at 
org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229)
at 
org.junit.runners.ParentRunner.access$000(ParentRunner.java:50)
at 
org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222)
at org.junit.runners.ParentRunner.run(ParentRunner.java:300)
at 
org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:50)
at 
org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
at 
org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467)
at 
org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683)
at 
org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390)
at 
org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197)


From: Wang, Daoyuan [mailto:daoyuan.w...@intel.com]
Sent: Sunday, October 19, 2014 10:31 AM
To: Ge, Yao (Y.); user@spark.apache.org
Subject: RE: scala.MatchError: class java.sql.Timestamp

Can you provide the exception stack?

Thanks,
Daoyuan

From: Ge, Yao (Y.) [mailto:y...@ford.com]
Sent: Sunday, October 19, 2014 10:17 PM
To: user@spark.apache.org
Subject: scala.MatchError: class java.sql.Timestamp

I am working with Spark 1.1.0 and I believe Timestamp is a supported data type 
for Spark SQL. However I keep getting this MatchError for java.sql.Timestamp 
when I try to use reflection to register a Java Bean with Timestamp field. 
Anything wrong with my code below?

public static class Event implements Serializable {
private String name;
private Timestamp time;
public String getName() {
return name;
}
public void setName(String name) {
this.name = name;
}
public Timestamp getTime() {
return time

RE: scala.MatchError: class java.sql.Timestamp

2014-10-19 Thread Wang, Daoyuan
I have created an issue for this 
https://issues.apache.org/jira/browse/SPARK-4003


From: Cheng, Hao
Sent: Monday, October 20, 2014 9:20 AM
To: Ge, Yao (Y.); Wang, Daoyuan; user@spark.apache.org
Subject: RE: scala.MatchError: class java.sql.Timestamp

Seems like a bug in JavaSQLContext.getSchema(), which doesn't enumerate all of
the data types supported by Catalyst.

From: Ge, Yao (Y.) [mailto:y...@ford.com]
Sent: Sunday, October 19, 2014 11:44 PM
To: Wang, Daoyuan; user@spark.apache.org
Subject: RE: scala.MatchError: class java.sql.Timestamp

scala.MatchError: class java.sql.Timestamp (of class java.lang.Class)
at 
org.apache.spark.sql.api.java.JavaSQLContext$$anonfun$getSchema$1.apply(JavaSQLContext.scala:189)
at 
org.apache.spark.sql.api.java.JavaSQLContext$$anonfun$getSchema$1.apply(JavaSQLContext.scala:188)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at 
scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at 
scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
at 
scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
at 
scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108)
at 
org.apache.spark.sql.api.java.JavaSQLContext.getSchema(JavaSQLContext.scala:188)
at 
org.apache.spark.sql.api.java.JavaSQLContext.applySchema(JavaSQLContext.scala:90)
at 
com.ford.dtc.ff.SessionStats.testTimeStamp(SessionStats.java:111)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown 
Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:68)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:47)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231)
at 
org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60)
at 
org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229)
at 
org.junit.runners.ParentRunner.access$000(ParentRunner.java:50)
at 
org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222)
at org.junit.runners.ParentRunner.run(ParentRunner.java:300)
at 
org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:50)
at 
org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
at 
org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467)
at 
org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683)
at 
org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390)
at 
org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197)


From: Wang, Daoyuan [mailto:daoyuan.w...@intel.com]
Sent: Sunday, October 19, 2014 10:31 AM
To: Ge, Yao (Y.); user@spark.apache.org
Subject: RE: scala.MatchError: class java.sql.Timestamp

Can you provide the exception stack?

Thanks,
Daoyuan

From: Ge, Yao (Y.) [mailto:y...@ford.com]
Sent: Sunday, October 19, 2014 10:17 PM
To: user@spark.apache.org
Subject: scala.MatchError: class java.sql.Timestamp

I am working with Spark 1.1.0 and I believe Timestamp is a supported data type 
for Spark SQL. However I keep getting this MatchError for java.sql.Timestamp 
when I try to use reflection to register a Java Bean with Timestamp field. 
Anything wrong with my code below?

public static class Event implements Serializable {
private String name;
private Timestamp time;
public String getName() {
return name;
}
public void