Re: scala.MatchError while doing BinaryClassificationMetrics
Thanks, Nick. This worked for me:

val evaluator = new BinaryClassificationEvaluator().
  setLabelCol("label").
  setRawPredictionCol("ModelProbability").
  setMetricName("areaUnderROC")
val auROC = evaluator.evaluate(testResults)

On Mon, Nov 14, 2016 at 4:00 PM, Nick Pentreath wrote:
> [...]
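A small follow-on sketch, assuming the same testResults DataFrame as above: "areaUnderPR" is the evaluator's only other supported metric name in Spark 2.0, so a single instance can compute both metrics by switching the name between calls.

val auROC = evaluator.setMetricName("areaUnderROC").evaluate(testResults)
val auPR  = evaluator.setMetricName("areaUnderPR").evaluate(testResults)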
Re: scala.MatchError while doing BinaryClassificationMetrics
Typically you pass in the result of a model transform to the evaluator. So:

val model = estimator.fit(data)
val auc = evaluator.evaluate(model.transform(testData))

Check Scala API docs for some details:
http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.ml.evaluation.BinaryClassificationEvaluator

On Mon, 14 Nov 2016 at 20:02 Bhaarat Sharma wrote:
> [...]
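For readers following along, a minimal self-contained sketch of that fit/transform/evaluate flow. The LogisticRegression estimator, the split, and the column names are illustrative assumptions, not from this thread; `data` is assumed to be a DataFrame with "label" and "features" columns.

import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator

// Hold out a test set, fit on the rest.
val Array(train, test) = data.randomSplit(Array(0.8, 0.2), seed = 42)

val model = new LogisticRegression().
  setLabelCol("label").
  setFeaturesCol("features").
  fit(train)

// transform() appends rawPrediction, probability, and prediction columns.
val scored = model.transform(test)

// The evaluator reads the label and rawPrediction columns from the scored DataFrame.
val evaluator = new BinaryClassificationEvaluator().setMetricName("areaUnderROC")
val auc = evaluator.evaluate(scored)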
Re: scala.MatchError while doing BinaryClassificationMetrics
Can you please suggest how I can use BinaryClassificationEvaluator? I tried:

scala> import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator

scala> val evaluator = new BinaryClassificationEvaluator()
evaluator: org.apache.spark.ml.evaluation.BinaryClassificationEvaluator = binEval_0d57372b7579

Try 1:

scala> evaluator.evaluate(testScoreAndLabel.rdd)
<console>:105: error: type mismatch;
 found   : org.apache.spark.rdd.RDD[(Double, Double)]
 required: org.apache.spark.sql.Dataset[_]
       evaluator.evaluate(testScoreAndLabel.rdd)

Try 2:

scala> evaluator.evaluate(testScoreAndLabel)
java.lang.IllegalArgumentException: Field "rawPrediction" does not exist.
  at org.apache.spark.sql.types.StructType$$anonfun$apply$1.apply(StructType.scala:228)

Try 3:

scala> evaluator.evaluate(testScoreAndLabel.select("Label","ModelProbability"))
org.apache.spark.sql.AnalysisException: cannot resolve '`Label`' given input columns: [_1, _2];
  at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)

On Mon, Nov 14, 2016 at 1:44 PM, Nick Pentreath wrote:
> [...]
Re: scala.MatchError while doing BinaryClassificationMetrics
DataFrame.rdd returns an RDD[Row]. You'll need to use map to extract the doubles from the test score and label DF.

But you may prefer to just use spark.ml evaluators, which work with DataFrames. Try BinaryClassificationEvaluator.

On Mon, 14 Nov 2016 at 19:30, Bhaarat Sharma wrote:

> I am getting scala.MatchError in the code below. I'm not able to see why
> this would be happening. I am using Spark 2.0.1.
>
> scala> testResults.columns
> res538: Array[String] = Array(TopicVector, subject_id, hadm_id, isElective,
> isNewborn, isUrgent, isEmergency, isMale, isFemale, oasis_score,
> sapsii_score, sofa_score, age, hosp_death, test, ModelFeatures, Label,
> rawPrediction, ModelProbability, ModelPrediction)
>
> scala> testResults.select("Label","ModelProbability").take(1)
> res542: Array[org.apache.spark.sql.Row] =
> Array([0.0,[0.737304818744076,0.262695181255924]])
>
> scala> val testScoreAndLabel = testResults.
>      |   select("Label","ModelProbability").
>      |   map { case Row(l: Double, p: Vector) => (p(1), l) }
> testScoreAndLabel: org.apache.spark.sql.Dataset[(Double, Double)] =
> [_1: double, _2: double]
>
> scala> testScoreAndLabel
> res539: org.apache.spark.sql.Dataset[(Double, Double)] = [_1: double, _2: double]
>
> scala> testScoreAndLabel.columns
> res540: Array[String] = Array(_1, _2)
>
> scala> val testMetrics = new BinaryClassificationMetrics(testScoreAndLabel.rdd)
> testMetrics: org.apache.spark.mllib.evaluation.BinaryClassificationMetrics =
> org.apache.spark.mllib.evaluation.BinaryClassificationMetrics@36e780d1
>
> The line below gives the error:
>
> val auROC = testMetrics.areaUnderROC() // this line gives the error
>
> Caused by: scala.MatchError: [0.0,[0.7316583497453766,0.2683416502546234]]
> (of class org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema)
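A note on the MatchError itself: one plausible cause (an assumption, not confirmed in the thread) is that the Vector in the pattern resolves to org.apache.spark.mllib.linalg.Vector, while Spark 2.0 ML models emit org.apache.spark.ml.linalg.Vector. The Dataset's schema is inferred from the static types, so the mismatch only surfaces when an action such as areaUnderROC() forces evaluation. A sketch that sidesteps the pattern match entirely by extracting fields with getAs (column names taken from the thread):

import org.apache.spark.ml.linalg.Vector
import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics

// Extract (score, label) pairs without pattern-matching on Row,
// so a Vector type mismatch cannot trigger a MatchError.
val scoreAndLabel = testResults.select("ModelProbability", "Label").rdd.map { row =>
  (row.getAs[Vector]("ModelProbability")(1), row.getAs[Double]("Label"))
}

val metrics = new BinaryClassificationMetrics(scoreAndLabel)
val auROC = metrics.areaUnderROC()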
Re: scala.MatchError on stand-alone cluster mode
Hi, Rishabh Bhardwaj, Saisai Shao,

Thanks for your help. I have found that the key reason is that I forgot to upload the jar package to all of the nodes in the cluster, so after the master distributed the job and selected one node as the driver, the driver could not find the jar package and threw an exception.

--
Mekal Zheng
Sent with Airmail

From: Rishabh Bhardwaj <rbnex...@gmail.com>
Reply-To: Rishabh Bhardwaj <rbnex...@gmail.com>
Date: July 15, 2016 at 17:28:43
To: Saisai Shao <sai.sai.s...@gmail.com>
Cc: Mekal Zheng <mekal.zh...@gmail.com>, spark users <user@spark.apache.org>
Subject: Re: scala.MatchError on stand-alone cluster mode

Hi Mekal,

It may be a Scala version mismatch error. Kindly check whether you are running both (your streaming app and Spark cluster) on Scala 2.10 or 2.11.

Thanks,
Rishabh.

On Fri, Jul 15, 2016 at 1:38 PM, Saisai Shao <sai.sai.s...@gmail.com> wrote:
> [...]
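For reference, a sketch of a submission that avoids this failure mode. Host names, paths, and the nine trailing argument values are illustrative, not from the thread: in standalone cluster mode the driver is launched on an arbitrary worker, so the application jar must be reachable from every node, e.g. by placing it on HDFS or at an identical local path on each machine.

# Illustrative only: the jar is hosted on HDFS so whichever node is
# elected as driver can fetch it; the nine arguments follow the job's usage.
spark-submit \
  --master spark://master-host:7077 \
  --deploy-mode cluster \
  --class com.jd.deeplog.LogAggregator \
  hdfs:///apps/logagg/log-aggregator-assembly.jar \
  zk1:2181,zk2:2181 loggroup haproxy 2 "ip:key,bytes:sum" "|" 5 hdfs /output/agg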
Re: scala.MatchError on stand-alone cluster mode
The error stack is throwing from your code:

Caused by: scala.MatchError: [Ljava.lang.String;@68d279ec (of class [Ljava.lang.String;)
    at com.jd.deeplog.LogAggregator$.main(LogAggregator.scala:29)
    at com.jd.deeplog.LogAggregator.main(LogAggregator.scala)

I think you should debug the code yourself; it may not be a problem with Spark.

On Fri, Jul 15, 2016 at 3:17 PM, Mekal Zheng <mekal.zh...@gmail.com> wrote:

> Hi,
>
> I have a Spark Streaming job written in Scala that runs well in local and
> client mode, but when I submit it in cluster mode, the driver reports the
> error shown below. Does anyone know what is wrong here? Please help me!
>
> The job CODE is after the log.
>
> 16/07/14 17:28:21 DEBUG ByteBufUtil: -Dio.netty.threadLocalDirectBufferSize: 65536
> 16/07/14 17:28:21 DEBUG NetUtil: Loopback interface: lo (lo, 0:0:0:0:0:0:0:1%lo)
> 16/07/14 17:28:21 DEBUG NetUtil: /proc/sys/net/core/somaxconn: 32768
> 16/07/14 17:28:21 DEBUG TransportServer: Shuffle server started on port :43492
> 16/07/14 17:28:21 INFO Utils: Successfully started service 'Driver' on port 43492.
> 16/07/14 17:28:21 INFO WorkerWatcher: Connecting to worker spark://Worker@172.20.130.98:23933
> 16/07/14 17:28:21 DEBUG TransportClientFactory: Creating new connection to /172.20.130.98:23933
> Exception in thread "main" java.lang.reflect.InvocationTargetException
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:497)
>     at org.apache.spark.deploy.worker.DriverWrapper$.main(DriverWrapper.scala:58)
>     at org.apache.spark.deploy.worker.DriverWrapper.main(DriverWrapper.scala)
> Caused by: scala.MatchError: [Ljava.lang.String;@68d279ec (of class [Ljava.lang.String;)
>     at com.jd.deeplog.LogAggregator$.main(LogAggregator.scala:29)
>     at com.jd.deeplog.LogAggregator.main(LogAggregator.scala)
>     ... 6 more
>
> ==
> Job CODE:
>
> object LogAggregator {
>
>   val batchDuration = Seconds(5)
>
>   def main(args: Array[String]) {
>
>     val usage =
>       """Usage: LogAggregator <zkQuorum> <group> <topics> <numThreads> <logFormat> <logSeparator> <batchDuration> <destType> <destPath>
>         | logFormat: fieldName:fieldRole[,fieldName:fieldRole] each field must have both name and role
>         | logFormat.role: can be key|avg|enum|sum|ignore
>       """.stripMargin
>
>     if (args.length < 9) {
>       System.err.println(usage)
>       System.exit(1)
>     }
>
>     val Array(zkQuorum, group, topics, numThreads, logFormat, logSeparator,
>       batchDuration, destType, destPath) = args
>
>     println("Start streaming calculation...")
>
>     val conf = new SparkConf().setAppName("LBHaproxy-LogAggregator")
>     val ssc = new StreamingContext(conf, Seconds(batchDuration.toInt))
>
>     val topicMap = topics.split(",").map((_, numThreads.toInt)).toMap
>
>     val lines = KafkaUtils.createStream(ssc, zkQuorum, group, topicMap).map(_._2)
>
>     val logFields = logFormat.split(",").map(field => {
>       val fld = field.split(":")
>       if (fld.size != 2) {
>         System.err.println("Wrong parameters for logFormat!\n")
>         System.err.println(usage)
>         System.exit(1)
>       }
>       // TODO: ensure the field has both 'name' and 'role'
>       new LogField(fld(0), fld(1))
>     })
>
>     val keyFields = logFields.filter(logFieldName => {
>       logFieldName.role == "key"
>     })
>     val keys = keyFields.map(key => {
>       key.name
>     })
>
>     val logsByKey = lines.map(line => {
>       val log = new Log(logFields, line, logSeparator)
>       log.toMap
>     }).filter(log => log.nonEmpty).map(log => {
>       val keys = keyFields.map(logField => {
>         log(logField.name).value
>       })
>
>       val key = keys.reduce((key1, key2) => {
>         key1.asInstanceOf[String] + key2.asInstanceOf[String]
>       })
>
>       val fullLog = log + ("count" -> new LogSegment("sum", 1))
>
>       (key, fullLog)
>     })
>
>     val aggResults = logsByKey.reduceByKey((log_a, log_b) => {
>
>       log_a.map(logField => {
>         val logFieldName = logField._1
>         val logSegment_a = logField._2
>         val logSegment_b = log_b(logFieldName)
>
>         val segValue = logSegment_a.role match {
>           case "avg" => {
>             logSegment_a.value.toString.toInt + logSegment_b.value.toString.toInt
>           }
>           case "sum" => {
>             logSegment_a.value.toString.toInt + logSegment_b.value.toString.toInt
>           }
>           case "enum" => {
>             val list_a = logSegment_a.value.asInstanceOf[List[(String, Int)]]
>             val list_b = logSegment_b.value.asInstanceOf[List[(String, Int)]]
>             list_a ++ list_b
>           }
>           case _ => logSegment_a.value
>         }
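A side note on the MatchError at LogAggregator.scala:29, which is consistent with the submission problem Mekal found: `val Array(a, b, ...) = args` is a pattern-match assignment, and it throws scala.MatchError on the Array[String] itself whenever args arrives with a different number of elements than the pattern — the `args.length < 9` guard rejects too few arguments but not too many, and in cluster mode the arguments may not arrive as expected. A sketch of a stricter check, under the assumption that line 29 is that destructuring:

// Match explicitly so an arity mismatch prints usage instead of crashing.
args match {
  case Array(zkQuorum, group, topics, numThreads, logFormat, logSeparator,
             batchDuration, destType, destPath) =>
    println("Start streaming calculation...") // proceed with the nine parameters
  case _ =>
    System.err.println(usage) // also covers args.length > 9, which `< 9` does not
    System.exit(1)
}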
Re: scala.MatchError: class org.apache.avro.Schema (of class java.lang.Class)
For more details on my question:
http://apache-spark-user-list.1001560.n3.nabble.com/How-to-generate-Java-bean-class-for-avro-files-using-spark-avro-project-tp22413.html

Thanks,
Yamini

On Tue, Apr 7, 2015 at 2:23 PM, Yamini Maddirala <yamini.m...@gmail.com> wrote:
> [...]
Re: scala.MatchError: class org.apache.avro.Schema (of class java.lang.Class)
Have you looked at spark-avro? https://github.com/databricks/spark-avro

On Tue, Apr 7, 2015 at 3:57 AM, Yamini <yamini.m...@gmail.com> wrote:

> I am using Spark (1.2) Streaming to read Avro-schema-based topics flowing in
> Kafka, and then using the Spark SQL context to register the data as a temp
> table. The Avro Maven plugin (version 1.7.7) generates the Java bean class
> for the Avro file, but it includes a field named SCHEMA$ of type
> org.apache.avro.Schema, which is not supported by the applySchema method of
> the JavaSQLContext class. How can I auto-generate the Java bean class for
> the Avro file and overcome the above-mentioned problem?
>
> Thanks,
> Yamini
Re: scala.MatchError: class org.apache.avro.Schema (of class java.lang.Class)
Hi Michael,

Yes, I did try the spark-avro 0.2.0 Databricks project. I am using CDH 5.3, which is based on Spark 1.2; hence I'm bound to use spark-avro 0.2.0 instead of the latest. I'm not sure how the spark-avro project can help me in this scenario:

1. I have a JavaDStream of Avro GenericRecord, JavaDStream<GenericRecord> [this is the data being read from Kafka topics].
2. I'm able to get a JavaSchemaRDD using the Avro file, like this:
   final JavaSchemaRDD schemaRDD2 = AvroUtils.avroFile(sqlContext, "/xyz-Project/trunk/src/main/resources/xyz.avro");
3. I don't know how I can apply the schema in step 2 to the data in step 1.

I chose to do something like this:

JavaSchemaRDD schemaRDD = sqlContext.applySchema(genericRecordJavaRDD, xyz.class);

I used the Avro Maven plugin to generate the xyz class in Java. But this is not good, because the plugin creates a SCHEMA$ field, which is not supported by the applySchema method.

Please let me know how to deal with this. Appreciate your help.

Thanks,
Yamini

On Tue, Apr 7, 2015 at 1:57 PM, Michael Armbrust <mich...@databricks.com> wrote:
> [...]
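For what it's worth, one way to sidestep the SCHEMA$ problem entirely, sketched below under the assumption that the Avro record has a known, flat set of fields: skip bean reflection and use the programmatic-schema variant Spark 1.2's Java API provides, JavaSQLContext.applySchema(JavaRDD<Row>, StructType), building Rows from the GenericRecords by hand. The field names and types are illustrative; genericRecordJavaRDD and sqlContext are the variables from the message above.

import java.util.ArrayList;
import java.util.List;

import org.apache.avro.generic.GenericRecord;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.sql.api.java.DataType;
import org.apache.spark.sql.api.java.JavaSchemaRDD;
import org.apache.spark.sql.api.java.Row;
import org.apache.spark.sql.api.java.StructField;
import org.apache.spark.sql.api.java.StructType;

// Build the Spark SQL schema by hand (illustrative fields).
List<StructField> fields = new ArrayList<StructField>();
fields.add(DataType.createStructField("id", DataType.StringType, true));
fields.add(DataType.createStructField("amount", DataType.IntegerType, true));
StructType schema = DataType.createStructType(fields);

// Convert each GenericRecord to a Row in the same field order as the schema.
JavaRDD<Row> rowRDD = genericRecordJavaRDD.map(new Function<GenericRecord, Row>() {
  public Row call(GenericRecord r) {
    return Row.create(r.get("id").toString(), (Integer) r.get("amount"));
  }
});

JavaSchemaRDD schemaRDD = sqlContext.applySchema(rowRDD, schema);
schemaRDD.registerTempTable("xyz");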
Re: scala.MatchError on SparkSQL when creating ArrayType of StructType
All values in Hive are always nullable, though you should still not be seeing this error. It should be addressed by this patch: https://github.com/apache/spark/pull/3150

On Fri, Dec 5, 2014 at 2:36 AM, Hao Ren <inv...@gmail.com> wrote:

> Hi,
>
> I am using SparkSQL on the 1.1.0 branch. The following code leads to a
> scala.MatchError at
> org.apache.spark.sql.catalyst.expressions.Cast.cast$lzycompute(Cast.scala:247):
>
> val scm = StructType(inputRDD.schema.fields.init :+
>   StructField("list",
>     ArrayType(
>       StructType(Seq(
>         StructField("date", StringType, nullable = false),
>         StructField("nbPurchase", IntegerType, nullable = false)))),
>     nullable = false))
>
> // purchaseRDD is an RDD[sql.Row] whose schema corresponds to scm.
> // It is transformed from inputRDD.
> val schemaRDD = hiveContext.applySchema(purchaseRDD, scm)
> schemaRDD.registerTempTable("t_purchase")
>
> Here's the stack trace:
>
> scala.MatchError: ArrayType(StructType(List(StructField(date,StringType,true),
> StructField(n_reachat,IntegerType,true))),true) (of class
> org.apache.spark.sql.catalyst.types.ArrayType)
>     at org.apache.spark.sql.catalyst.expressions.Cast.cast$lzycompute(Cast.scala:247)
>     at org.apache.spark.sql.catalyst.expressions.Cast.cast(Cast.scala:247)
>     at org.apache.spark.sql.catalyst.expressions.Cast.eval(Cast.scala:263)
>     at org.apache.spark.sql.catalyst.expressions.Alias.eval(namedExpressions.scala:84)
>     at org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection.apply(Projection.scala:66)
>     at org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection.apply(Projection.scala:50)
>     at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>     at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>     at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.org$apache$spark$sql$hive$execution$InsertIntoHiveTable$$writeToFile$1(InsertIntoHiveTable.scala:149)
>     at org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$saveAsHiveFile$1.apply(InsertIntoHiveTable.scala:158)
>     at org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$saveAsHiveFile$1.apply(InsertIntoHiveTable.scala:158)
>     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
>     at org.apache.spark.scheduler.Task.run(Task.scala:54)
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:744)
>
> The strange thing is that the nullable of the date and nbPurchase fields is
> set to true, while they were false in the code. If I set both to true, it
> works. But, in fact, they should not be nullable.
>
> Here's what I find at Cast.scala:247 on the 1.1.0 branch:
>
> private[this] lazy val cast: Any => Any = dataType match {
>   case StringType => castToString
>   case BinaryType => castToBinary
>   case DecimalType => castToDecimal
>   case TimestampType => castToTimestamp
>   case BooleanType => castToBoolean
>   case ByteType => castToByte
>   case ShortType => castToShort
>   case IntegerType => castToInt
>   case FloatType => castToFloat
>   case LongType => castToLong
>   case DoubleType => castToDouble
> }
>
> Any idea? Thank you.
>
> Hao
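As a stopgap until that patch, the adjustment Hao describes ("if I set both to true, it works") looks like this sketch; only the two inner fields are flipped, matching his wording:

val scm = StructType(inputRDD.schema.fields.init :+
  StructField("list",
    ArrayType(
      StructType(Seq(
        StructField("date", StringType, nullable = true),           // was false
        StructField("nbPurchase", IntegerType, nullable = true)))), // was false
    nullable = false))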
RE: scala.MatchError
Hi,

Do you mean that with Java, I shouldn't have the Issue class as a property (attribute) in the Instrument class? For example:

class Issue {
    int a;
}

class Instrument {
    Issue issue;
}

How about Scala? Does it support such user-defined datatypes in classes, e.g. case classes like:

case class Issue(a: Int = 0)
case class Instrument(issue: Issue = null)

-Naveen

From: Michael Armbrust [mailto:mich...@databricks.com]
Sent: Wednesday, November 12, 2014 12:09 AM
To: Xiangrui Meng
Cc: Naveen Kumar Pokala; user@spark.apache.org
Subject: Re: scala.MatchError

[...]
Re: scala.MatchError
I think you need a Java bean class instead of a normal class. See the example here: http://spark.apache.org/docs/1.1.0/sql-programming-guide.html (switch to the Java tab).

-Xiangrui

On Tue, Nov 11, 2014 at 7:18 AM, Naveen Kumar Pokala <npok...@spcapitaliq.com> wrote:

> Hi,
>
> This is my Instrument Java constructor:
>
> public Instrument(Issue issue, Issuer issuer, Issuing issuing) {
>     super();
>     this.issue = issue;
>     this.issuer = issuer;
>     this.issuing = issuing;
> }
>
> I am trying to create a JavaSchemaRDD:
>
> JavaSchemaRDD schemaInstruments = sqlCtx.applySchema(distData, Instrument.class);
>
> Remarks: Instrument, Issue, Issuer, and Issuing are all Java classes;
> distData is holding a List<Instrument>.
>
> I am getting the following error:
>
> Exception in thread "Driver" java.lang.reflect.InvocationTargetException
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:483)
>     at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:162)
> Caused by: scala.MatchError: class sample.spark.test.Issue (of class java.lang.Class)
>     at org.apache.spark.sql.api.java.JavaSQLContext$$anonfun$getSchema$1.apply(JavaSQLContext.scala:189)
>     at org.apache.spark.sql.api.java.JavaSQLContext$$anonfun$getSchema$1.apply(JavaSQLContext.scala:188)
>     at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>     at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>     at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>     at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
>     at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
>     at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108)
>     at org.apache.spark.sql.api.java.JavaSQLContext.getSchema(JavaSQLContext.scala:188)
>     at org.apache.spark.sql.api.java.JavaSQLContext.applySchema(JavaSQLContext.scala:90)
>     at sample.spark.test.SparkJob.main(SparkJob.java:33)
>     ... 5 more
>
> Please help me.
>
> Regards,
> Naveen.
Re: scala.MatchError
Xiangrui is correct that it must be a Java bean; also, nested classes are not yet supported in Java.

On Tue, Nov 11, 2014 at 10:11 AM, Xiangrui Meng <men...@gmail.com> wrote:
> [...]
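Putting the two answers together, a sketch of what a reflection-friendly version might look like; the field names and the flattening are illustrative assumptions, not from the thread. The bean is public and Serializable, exposes getters/setters, and uses only simple field types, with the nested Issue/Issuer/Issuing objects flattened by hand.

import java.io.Serializable;

// A flat Java bean: only simple field types that bean reflection understands.
public class InstrumentBean implements Serializable {
    private int issueId;        // flattened from Issue (illustrative)
    private String issuerName;  // flattened from Issuer (illustrative)

    public int getIssueId() { return issueId; }
    public void setIssueId(int issueId) { this.issueId = issueId; }

    public String getIssuerName() { return issuerName; }
    public void setIssuerName(String issuerName) { this.issuerName = issuerName; }
}

sqlCtx.applySchema(flatData, InstrumentBean.class) should then be able to derive the schema via reflection, at the cost of flattening the nested structure by hand.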
RE: scala.MatchError: class java.sql.Timestamp
Can you provide the exception stack?

Thanks,
Daoyuan

From: Ge, Yao (Y.) [mailto:y...@ford.com]
Sent: Sunday, October 19, 2014 10:17 PM
To: user@spark.apache.org
Subject: scala.MatchError: class java.sql.Timestamp

I am working with Spark 1.1.0 and I believe Timestamp is a supported data type for Spark SQL. However, I keep getting this MatchError for java.sql.Timestamp when I try to use reflection to register a Java bean with a Timestamp field. Anything wrong with my code below?

public static class Event implements Serializable {
    private String name;
    private Timestamp time;

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }

    public Timestamp getTime() { return time; }
    public void setTime(Timestamp time) { this.time = time; }
}

@Test
public void testTimeStamp() {
    JavaSparkContext sc = new JavaSparkContext("local", "timestamp");
    String[] data = {"1,2014-01-01", "2,2014-02-01"};
    JavaRDD<String> input = sc.parallelize(Arrays.asList(data));
    JavaRDD<Event> events = input.map(new Function<String, Event>() {
        public Event call(String arg0) throws Exception {
            String[] c = arg0.split(",");
            Event e = new Event();
            e.setName(c[0]);
            DateFormat fmt = new SimpleDateFormat("yyyy-MM-dd");
            e.setTime(new Timestamp(fmt.parse(c[1]).getTime()));
            return e;
        }
    });

    JavaSQLContext sqlCtx = new JavaSQLContext(sc);
    JavaSchemaRDD schemaEvent = sqlCtx.applySchema(events, Event.class);
    schemaEvent.registerTempTable("event");

    sc.stop();
}
RE: scala.MatchError: class java.sql.Timestamp
scala.MatchError: class java.sql.Timestamp (of class java.lang.Class)
    at org.apache.spark.sql.api.java.JavaSQLContext$$anonfun$getSchema$1.apply(JavaSQLContext.scala:189)
    at org.apache.spark.sql.api.java.JavaSQLContext$$anonfun$getSchema$1.apply(JavaSQLContext.scala:188)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
    at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108)
    at org.apache.spark.sql.api.java.JavaSQLContext.getSchema(JavaSQLContext.scala:188)
    at org.apache.spark.sql.api.java.JavaSQLContext.applySchema(JavaSQLContext.scala:90)
    at com.ford.dtc.ff.SessionStats.testTimeStamp(SessionStats.java:111)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
    at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:68)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:47)
    at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229)
    at org.junit.runners.ParentRunner.access$000(ParentRunner.java:50)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:300)
    at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:50)
    at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197)

From: Wang, Daoyuan [mailto:daoyuan.w...@intel.com]
Sent: Sunday, October 19, 2014 10:31 AM
To: Ge, Yao (Y.); user@spark.apache.org
Subject: RE: scala.MatchError: class java.sql.Timestamp

[...]
RE: scala.MatchError: class java.sql.Timestamp
Seems like a bug in JavaSQLContext.getSchema(), which doesn't enumerate all of the data types supported by Catalyst.

From: Ge, Yao (Y.) [mailto:y...@ford.com]
Sent: Sunday, October 19, 2014 11:44 PM
To: Wang, Daoyuan; user@spark.apache.org
Subject: RE: scala.MatchError: class java.sql.Timestamp

[...]
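Until getSchema() covers Timestamp, a possible workaround (a sketch, not from the thread): keep the bean's field as a plain long of epoch milliseconds, which bean reflection does handle, and convert at the edges.

import java.io.Serializable;
import java.sql.Timestamp;

public static class Event implements Serializable {
    private String name;
    private long timeMillis; // epoch millis instead of java.sql.Timestamp

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }

    public long getTimeMillis() { return timeMillis; }
    public void setTimeMillis(long timeMillis) { this.timeMillis = timeMillis; }

    // Convenience conversion back to Timestamp outside of Spark SQL.
    public Timestamp toTimestamp() { return new Timestamp(timeMillis); }
}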
RE: scala.MatchError: class java.sql.Timestamp
I have created an issue for this:
https://issues.apache.org/jira/browse/SPARK-4003

From: Cheng, Hao
Sent: Monday, October 20, 2014 9:20 AM
To: Ge, Yao (Y.); Wang, Daoyuan; user@spark.apache.org
Subject: RE: scala.MatchError: class java.sql.Timestamp

[...]