When you use match, the match must be exhaustive. That is, a scala.MatchError
is thrown at runtime if none of the cases match.


That's why you usually handle the default case using "case _ => ..."
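
For example, a quick sketch in plain Scala (nothing Spark-specific), using your
cmd_plus regex:

    val cmd_plus = """[+]([\w]+)""".r

    def handle(s: String): Unit = s match {
      case cmd_plus(w) => println("command: " + w)
      case _           => ()   // default case: quietly ignore anything else
    }

    handle("+follow")      // matches the regex, prints "command: follow"
    handle("0101-01-10")   // without the default case this throws scala.MatchError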




Here it looks like you're taking the text of all statuses - which means not all
of them will be commands... which means your match will not be exhaustive.




The solution is either to add a default case that does nothing, or (probably
better) to add a .filter so that anything that isn't a command is dropped
before matching - something like the sketch below.
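
E.g. something along these lines (a rough sketch; the Seq of sample strings is
just a stand-in for whatever your join produces):

    val cmd_plus  = """[+]([\w]+)""".r
    val cmd_minus = """[-]([\w]+)""".r

    val texts = Seq("+follow", "0101-01-10", "-mute")

    // Option 1: filter first, so the match only ever sees commands
    texts
      .filter(t => cmd_plus.pattern.matcher(t).matches() ||
                   cmd_minus.pattern.matcher(t).matches())
      .foreach {
        case cmd_plus(w)  => println("add " + w)
        case cmd_minus(w) => println("remove " + w)
      }

    // Option 2: collect with a partial function just skips non-matches
    texts.collect {
      case cmd_plus(w)  => "add " + w
      case cmd_minus(w) => "remove " + w
    }

The same .filter works on the RDD itself, before the .collect().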




Just looking at it again, it could also be the x => x._2._1 part... What type
is that? Shouldn't it be a Seq if you're joining, in which case the match
would also fail...
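
(Actually, checking the shapes: for pair RDDs, a.join(b) gives
(key, (aValue, bValue)) pairs, so x._2._1 would be the tweet text itself -
which also matches the plain String in your stack trace. Plain-Scala sketch
below; I'm only guessing at what superusers contains, so the Boolean is a
placeholder.)

    // Joining (K, V) with (K, W) yields (K, (V, W)), so ._2._1 is the V.
    val tweets     = Seq(1L -> "+follow", 2L -> "0101-01-10")   // (userId, text)
    val superusers = Map(1L -> true, 2L -> true)                // placeholder values

    val joined: Seq[(Long, (String, Boolean))] =
      tweets.flatMap { case (id, text) =>
        superusers.get(id).map(flag => (id, (text, flag)))
      }

    joined.map(x => x._2._1)   // Seq("+follow", "0101-01-10") - plain Strings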




Hope this helps.
—
Sent from Mailbox

On Sun, Jun 8, 2014 at 6:45 PM, Jeremy Lee <unorthodox.engine...@gmail.com>
wrote:

> I shut down my first (working) cluster and brought up a fresh one... and
> It's been a bit of a horror and I need to sleep now. Should I be worried
> about these errors? Or did I just have the old log4j.config tuned so I
> didn't see them?
> 14/06/08 16:32:52 ERROR scheduler.JobScheduler: Error running job streaming
> job 1402245172000 ms.2
> scala.MatchError: 0101-01-10 (of class java.lang.String)
>         at SimpleApp$$anonfun$6$$anonfun$apply$6.apply(SimpleApp.scala:218)
>         at SimpleApp$$anonfun$6$$anonfun$apply$6.apply(SimpleApp.scala:217)
>         at
> scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>         at
> scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
>         at SimpleApp$$anonfun$6.apply(SimpleApp.scala:217)
>         at SimpleApp$$anonfun$6.apply(SimpleApp.scala:214)
>         at
> org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1.apply(DStream.scala:527)
>         at
> org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1.apply(DStream.scala:527)
>         at
> org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply$mcV$sp(ForEachDStream.scala:41)
>         at
> org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:40)
>         at
> org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:40)
>         at scala.util.Try$.apply(Try.scala:161)
>         at org.apache.spark.streaming.scheduler.Job.run(Job.scala:32)
>         at
> org.apache.spark.streaming.scheduler.JobScheduler$JobHandler.run(JobScheduler.scala:172)
>         at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:744)
> The error comes from this code, which seemed like a sensible way to match
> things:
> (The "case cmd_plus(w)" statement is generating the error.)
> val cmd_plus = """[+]([\w]+)""".r
> val cmd_minus = """[-]([\w]+)""".r
> // find command user tweets
> val commands = stream.map(
>   status => ( status.getUser().getId(), status.getText() )
> ).foreachRDD(rdd => {
>   rdd.join(superusers).map(
>     x => x._2._1
>   ).collect().foreach{ cmd => {
> 218:    cmd match {
>       case cmd_plus(w) => {
>         ...
>       }
>       case cmd_minus(w) => {
>         ...
>       }
>     }
>   }}
> })
> It seems a bit excessive for Scala to throw exceptions because a regex
> didn't match. Something feels wrong.
