[ https://issues.apache.org/jira/browse/SPARK-19581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16015345#comment-16015345 ]

Yan Facai (颜发才) commented on SPARK-19581:
-----------------------------------------

[~barrybecker4] Hi, Becker.
I can't reproduce the bug on spark-2.1.1-bin-hadoop2.7.

1) For a feature vector of size 0, the exception is harmless: the job fails with a plain IllegalArgumentException instead of crashing the executor.

```scala
// Train on the 692-feature sample data shipped with Spark.
val data = spark.read.format("libsvm").load("/user/facai/data/libsvm/sample_libsvm_data.txt").cache
import org.apache.spark.ml.classification.NaiveBayes
val model = new NaiveBayes().fit(data)

// Build a test row whose feature vector has size 0.
import org.apache.spark.ml.linalg.{Vectors => SV}
case class TestData(features: org.apache.spark.ml.linalg.Vector)
val emptyVector = SV.sparse(0, Array.empty[Int], Array.empty[Double])
val test = Seq(TestData(emptyVector)).toDF

scala> test.show
+---------+
| features|
+---------+
|(0,[],[])|
+---------+

scala> model.transform(test).show
org.apache.spark.SparkException: Failed to execute user defined function($anonfun$1: (vector) => vector)
  at org.apache.spark.sql.catalyst.expressions.ScalaUDF.eval(ScalaUDF.scala:1072)
  ... 48 elided
Caused by: java.lang.IllegalArgumentException: requirement failed: The columns of A don't match the number of elements of x. A: 692, x: 0
  at scala.Predef$.require(Predef.scala:224)
  ... 99 more
```
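The requirement failure above is raised by the JVM-side dimension check in org.apache.spark.ml.linalg.BLAS, so the job fails with a clean exception before any native BLAS routine runs. The size the model expects can be read off the model itself (value inferred from the error message above):

```scala
scala> model.numFeatures   // inherited from PredictionModel
res0: Int = 692
```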

2) For an empty feature vector of size 692 (matching the model), prediction works fine.

```scala
scala> val emptyVector = SV.sparse(692, Array.empty[Int], Array.empty[Double])
emptyVector: org.apache.spark.ml.linalg.Vector = (692,[],[])

scala> val test = Seq(TestData(emptyVector)).toDF
test: org.apache.spark.sql.DataFrame = [features: vector]

scala> test.show
+-----------+
|   features|
+-----------+
|(692,[],[])|
+-----------+

scala> model.transform(test).show
+-----------+--------------------+--------------------+----------+
|   features|       rawPrediction|         probability|prediction|
+-----------+--------------------+--------------------+----------+
|(692,[],[])|[-0.8407831793660...|[0.43137254901960...|       1.0|
+-----------+--------------------+--------------------+----------+

```
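So on 2.1.1 a mismatched vector gives a clean exception, and a matching one predicts normally. If wrongly sized vectors can reach the model at all, a minimal defensive filter is possible (a sketch, assuming the vector column is named "features"):

```scala
import org.apache.spark.ml.linalg.Vector
import org.apache.spark.sql.Row

// Keep only the rows whose vector size matches what the model was trained on.
val expected = model.numFeatures
val safe = test.filter((row: Row) => row.getAs[Vector]("features").size == expected)
model.transform(safe).show
```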

> running NaiveBayes model with 0 features can crash the executor with a DGEMV error
> --------------------------------------------------------------------------------
>
>                 Key: SPARK-19581
>                 URL: https://issues.apache.org/jira/browse/SPARK-19581
>             Project: Spark
>          Issue Type: Bug
>          Components: ML
>    Affects Versions: 2.1.0
>         Environment: Spark in development or standalone mode, on Windows or Linux.
>            Reporter: Barry Becker
>            Priority: Minor
>
> The severity of this bug is high (nothing should be able to crash Spark like this), but the priority may be low (there is an easy workaround).
> In our application, a user can select features and a target to run the NaiveBayes inducer. Columns with too many values, or only one value, are removed before we call the inducer to create the model. As a result, there are cases where all of the features get removed. When this happens, executors crash and get restarted (if on a cluster), or Spark crashes and has to be manually restarted (if in development mode).
> It looks like NaiveBayes uses BLAS, and BLAS does not handle this case well when it is encountered. It emits this vague error:
> ** On entry to DGEMV  parameter number  6 had an illegal value
> and terminates.
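> For reference, parameter 6 of DGEMV is LDA, the leading dimension of the input matrix; with zero features it receives an illegal value, and the native BLAS error handler (XERBLA) prints this message and halts the process, which is why the executor itself dies instead of throwing a catchable exception. A minimal sketch of the easy workaround mentioned above, with hypothetical names ({{featureCols}}, {{trainData}}):
> {code}
> // Skip the inducer entirely when pruning removed every feature column.
> if (featureCols.nonEmpty) {
>   val model = new NaiveBayes().fit(trainData)
>   // ... proceed to model.transform(testData) as usual
> } else {
>   // report to the user that no usable features remain
> }
> {code}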
> My code looks like this:
> {code}
> val predictions = model.transform(testData)  // Make predictions
> // figure out how many were correctly predicted
> val numCorrect = predictions.filter(new Column(actualTarget) === new Column(PREDICTION_LABEL_COLUMN)).count()
> val numIncorrect = testRowCount - numCorrect
> {code}
> The failure surfaces at the line that does the count, but it is not the count that causes the problem; it is the model.transform step (where the model contains the NaiveBayes classifier).
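> Because DataFrame transformations are lazy, the transform-time failure only surfaces at the first action on predictions. A sketch that forces evaluation early, so the trace points at the transform rather than the count:
> {code}
> val predictions = model.transform(testData)
> predictions.head(1)  // any failure in the model's prediction UDFs surfaces here
> {code}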
> Here is the stack trace (in development mode):
> {code}
> [2017-02-13 06:28:39,946] TRACE evidence.EvidenceVizModel$ [] [akka://JobServer/user/context-supervisor/sql-context] - done making predictions in 232
>  ** On entry to DGEMV  parameter number  6 had an illegal value
>  ** On entry to DGEMV  parameter number  6 had an illegal value
>  ** On entry to DGEMV  parameter number  6 had an illegal value
> [2017-02-13 06:28:40,506] ERROR .scheduler.LiveListenerBus [] [akka://JobServer/user/context-supervisor/sql-context] - SparkListenerBus has already stopped! Dropping event SparkListenerSQLExecutionEnd(9,1486996120505)
> [2017-02-13 06:28:40,506] ERROR .scheduler.LiveListenerBus [] [akka://JobServer/user/context-supervisor/sql-context] - SparkListenerBus has already stopped! Dropping event SparkListenerStageCompleted(org.apache.spark.scheduler.StageInfo@1f6c4a29)
> [2017-02-13 06:28:40,508] ERROR .scheduler.LiveListenerBus [] [akka://JobServer/user/context-supervisor/sql-context] - SparkListenerBus has already stopped! Dropping event SparkListenerJobEnd(12,1486996120507,JobFailed(org.apache.spark.SparkException: Job 12 cancelled because SparkContext was shut down))
> [2017-02-13 06:28:40,509] ERROR .jobserver.JobManagerActor [] [akka://JobServer/user/context-supervisor/sql-context] - Got Throwable
> org.apache.spark.SparkException: Job 12 cancelled because SparkContext was shut down
>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$cleanUpAfterSchedulerStop$1.apply(DAGScheduler.scala:808)
>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$cleanUpAfterSchedulerStop$1.apply(DAGScheduler.scala:806)
>         at scala.collection.mutable.HashSet.foreach(HashSet.scala:78)
>         at org.apache.spark.scheduler.DAGScheduler.cleanUpAfterSchedulerStop(DAGScheduler.scala:806)
>         at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onStop(DAGScheduler.scala:1668)
>         at org.apache.spark.util.EventLoop.stop(EventLoop.scala:83)
>         at org.apache.spark.scheduler.DAGScheduler.stop(DAGScheduler.scala:1587)
>         at org.apache.spark.SparkContext$$anonfun$stop$8.apply$mcV$sp(SparkContext.scala:1826)
>         at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1283)
>         at org.apache.spark.SparkContext.stop(SparkContext.scala:1825)
>         at org.apache.spark.SparkContext$$anonfun$2.apply$mcV$sp(SparkContext.scala:581)
>         at org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:216)
>         at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ShutdownHookManager.scala:188)
>         at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:188)
>         at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:188)
> {code}
> and here it is when running in standalone mode:
> {code}
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 7134.0 failed 4 times, most recent failure: Lost task 0.3 in stage 7134.0 (TID 13671, 192.168.124.23, executor 8): ExecutorLostFailure (executor 8 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
> Driver stacktrace:
> org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1435)
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1423)
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1422)
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
> org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1422)
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:802)
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:802)
> scala.Option.foreach(Option.scala:257)
> org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:802)
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1650)
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1605)
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1594)
> org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
> org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:628)
> org.apache.spark.SparkContext.runJob(SparkContext.scala:1918)
> org.apache.spark.SparkContext.runJob(SparkContext.scala:1931)
> org.apache.spark.SparkContext.runJob(SparkContext.scala:1944)
> org.apache.spark.SparkContext.runJob(SparkContext.scala:1958)
> org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:935)
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
> org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
> org.apache.spark.rdd.RDD.collect(RDD.scala:934)
> org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:275)
> org.apache.spark.sql.Dataset$$anonfun$org$apache$spark$sql$Dataset$$execute$1$1.apply(Dataset.scala:2371)
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57)
> org.apache.spark.sql.Dataset.withNewExecutionId(Dataset.scala:2765)
> org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$execute$1(Dataset.scala:2370)
> org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collect(Dataset.scala:2377)
> org.apache.spark.sql.Dataset$$anonfun$count$1.apply(Dataset.scala:2405)
> org.apache.spark.sql.Dataset$$anonfun$count$1.apply(Dataset.scala:2404)
> org.apache.spark.sql.Dataset.withCallback(Dataset.scala:2778)
> org.apache.spark.sql.Dataset.count(Dataset.scala:2404)
> com.mineset.spark.ml.evidence.EvidenceVizModel.getModelValidationInfo(EvidenceVizModel.scala:338)
> com.mineset.spark.ml.evidence.EvidenceVizModel.getJsonObject(EvidenceVizModel.scala:97)
> com.mineset.spark.ml.evidence.EvidenceInducer.execute(EvidenceInducer.scala:129)
> com.mineset.spark.ml.evidence.EvidenceInducer.execute(EvidenceInducer.scala:83)
> com.mineset.spark.common.util.CommandProcessor.process(CommandProcessor.scala:39)
> com.mineset.spark.ml.MinesetMachineLearning.processCommands(MinesetMachineLearning.scala:79)
> com.mineset.spark.ml.MachineLearningJob$.runJob(MachineLearningJob.scala:53)
> com.mineset.spark.ml.MachineLearningJob$.runJob(MachineLearningJob.scala:39)
> spark.jobserver.SparkJobBase$class.runJob(SparkJob.scala:31)
> com.mineset.spark.ml.MachineLearningJob$.runJob(MachineLearningJob.scala:39)
> com.mineset.spark.ml.MachineLearningJob$.runJob(MachineLearningJob.scala:39)
> spark.jobserver.JobManagerActor$$anonfun$getJobFuture$4.apply(JobManagerActor.scala:292)
> scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
> scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> java.lang.Thread.run(Thread.java:745)
> {code}


