[ https://issues.apache.org/jira/browse/SPARK-19581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen resolved SPARK-19581.
-------------------------------
    Resolution: Cannot Reproduce

running a NaiveBayes model with 0 features can crash the executor with a DGEMV error
-------------------------------------------------------------------------------------

                Key: SPARK-19581
                URL: https://issues.apache.org/jira/browse/SPARK-19581
            Project: Spark
         Issue Type: Bug
         Components: ML
   Affects Versions: 2.1.0
        Environment: Spark development or standalone mode, on Windows or Linux
           Reporter: Barry Becker
           Priority: Minor

The severity of this bug is high (nothing should be able to crash Spark like this), but the priority may be low (there is an easy workaround).
In our application, a user can select features and a target, and we then run the NaiveBayes inducer. Columns that have too many values, or only a single value, are removed before we call the inducer to create the model. As a result, there are cases where all of the features get removed. When that happens, executors crash and get restarted (if on a cluster), or Spark crashes and needs to be manually restarted (if in development mode).
NaiveBayes uses BLAS, and BLAS does not handle this case well when it encounters it. It emits this vague error:
 ** On entry to DGEMV parameter number 6 had an illegal value
and terminates. (Parameter 6 of DGEMV is the leading dimension of the matrix, which is presumably 0 when there are no features; BLAS rejects any value less than 1.)
My code looks like this:
{code}
val predictions = model.transform(testData) // Make predictions
// figure out how many were correctly predicted
val numCorrect = predictions.filter(new Column(actualTarget) === new Column(PREDICTION_LABEL_COLUMN)).count()
val numIncorrect = testRowCount - numCorrect
{code}
The failure shows up at the line that does the count, but it is not the count that causes the problem; it is the model.transform step (where the model contains the NaiveBayes classifier).
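For reference, here is a minimal sketch of the shape of input that I believe triggers the problem. The local-mode session, column names, and two-row DataFrame are illustrative only, not our actual pipeline, and I have not re-verified this exact snippet against 2.1.0:
{code}
import org.apache.spark.ml.classification.NaiveBayes
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("SPARK-19581-repro").getOrCreate()
import spark.implicits._

// Feature vectors with zero dimensions, as happens when every candidate
// column has been filtered out before training.
val data = Seq(
  (0.0, Vectors.dense(Array.empty[Double])),
  (1.0, Vectors.dense(Array.empty[Double]))
).toDF("label", "features")

val model = new NaiveBayes().fit(data)
// Per the description above, the DGEMV error surfaces when the model is
// applied, not when it is fit; count() forces the lazy transform to run.
model.transform(data).count()
{code}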
Here is the stack trace (in development mode):
{code}
[2017-02-13 06:28:39,946] TRACE evidence.EvidenceVizModel$ [] [akka://JobServer/user/context-supervisor/sql-context] - done making predictions in 232
 ** On entry to DGEMV parameter number 6 had an illegal value
 ** On entry to DGEMV parameter number 6 had an illegal value
 ** On entry to DGEMV parameter number 6 had an illegal value
[2017-02-13 06:28:40,506] ERROR .scheduler.LiveListenerBus [] [akka://JobServer/user/context-supervisor/sql-context] - SparkListenerBus has already stopped! Dropping event SparkListenerSQLExecutionEnd(9,1486996120505)
[2017-02-13 06:28:40,506] ERROR .scheduler.LiveListenerBus [] [akka://JobServer/user/context-supervisor/sql-context] - SparkListenerBus has already stopped! Dropping event SparkListenerStageCompleted(org.apache.spark.scheduler.StageInfo@1f6c4a29)
[2017-02-13 06:28:40,508] ERROR .scheduler.LiveListenerBus [] [akka://JobServer/user/context-supervisor/sql-context] - SparkListenerBus has already stopped! Dropping event SparkListenerJobEnd(12,1486996120507,JobFailed(org.apache.spark.SparkException: Job 12 cancelled because SparkContext was shut down))
[2017-02-13 06:28:40,509] ERROR .jobserver.JobManagerActor [] [akka://JobServer/user/context-supervisor/sql-context] - Got Throwable
org.apache.spark.SparkException: Job 12 cancelled because SparkContext was shut down
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$cleanUpAfterSchedulerStop$1.apply(DAGScheduler.scala:808)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$cleanUpAfterSchedulerStop$1.apply(DAGScheduler.scala:806)
        at scala.collection.mutable.HashSet.foreach(HashSet.scala:78)
        at org.apache.spark.scheduler.DAGScheduler.cleanUpAfterSchedulerStop(DAGScheduler.scala:806)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onStop(DAGScheduler.scala:1668)
        at org.apache.spark.util.EventLoop.stop(EventLoop.scala:83)
        at org.apache.spark.scheduler.DAGScheduler.stop(DAGScheduler.scala:1587)
        at org.apache.spark.SparkContext$$anonfun$stop$8.apply$mcV$sp(SparkContext.scala:1826)
        at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1283)
        at org.apache.spark.SparkContext.stop(SparkContext.scala:1825)
        at org.apache.spark.SparkContext$$anonfun$2.apply$mcV$sp(SparkContext.scala:581)
        at org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:216)
        at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ShutdownHookManager.scala:188)
        at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:188)
        at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:188)
{code}
and here it is when running in standalone mode:
{code}
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 7134.0 failed 4 times, most recent failure: Lost task 0.3 in stage 7134.0 (TID 13671, 192.168.124.23, executor 8): ExecutorLostFailure (executor 8 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
Driver stacktrace:
org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1435)
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1423)
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1422)
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1422)
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:802)
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:802)
scala.Option.foreach(Option.scala:257)
org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:802)
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1650)
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1605)
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1594)
org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:628)
org.apache.spark.SparkContext.runJob(SparkContext.scala:1918)
org.apache.spark.SparkContext.runJob(SparkContext.scala:1931)
org.apache.spark.SparkContext.runJob(SparkContext.scala:1944)
org.apache.spark.SparkContext.runJob(SparkContext.scala:1958)
org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:935)
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
org.apache.spark.rdd.RDD.collect(RDD.scala:934)
org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:275)
org.apache.spark.sql.Dataset$$anonfun$org$apache$spark$sql$Dataset$$execute$1$1.apply(Dataset.scala:2371)
org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57)
org.apache.spark.sql.Dataset.withNewExecutionId(Dataset.scala:2765)
org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$execute$1(Dataset.scala:2370)
org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collect(Dataset.scala:2377)
org.apache.spark.sql.Dataset$$anonfun$count$1.apply(Dataset.scala:2405)
org.apache.spark.sql.Dataset$$anonfun$count$1.apply(Dataset.scala:2404)
org.apache.spark.sql.Dataset.withCallback(Dataset.scala:2778)
org.apache.spark.sql.Dataset.count(Dataset.scala:2404)
com.mineset.spark.ml.evidence.EvidenceVizModel.getModelValidationInfo(EvidenceVizModel.scala:338)
com.mineset.spark.ml.evidence.EvidenceVizModel.getJsonObject(EvidenceVizModel.scala:97)
com.mineset.spark.ml.evidence.EvidenceInducer.execute(EvidenceInducer.scala:129)
com.mineset.spark.ml.evidence.EvidenceInducer.execute(EvidenceInducer.scala:83)
com.mineset.spark.common.util.CommandProcessor.process(CommandProcessor.scala:39)
com.mineset.spark.ml.MinesetMachineLearning.processCommands(MinesetMachineLearning.scala:79)
com.mineset.spark.ml.MachineLearningJob$.runJob(MachineLearningJob.scala:53)
com.mineset.spark.ml.MachineLearningJob$.runJob(MachineLearningJob.scala:39)
spark.jobserver.SparkJobBase$class.runJob(SparkJob.scala:31)
com.mineset.spark.ml.MachineLearningJob$.runJob(MachineLearningJob.scala:39)
com.mineset.spark.ml.MachineLearningJob$.runJob(MachineLearningJob.scala:39)
spark.jobserver.JobManagerActor$$anonfun$getJobFuture$4.apply(JobManagerActor.scala:292)
scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
java.lang.Thread.run(Thread.java:745)
{code}
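The easy workaround is simply not to invoke the inducer when the pruning step removes every feature. A sketch of the kind of guard we use; the method name and the "features" column are illustrative, not Spark API, and it assumes a non-empty training DataFrame:
{code}
import org.apache.spark.ml.classification.{NaiveBayes, NaiveBayesModel}
import org.apache.spark.ml.linalg.Vector
import org.apache.spark.sql.DataFrame

// Hypothetical guard: train only if feature pruning left something to learn from.
def fitIfFeaturesRemain(trainingData: DataFrame): Option[NaiveBayesModel] = {
  // Inspect the dimensionality of the assembled feature vector on one row.
  val numFeatures = trainingData.select("features").head.getAs[Vector](0).size
  if (numFeatures == 0) None // report "no usable features" to the user instead of training
  else Some(new NaiveBayes().fit(trainingData))
}
{code}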