[ https://issues.apache.org/jira/browse/SPARK-19581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen resolved SPARK-19581.
-------------------------------
    Resolution: Cannot Reproduce

running a NaiveBayes model with 0 features can crash the executor with a DGEMV error
-------------------------------------------------------------------------------------

                Key: SPARK-19581
                URL: https://issues.apache.org/jira/browse/SPARK-19581
            Project: Spark
         Issue Type: Bug
         Components: ML
   Affects Versions: 2.1.0
        Environment: Spark development or standalone mode, on Windows or Linux
           Reporter: Barry Becker
           Priority: Minor

The severity of this bug is high (nothing should be able to crash Spark like this), but the priority may be low (there is an easy workaround).
In our application, a user can select features and a target, and we then run the NaiveBayes inducer. Columns that have too many values, or only a single value, are removed before we call the inducer to create the model. As a result, there are cases where all of the features get removed. When that happens, executors crash and get restarted (if on a cluster), or Spark crashes and needs to be manually restarted (if in development mode).
NaiveBayes uses BLAS, and BLAS does not handle this case well when it encounters it. It emits this vague error:
 ** On entry to DGEMV parameter number 6 had an illegal value
and terminates. (Parameter 6 of DGEMV is the leading dimension of the matrix, which is presumably 0 when there are no features; BLAS rejects any value less than 1.)
My code looks like this:
{code}
val predictions = model.transform(testData) // Make predictions
// figure out how many were correctly predicted
val numCorrect = predictions.filter(new Column(actualTarget) === new Column(PREDICTION_LABEL_COLUMN)).count()
val numIncorrect = testRowCount - numCorrect
{code}
The failure shows up at the line that does the count, but it is not the count that causes the problem; it is the model.transform step (where the model contains the NaiveBayes classifier).
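For reference, here is a minimal sketch of the shape of input that I believe triggers the problem. The local-mode session, column names, and two-row DataFrame are illustrative only, not our actual pipeline, and I have not re-verified this exact snippet against 2.1.0:
{code}
import org.apache.spark.ml.classification.NaiveBayes
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("SPARK-19581-repro").getOrCreate()
import spark.implicits._

// Feature vectors with zero dimensions, as happens when every candidate
// column has been filtered out before training.
val data = Seq(
  (0.0, Vectors.dense(Array.empty[Double])),
  (1.0, Vectors.dense(Array.empty[Double]))
).toDF("label", "features")

val model = new NaiveBayes().fit(data)
// Per the description above, the DGEMV error surfaces when the model is
// applied, not when it is fit; count() forces the lazy transform to run.
model.transform(data).count()
{code}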
Here is the stack trace (in development mode):
{code}
[2017-02-13 06:28:39,946] TRACE evidence.EvidenceVizModel$ [] [akka://JobServer/user/context-supervisor/sql-context] - done making predictions in 232
 ** On entry to DGEMV parameter number 6 had an illegal value
 ** On entry to DGEMV parameter number 6 had an illegal value
 ** On entry to DGEMV parameter number 6 had an illegal value
[2017-02-13 06:28:40,506] ERROR .scheduler.LiveListenerBus [] [akka://JobServer/user/context-supervisor/sql-context] - SparkListenerBus has already stopped! Dropping event SparkListenerSQLExecutionEnd(9,1486996120505)
[2017-02-13 06:28:40,506] ERROR .scheduler.LiveListenerBus [] [akka://JobServer/user/context-supervisor/sql-context] - SparkListenerBus has already stopped! Dropping event SparkListenerStageCompleted(org.apache.spark.scheduler.StageInfo@1f6c4a29)
[2017-02-13 06:28:40,508] ERROR .scheduler.LiveListenerBus [] [akka://JobServer/user/context-supervisor/sql-context] - SparkListenerBus has already stopped! Dropping event SparkListenerJobEnd(12,1486996120507,JobFailed(org.apache.spark.SparkException: Job 12 cancelled because SparkContext was shut down))
[2017-02-13 06:28:40,509] ERROR .jobserver.JobManagerActor [] [akka://JobServer/user/context-supervisor/sql-context] - Got Throwable
org.apache.spark.SparkException: Job 12 cancelled because SparkContext was shut down
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$cleanUpAfterSchedulerStop$1.apply(DAGScheduler.scala:808)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$cleanUpAfterSchedulerStop$1.apply(DAGScheduler.scala:806)
        at scala.collection.mutable.HashSet.foreach(HashSet.scala:78)
        at org.apache.spark.scheduler.DAGScheduler.cleanUpAfterSchedulerStop(DAGScheduler.scala:806)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onStop(DAGScheduler.scala:1668)
        at org.apache.spark.util.EventLoop.stop(EventLoop.scala:83)
        at org.apache.spark.scheduler.DAGScheduler.stop(DAGScheduler.scala:1587)
        at org.apache.spark.SparkContext$$anonfun$stop$8.apply$mcV$sp(SparkContext.scala:1826)
        at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1283)
        at org.apache.spark.SparkContext.stop(SparkContext.scala:1825)
        at org.apache.spark.SparkContext$$anonfun$2.apply$mcV$sp(SparkContext.scala:581)
        at org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:216)
        at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ShutdownHookManager.scala:188)
        at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:188)
        at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:188)
{code}
and here it is when running in standalone mode:
{code}
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 7134.0 failed 4 times, most recent failure: Lost task 0.3 in stage 7134.0 (TID 13671, 192.168.124.23, executor 8): ExecutorLostFailure (executor 8 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
Driver stacktrace:
org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1435)
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1423)
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1422)
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1422)
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:802)
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:802)
scala.Option.foreach(Option.scala:257)
org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:802)
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1650)
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1605)
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1594)
org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:628)
org.apache.spark.SparkContext.runJob(SparkContext.scala:1918)
org.apache.spark.SparkContext.runJob(SparkContext.scala:1931)
org.apache.spark.SparkContext.runJob(SparkContext.scala:1944)
org.apache.spark.SparkContext.runJob(SparkContext.scala:1958)
org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:935)
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
org.apache.spark.rdd.RDD.collect(RDD.scala:934)
org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:275)
org.apache.spark.sql.Dataset$$anonfun$org$apache$spark$sql$Dataset$$execute$1$1.apply(Dataset.scala:2371)
org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57)
org.apache.spark.sql.Dataset.withNewExecutionId(Dataset.scala:2765)
org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$execute$1(Dataset.scala:2370)
org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collect(Dataset.scala:2377)
org.apache.spark.sql.Dataset$$anonfun$count$1.apply(Dataset.scala:2405)
org.apache.spark.sql.Dataset$$anonfun$count$1.apply(Dataset.scala:2404)
org.apache.spark.sql.Dataset.withCallback(Dataset.scala:2778)
org.apache.spark.sql.Dataset.count(Dataset.scala:2404)
com.mineset.spark.ml.evidence.EvidenceVizModel.getModelValidationInfo(EvidenceVizModel.scala:338)
com.mineset.spark.ml.evidence.EvidenceVizModel.getJsonObject(EvidenceVizModel.scala:97)
com.mineset.spark.ml.evidence.EvidenceInducer.execute(EvidenceInducer.scala:129)
com.mineset.spark.ml.evidence.EvidenceInducer.execute(EvidenceInducer.scala:83)
com.mineset.spark.common.util.CommandProcessor.process(CommandProcessor.scala:39)
com.mineset.spark.ml.MinesetMachineLearning.processCommands(MinesetMachineLearning.scala:79)
com.mineset.spark.ml.MachineLearningJob$.runJob(MachineLearningJob.scala:53)
com.mineset.spark.ml.MachineLearningJob$.runJob(MachineLearningJob.scala:39)
spark.jobserver.SparkJobBase$class.runJob(SparkJob.scala:31)
com.mineset.spark.ml.MachineLearningJob$.runJob(MachineLearningJob.scala:39)
com.mineset.spark.ml.MachineLearningJob$.runJob(MachineLearningJob.scala:39)
spark.jobserver.JobManagerActor$$anonfun$getJobFuture$4.apply(JobManagerActor.scala:292)
scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
java.lang.Thread.run(Thread.java:745)
{code}
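The easy workaround is simply not to invoke the inducer when the pruning step removes every feature. A sketch of the kind of guard we use; the method name and the "features" column are illustrative, not Spark API, and it assumes a non-empty training DataFrame:
{code}
import org.apache.spark.ml.classification.{NaiveBayes, NaiveBayesModel}
import org.apache.spark.ml.linalg.Vector
import org.apache.spark.sql.DataFrame

// Hypothetical guard: train only if feature pruning left something to learn from.
def fitIfFeaturesRemain(trainingData: DataFrame): Option[NaiveBayesModel] = {
  // Inspect the dimensionality of the assembled feature vector on one row.
  val numFeatures = trainingData.select("features").head.getAs[Vector](0).size
  if (numFeatures == 0) None // report "no usable features" to the user instead of training
  else Some(new NaiveBayes().fit(trainingData))
}
{code}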