[jira] [Commented] (SPARK-10182) GeneralizedLinearModel doesn't unpersist cached data
[ https://issues.apache.org/jira/browse/SPARK-10182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14709416#comment-14709416 ]

Vyacheslav Baranov commented on SPARK-10182:
--------------------------------------------

Sorry, that was the wrong spark-shell. The behaviour in master is slightly different: it looks like RDDs are removed from the cache on GC. I had to modify the code a bit to reproduce the issue:

{code}
import org.apache.spark.SparkContext
import org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint

for (i <- 0 until 100) {
  val samples = Seq[LabeledPoint](
    LabeledPoint(1.0, Vectors.dense(1.0, 0.0)),
    LabeledPoint(1.0, Vectors.dense(0.0, 1.0)),
    LabeledPoint(0.0, Vectors.dense(1.0, 1.0)),
    LabeledPoint(0.0, Vectors.dense(0.0, 0.0))
  )
  val rdd = sc.parallelize(samples)
  val model = {
    new LogisticRegressionWithLBFGS()
      .setNumClasses(2)
      .run(rdd)
      .clearThreshold()
  }
}
{code}

!http://piqqin.com/img/ea6c54a1bf414828a794ca6604436d78.png!

The number of cached RDDs decreases over time. However, on real-size data, when building cross-validated models, this is a real problem: useful pre-cached datasets are dropped from memory and replaced with these {{MapPartitionsRDD}}s. With the fix I've submitted the behaviour is perfectly fine: only one RDD is cached at a time, so pre-cached data is untouched.
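For reference, the shape of a fix like the one described can be sketched as follows. This is a minimal illustration only, assuming the training code derives an intermediate RDD from the caller's input; the method and variable names are placeholders, not the actual patch submitted for SPARK-10182:

{code}
import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel
import org.apache.spark.mllib.regression.LabeledPoint

// Illustrative sketch, not the submitted patch: cache the derived RDD only
// when the caller's input is not already persisted, and unpersist it once
// training is done so it never lingers on the "Storage" tab.
def runSketch(input: RDD[LabeledPoint]): Unit = {
  val data = input.map(lp => (lp.label, lp.features))
  val handlePersistence = input.getStorageLevel == StorageLevel.NONE
  if (handlePersistence) data.persist(StorageLevel.MEMORY_AND_DISK)
  try {
    // ... run the optimizer over `data` ...
  } finally {
    if (handlePersistence) data.unpersist(blocking = false)
  }
}
{code}

Whatever the exact form in the pull request, the key point is the unpersist call after optimization, which keeps at most one temporary RDD cached at a time.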
[jira] [Commented] (SPARK-10182) GeneralizedLinearModel doesn't unpersist cached data
[ https://issues.apache.org/jira/browse/SPARK-10182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14709367#comment-14709367 ]

Vyacheslav Baranov commented on SPARK-10182:
--------------------------------------------

Yes, please see the screenshot attached:

!http://piqqin.com/img/945dea5edcb132d9f7eac9969595c660.png!

Actually, I have a fix for this issue and I'm preparing the pull request.
[jira] [Updated] (SPARK-10182) GeneralizedLinearModel doesn't unpersist cached data
[ https://issues.apache.org/jira/browse/SPARK-10182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vyacheslav Baranov updated SPARK-10182:
---------------------------------------
    Description: 
The problem might be reproduced in spark-shell with the following code snippet:

{code}
import org.apache.spark.SparkContext
import org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint

val samples = Seq[LabeledPoint](
  LabeledPoint(1.0, Vectors.dense(1.0, 0.0)),
  LabeledPoint(1.0, Vectors.dense(0.0, 1.0)),
  LabeledPoint(0.0, Vectors.dense(1.0, 1.0)),
  LabeledPoint(0.0, Vectors.dense(0.0, 0.0))
)
val rdd = sc.parallelize(samples)

for (i <- 0 until 10) {
  val model = {
    new LogisticRegressionWithLBFGS()
      .setNumClasses(2)
      .run(rdd)
      .clearThreshold()
  }
}
{code}

After code execution there are 10 {{MapPartitionsRDD}} objects on the "Storage" tab in the Spark application UI.

  was:
The same snippet; the description previously ended with "After code execution there are 10 {{MapPartitionsRDD}} objects." (without the reference to the "Storage" tab).
[jira] [Created] (SPARK-10182) GeneralizedLinearModel doesn't unpersist cached data
Vyacheslav Baranov created SPARK-10182:
-------------------------------------------

             Summary: GeneralizedLinearModel doesn't unpersist cached data
                 Key: SPARK-10182
                 URL: https://issues.apache.org/jira/browse/SPARK-10182
             Project: Spark
          Issue Type: Bug
          Components: MLlib
    Affects Versions: 1.4.1
            Reporter: Vyacheslav Baranov


The problem might be reproduced in spark-shell with the following code snippet:

{code}
import org.apache.spark.SparkContext
import org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint

val samples = Seq[LabeledPoint](
  LabeledPoint(1.0, Vectors.dense(1.0, 0.0)),
  LabeledPoint(1.0, Vectors.dense(0.0, 1.0)),
  LabeledPoint(0.0, Vectors.dense(1.0, 1.0)),
  LabeledPoint(0.0, Vectors.dense(0.0, 0.0))
)
val rdd = sc.parallelize(samples)

for (i <- 0 until 10) {
  val model = {
    new LogisticRegressionWithLBFGS()
      .setNumClasses(2)
      .run(rdd)
      .clearThreshold()
  }
}
{code}

After code execution there are 10 {{MapPartitionsRDD}} objects.
[jira] [Updated] (SPARK-8309) OpenHashMap doesn't work with more than 12M items
[ https://issues.apache.org/jira/browse/SPARK-8309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vyacheslav Baranov updated SPARK-8309:
--------------------------------------
    Description: 
The problem might be demonstrated with the following testcase:

{code}
test("support for more than 12M items") {
  val cnt = 12000000 // 12M
  val map = new OpenHashMap[Int, Int](cnt)
  for (i <- 0 until cnt) {
    map(i) = 1
  }
  val numInvalidValues = map.iterator.count(_._2 == 0)
  assertResult(0)(numInvalidValues)
}
{code}

  was:
The same testcase, previously wrapped in a {code:scala} block.
[jira] [Comment Edited] (SPARK-8309) OpenHashMap doesn't work with more than 12M items
[ https://issues.apache.org/jira/browse/SPARK-8309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14582019#comment-14582019 ]

Vyacheslav Baranov edited comment on SPARK-8309 at 6/11/15 2:59 PM:
--------------------------------------------------------------------

The problem occurs because of an incorrect {{POSITION_MASK}} in OpenHashSet. Its value is {{0xEFFFFFF}}, but it should be {{0x1FFFFFFF}} (2^29 - 1).

I have a fix for this issue and will submit a pull request soon.

was (Author: wildfire):
The problem occurs because of an incorrect {{POSITION_MASK}} in OpenHashSet. Its value is {{0xEFFFFFF}}, but it should be {{0x1FFFFFFF}} (2^29 - 1)
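To make the effect of the mask concrete, here is a small standalone illustration. This is not OpenHashSet code; the constants simply mirror the broken and corrected values discussed above, assuming the 2^29 - 1 reading:

{code}
object PositionMaskDemo {
  val BrokenMask  = 0xEFFFFFF    // bit 24 is zero
  val CorrectMask = 0x1FFFFFFF   // 2^29 - 1, all of the low 29 bits set

  def main(args: Array[String]): Unit = {
    // Once the hash table has more than 2^24 slots, a valid probe position
    // can have bit 24 set. Masking it with the broken constant silently
    // clears that bit, so the lookup lands in the wrong slot.
    val pos = (1 << 24) + 42
    println(pos & BrokenMask)     // 42        -- wrong slot
    println(pos & CorrectMask)    // 16777258  -- position preserved
  }
}
{code}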
[jira] [Commented] (SPARK-8309) OpenHashMap doesn't work with more than 12M items
[ https://issues.apache.org/jira/browse/SPARK-8309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14582019#comment-14582019 ]

Vyacheslav Baranov commented on SPARK-8309:
-------------------------------------------

The problem occurs because of an incorrect {{POSITION_MASK}} in OpenHashSet. Its value is {{0xEFFFFFF}}, but it should be {{0x1FFFFFFF}}.
[jira] [Comment Edited] (SPARK-8309) OpenHashMap doesn't work with more than 12M items
[ https://issues.apache.org/jira/browse/SPARK-8309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14582019#comment-14582019 ]

Vyacheslav Baranov edited comment on SPARK-8309 at 6/11/15 2:55 PM:
--------------------------------------------------------------------

The problem occurs because of an incorrect {{POSITION_MASK}} in OpenHashSet. Its value is {{0xEFFFFFF}}, but it should be {{0x1FFFFFFF}} (2^29 - 1)

was (Author: wildfire):
The problem occurs because of an incorrect {{POSITION_MASK}} in OpenHashSet. Its value is {{0xEFFFFFF}}, but it should be {{0x1FFFFFFF}}
[jira] [Created] (SPARK-8309) OpenHashMap doesn't work with more than 12M items
Vyacheslav Baranov created SPARK-8309:
------------------------------------------

             Summary: OpenHashMap doesn't work with more than 12M items
                 Key: SPARK-8309
                 URL: https://issues.apache.org/jira/browse/SPARK-8309
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 1.4.0
            Reporter: Vyacheslav Baranov


The problem might be demonstrated with the following testcase:

{code:scala}
test("support for more than 12M items") {
  val cnt = 12000000 // 12M
  val map = new OpenHashMap[Int, Int](cnt)
  for (i <- 0 until cnt) {
    map(i) = 1
  }
  val numInvalidValues = map.iterator.count(_._2 == 0)
  assertResult(0)(numInvalidValues)
}
{code}
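A back-of-envelope check of why roughly 12M items is the tipping point, assuming the set grows by doubling and rehashes once the element count exceeds loadFactor * capacity with a load factor of 0.7 (both assumptions for illustration, not taken from the ticket):

{code}
// Smallest power-of-two capacity that can hold 12M items under a 0.7 load factor.
val loadFactor = 0.7
val cnt = 12000000
val capacity = Iterator.iterate(1 << 4)(_ << 1).dropWhile(c => c * loadFactor < cnt).next()
println(capacity)               // 33554432 = 2^25
println(capacity > (1 << 24))   // true: probe positions now need bit 24
{code}

With 2^24 slots the growth threshold is about 11.7M elements, so a 12M-item test is just large enough to force the table to 2^25 slots, where positions with bit 24 set appear and the masking problem shows up.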
[jira] [Resolved] (SPARK-7364) NPE when reading null DATE columns from JDBC
[ https://issues.apache.org/jira/browse/SPARK-7364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vyacheslav Baranov resolved SPARK-7364.
---------------------------------------
    Resolution: Duplicate
[jira] [Created] (SPARK-7364) NPE when reading null DATE columns from JDBC
Vyacheslav Baranov created SPARK-7364:
------------------------------------------

             Summary: NPE when reading null DATE columns from JDBC
                 Key: SPARK-7364
                 URL: https://issues.apache.org/jira/browse/SPARK-7364
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 1.4.0
            Reporter: Vyacheslav Baranov


A NullPointerException occurs when attempting to read a DATE column containing NULL from a JDBC data source:

{noformat}
java.lang.NullPointerException
        at org.apache.spark.sql.types.DateUtils$.javaDateToDays(DateUtils.scala:40)
        at org.apache.spark.sql.types.DateUtils$.fromJavaDate(DateUtils.scala:54)
        at org.apache.spark.sql.jdbc.JDBCRDD$$anon$1.getNext(JDBCRDD.scala:367)
        at org.apache.spark.sql.jdbc.JDBCRDD$$anon$1.hasNext(JDBCRDD.scala:428)
        at scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:388)
        at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
        at scala.collection.Iterator$class.foreach(Iterator.scala:727)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
        at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
        at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
        at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
        at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
        at scala.collection.AbstractIterator.to(Iterator.scala:1157)
        at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
        at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
        at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
        at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
        at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$14.apply(RDD.scala:869)
        at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$14.apply(RDD.scala:869)
        at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1679)
        at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1679)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
        at org.apache.spark.scheduler.Task.run(Task.scala:70)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
{noformat}
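The trace above shows a null Date reaching DateUtils.fromJavaDate directly. A minimal sketch of the kind of guard that avoids the NPE is below; the helper object and the simplified day conversion are illustrative assumptions, not the actual JDBCRDD or DateUtils code:

{code}
import java.sql.{Date, ResultSet}

object DateColumnGuard {
  // Stand-in for DateUtils.fromJavaDate: days since the Unix epoch
  // (simplified; the real helper also deals with time zones).
  def fromJavaDate(date: Date): Int =
    (date.getTime / (24L * 60 * 60 * 1000)).toInt

  // Read column `pos` of the current row, mapping SQL NULL to null instead of
  // passing a null Date into the conversion (which is what triggers the NPE).
  def readDate(rs: ResultSet, pos: Int): Any = {
    val d = rs.getDate(pos)
    if (d != null) fromJavaDate(d) else null
  }
}
{code}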
[jira] [Commented] (SPARK-6913) "No suitable driver found" loading JDBC dataframe using driver added by through SparkContext.addJar
[ https://issues.apache.org/jira/browse/SPARK-6913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14519124#comment-14519124 ]

Vyacheslav Baranov commented on SPARK-6913:
-------------------------------------------

The problem is in java.sql.DriverManager, which doesn't see drivers loaded by ClassLoaders other than the bootstrap ClassLoader. The solution would be to create a proxy driver, included in the Spark assembly, that forwards all requests to the wrapped driver.

I have a working fix for this issue and am going to make a pull request soon.

> "No suitable driver found" loading JDBC dataframe using driver added by through SparkContext.addJar
> ----------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-6913
>                 URL: https://issues.apache.org/jira/browse/SPARK-6913
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>            Reporter: Evan Yu
>
> val sc = new SparkContext(conf)
> sc.addJar("J:\mysql-connector-java-5.1.35.jar")
> val df = sqlContext.jdbc("jdbc:mysql://localhost:3000/test_db?user=abc&password=123", "table1")
> df.show()
>
> Following error:
> 2015-04-14 17:04:39,541 [task-result-getter-0] WARN org.apache.spark.scheduler.TaskSetManager - Lost task 0.0 in stage 0.0 (TID 0, dev1.test.dc2.com): java.sql.SQLException: No suitable driver found for jdbc:mysql://localhost:3000/test_db?user=abc&password=123
>       at java.sql.DriverManager.getConnection(DriverManager.java:689)
>       at java.sql.DriverManager.getConnection(DriverManager.java:270)
>       at org.apache.spark.sql.jdbc.JDBCRDD$$anonfun$getConnector$1.apply(JDBCRDD.scala:158)
>       at org.apache.spark.sql.jdbc.JDBCRDD$$anonfun$getConnector$1.apply(JDBCRDD.scala:150)
>       at org.apache.spark.sql.jdbc.JDBCRDD$$anon$1.<init>(JDBCRDD.scala:317)
>       at org.apache.spark.sql.jdbc.JDBCRDD.compute(JDBCRDD.scala:309)
>       at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
>       at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
>       at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
>       at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
>       at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
>       at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
>       at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
>       at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
>       at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
>       at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>       at org.apache.spark.scheduler.Task.run(Task.scala:64)
>       at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
>       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>       at java.lang.Thread.run(Thread.java:745)
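For context, the proxy idea could look roughly like the sketch below. The class name and registration helper are illustrative assumptions, not Spark's actual implementation; the point is only that an instance visible to DriverManager forwards every java.sql.Driver call to a driver loaded from a user-added jar:

{code}
import java.sql.{Connection, Driver, DriverManager, DriverPropertyInfo}
import java.util.Properties
import java.util.logging.Logger

// Thin forwarding driver; registered with DriverManager on behalf of a driver
// class that only the user/executor class loader can see.
class DriverProxy(wrapped: Driver) extends Driver {
  override def connect(url: String, info: Properties): Connection = wrapped.connect(url, info)
  override def acceptsURL(url: String): Boolean = wrapped.acceptsURL(url)
  override def getPropertyInfo(url: String, info: Properties): Array[DriverPropertyInfo] =
    wrapped.getPropertyInfo(url, info)
  override def getMajorVersion(): Int = wrapped.getMajorVersion
  override def getMinorVersion(): Int = wrapped.getMinorVersion
  override def jdbcCompliant(): Boolean = wrapped.jdbcCompliant()
  override def getParentLogger(): Logger = wrapped.getParentLogger
}

object DriverProxy {
  // Load the real driver through the context class loader (which sees jars
  // added via addJar) and register a proxy that DriverManager will accept.
  def register(driverClassName: String): Unit = {
    val wrapped = Class
      .forName(driverClassName, true, Thread.currentThread().getContextClassLoader)
      .newInstance()
      .asInstanceOf[Driver]
    DriverManager.registerDriver(new DriverProxy(wrapped))
  }
}
{code}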