[jira] [Commented] (SPARK-1547) Add gradient boosting algorithm to MLlib
[ https://issues.apache.org/jira/browse/SPARK-1547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14049659#comment-14049659 ] Hector Yee commented on SPARK-1547: --- Just generic log loss with L1 regularization should suffice. Most of the work is in feature engineering anyway. There is no hurry at all, I already have several implementations not in MLlib that I am using. It would just be convenient to have another implementation to compare against. Add gradient boosting algorithm to MLlib Key: SPARK-1547 URL: https://issues.apache.org/jira/browse/SPARK-1547 Project: Spark Issue Type: New Feature Components: MLlib Affects Versions: 1.0.0 Reporter: Manish Amde Assignee: Manish Amde This task requires adding the gradient boosting algorithm to Spark MLlib. The implementation needs to adapt the gradient boosting algorithm to the scalable tree implementation. The task involves: - Comparing the various tradeoffs and finalizing the algorithm before implementation - Code implementation - Unit tests - Functional tests - Performance tests - Documentation -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Closed] (SPARK-1525) TaskSchedulerImpl should decrease availableCpus by spark.task.cpus not 1
[ https://issues.apache.org/jira/browse/SPARK-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] YanTang Zhai closed SPARK-1525. --- Resolution: Fixed TaskSchedulerImpl should decrease availableCpus by spark.task.cpus not 1 Key: SPARK-1525 URL: https://issues.apache.org/jira/browse/SPARK-1525 Project: Spark Issue Type: Bug Components: Spark Core Reporter: YanTang Zhai Priority: Minor TaskSchedulerImpl always decreases availableCpus by 1 in the resourceOffers process, even when spark.task.cpus is greater than 1, so more tasks than a node can actually accommodate may be scheduled onto it. -- This message was sent by Atlassian JIRA (v6.2#6252)
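For illustration, a minimal self-contained sketch of the behaviour this ticket describes (plain Scala; the object and method names are invented for the example and are not Spark internals). With two 4-core executors and spark.task.cpus = 2, decrementing by the configured value launches 4 tasks, while the old behaviour of decrementing by 1 would have launched 6.

{code}
// Self-contained sketch: why availableCpus must shrink by spark.task.cpus, not 1.
object CpusPerTaskSketch {
  def assignTasks(availableCpus: Array[Int], cpusPerTask: Int): Int = {
    var launched = 0
    for (i <- availableCpus.indices) {
      // keep offering this executor while it still has enough free cores for one task
      while (availableCpus(i) >= cpusPerTask) {
        availableCpus(i) -= cpusPerTask // SPARK-1525: the 1.0 scheduler subtracted 1 here
        launched += 1
      }
    }
    launched
  }

  def main(args: Array[String]): Unit = {
    // two executors with 4 cores each, spark.task.cpus = 2 => at most 4 tasks fit
    println(assignTasks(Array(4, 4), cpusPerTask = 2)) // prints 4
  }
}
{code}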
[jira] [Commented] (SPARK-786) Clean up old work directories in standalone worker
[ https://issues.apache.org/jira/browse/SPARK-786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14049671#comment-14049671 ] anurag tangri commented on SPARK-786: - Hi, We are also facing this issue. Could somebody assign this ticket to me ? I would like to work on this. Thanks, Anurag Tangri Clean up old work directories in standalone worker -- Key: SPARK-786 URL: https://issues.apache.org/jira/browse/SPARK-786 Project: Spark Issue Type: New Feature Components: Deploy Affects Versions: 0.7.2 Reporter: Matei Zaharia We should add a setting to clean old work directories after X days. Otherwise, the directory gets filled forever with shuffle files and logs. -- This message was sent by Atlassian JIRA (v6.2#6252)
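For context, a rough standalone sketch of the TTL-based cleanup being requested here; the names (WorkDirCleanupSketch, cleanOldWorkDirs, ttlDays) are illustrative only and not an actual Spark setting or API.

{code}
import java.io.File
import java.util.concurrent.TimeUnit

// Hypothetical sketch, not Spark worker code: delete application work directories
// whose last modification time is older than a configurable number of days.
object WorkDirCleanupSketch {
  private def deleteRecursively(f: File): Unit = {
    Option(f.listFiles()).getOrElse(Array.empty[File]).foreach(deleteRecursively)
    f.delete()
  }

  def cleanOldWorkDirs(workDir: File, ttlDays: Int): Unit = {
    val cutoff = System.currentTimeMillis() - TimeUnit.DAYS.toMillis(ttlDays)
    Option(workDir.listFiles()).getOrElse(Array.empty[File])
      .filter(d => d.isDirectory && d.lastModified() < cutoff)
      .foreach(deleteRecursively)
  }
}
{code}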
[jira] [Created] (SPARK-2341) loadLibSVMFile doesn't handle regression datasets
Eustache created SPARK-2341: --- Summary: loadLibSVMFile doesn't handle regression datasets Key: SPARK-2341 URL: https://issues.apache.org/jira/browse/SPARK-2341 Project: Spark Issue Type: Bug Components: MLlib Affects Versions: 1.0.0 Reporter: Eustache Priority: Minor Many datasets exist in LibSVM format for regression tasks [1] but currently the loadLibSVMFile primitive doesn't handle regression datasets. More precisely, the LabelParser is either a MulticlassLabelParser or a BinaryLabelParser. What happens then is that the file is loaded but in multiclass mode : each target value is interpreted as a class name ! The fix would be to write a RegressionLabelParser which converts target values to Double and plug it into the loadLibSVMFile routine. [1] http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/regression.html -- This message was sent by Atlassian JIRA (v6.2#6252)
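For illustration, a sketch of the parser this report proposes. It assumes the LabelParser trait in org.apache.spark.mllib.util exposes a parse(labelString: String): Double method and is accessible from user code; verify the actual trait before relying on this.

{code}
import org.apache.spark.mllib.util.LabelParser

// Sketch of the proposed RegressionLabelParser: keep the raw target value
// instead of mapping it to a binary/multiclass label.
object RegressionLabelParser extends LabelParser {
  override def parse(labelString: String): Double = labelString.toDouble
}
{code}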
[jira] [Updated] (SPARK-2339) SQL parser in sql-core is case sensitive, but a table alias is converted to lower case when we create Subquery
[ https://issues.apache.org/jira/browse/SPARK-2339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-2339: Fix Version/s: 1.1.0 SQL parser in sql-core is case sensitive, but a table alias is converted to lower case when we create Subquery -- Key: SPARK-2339 URL: https://issues.apache.org/jira/browse/SPARK-2339 Project: Spark Issue Type: Bug Affects Versions: 1.0.0 Reporter: Yin Huai Fix For: 1.1.0 Reported by http://apache-spark-user-list.1001560.n3.nabble.com/Spark-SQL-Join-throws-exception-td8599.html After we get the table from the catalog, because the table has an alias, we will temporarily insert a Subquery. Then, we convert the table alias to lower case no matter if the parser is case sensitive or not. To see the issue ... {code} val sqlContext = new org.apache.spark.sql.SQLContext(sc) import sqlContext.createSchemaRDD case class Person(name: String, age: Int) val people = sc.textFile("examples/src/main/resources/people.txt").map(_.split(",")).map(p => Person(p(0), p(1).trim.toInt)) people.registerAsTable("people") sqlContext.sql("select PEOPLE.name from people PEOPLE") {code} The plan is ... {code} == Query Plan == Project ['PEOPLE.name] ExistingRdd [name#0,age#1], MapPartitionsRDD[4] at mapPartitions at basicOperators.scala:176 {code} You can find that PEOPLE.name is not resolved. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (SPARK-2342) Evaluation helper's output type doesn't conform to input type
Yijie Shen created SPARK-2342: - Summary: Evaluation helper's output type doesn't conform to input type Key: SPARK-2342 URL: https://issues.apache.org/jira/browse/SPARK-2342 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.0.0 Reporter: Yijie Shen Priority: Minor In sql/catalyst/org/apache/spark/sql/catalyst/expressions.scala protected final def n2 ( i: Row, e1: Expression, e2: Expression, f: ((Numeric[Any], Any, Any) => Any)): Any is intended to do computations for Numeric Add/Minus/Multiply. Just as the comment suggests: Those expressions are supposed to be in the same data type, and also the return type. But in the code, function f was cast to the function signature: (Numeric[n.JvmType], n.JvmType, n.JvmType) => Int I think this is a typo and the correct signature should be: (Numeric[n.JvmType], n.JvmType, n.JvmType) => n.JvmType -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-2342) Evaluation helper's output type doesn't conform to input type
[ https://issues.apache.org/jira/browse/SPARK-2342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yijie Shen updated SPARK-2342: -- Description: In sql/catalyst/org/apache/spark/sql/catalyst/expressions.scala {code}protected final def n2 ( i: Row, e1: Expression, e2: Expression, f: ((Numeric[Any], Any, Any) => Any)): Any {code} is intended to do computations for Numeric Add/Minus/Multiply. Just as the comment suggests: {quote}Those expressions are supposed to be in the same data type, and also the return type.{quote} But in the code, function f was cast to the function signature: {code}(Numeric[n.JvmType], n.JvmType, n.JvmType) => Int{code} I think this is a typo and the correct signature should be: {code}(Numeric[n.JvmType], n.JvmType, n.JvmType) => n.JvmType{code} was: In sql/catalyst/org/apache/spark/sql/catalyst/expressions.scala protected final def n2 ( i: Row, e1: Expression, e2: Expression, f: ((Numeric[Any], Any, Any) => Any)): Any is intended to do computations for Numeric Add/Minus/Multiply. Just as the comment suggests: Those expressions are supposed to be in the same data type, and also the return type. But in the code, function f was cast to the function signature: (Numeric[n.JvmType], n.JvmType, n.JvmType) => Int I think this is a typo and the correct signature should be: (Numeric[n.JvmType], n.JvmType, n.JvmType) => n.JvmType Evaluation helper's output type doesn't conform to input type - Key: SPARK-2342 URL: https://issues.apache.org/jira/browse/SPARK-2342 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.0.0 Reporter: Yijie Shen Priority: Minor Labels: easyfix In sql/catalyst/org/apache/spark/sql/catalyst/expressions.scala {code}protected final def n2 ( i: Row, e1: Expression, e2: Expression, f: ((Numeric[Any], Any, Any) => Any)): Any {code} is intended to do computations for Numeric Add/Minus/Multiply. Just as the comment suggests: {quote}Those expressions are supposed to be in the same data type, and also the return type.{quote} But in the code, function f was cast to the function signature: {code}(Numeric[n.JvmType], n.JvmType, n.JvmType) => Int{code} I think this is a typo and the correct signature should be: {code}(Numeric[n.JvmType], n.JvmType, n.JvmType) => n.JvmType{code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-2341) loadLibSVMFile doesn't handle regression datasets
[ https://issues.apache.org/jira/browse/SPARK-2341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14049732#comment-14049732 ] Xiangrui Meng commented on SPARK-2341: -- Just set `multiclass = true` to load double values. loadLibSVMFile doesn't handle regression datasets - Key: SPARK-2341 URL: https://issues.apache.org/jira/browse/SPARK-2341 Project: Spark Issue Type: Bug Components: MLlib Affects Versions: 1.0.0 Reporter: Eustache Priority: Minor Labels: easyfix Many datasets exist in LibSVM format for regression tasks [1] but currently the loadLibSVMFile primitive doesn't handle regression datasets. More precisely, the LabelParser is either a MulticlassLabelParser or a BinaryLabelParser. What happens then is that the file is loaded but in multiclass mode : each target value is interpreted as a class name ! The fix would be to write a RegressionLabelParser which converts target values to Double and plug it into the loadLibSVMFile routine. [1] http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/regression.html -- This message was sent by Atlassian JIRA (v6.2#6252)
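A usage sketch of that workaround against the Spark 1.0 MLUtils API (the file path is a placeholder, and the exact loadLibSVMFile overload should be verified against your version):

{code}
import org.apache.spark.SparkContext
import org.apache.spark.mllib.util.MLUtils

val sc = new SparkContext("local", "libsvm-regression")
// With multiclass = true the label string is parsed directly as a Double,
// which is what a regression dataset needs.
val examples = MLUtils.loadLibSVMFile(sc, "data/regression.libsvm", multiclass = true)
examples.take(5).foreach(println) // labels are raw target values, not class indices
{code}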
[jira] [Commented] (SPARK-2341) loadLibSVMFile doesn't handle regression datasets
[ https://issues.apache.org/jira/browse/SPARK-2341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14049755#comment-14049755 ] Eustache commented on SPARK-2341: - I see that LabelParser with multiclass=true works for the regression setting. What I fail to understand is how it is related to multiclass ? Is the naming proper ? In any case shouldn't we provide a naming that explicitly mentions regression ? loadLibSVMFile doesn't handle regression datasets - Key: SPARK-2341 URL: https://issues.apache.org/jira/browse/SPARK-2341 Project: Spark Issue Type: Bug Components: MLlib Affects Versions: 1.0.0 Reporter: Eustache Priority: Minor Labels: easyfix Many datasets exist in LibSVM format for regression tasks [1] but currently the loadLibSVMFile primitive doesn't handle regression datasets. More precisely, the LabelParser is either a MulticlassLabelParser or a BinaryLabelParser. What happens then is that the file is loaded but in multiclass mode : each target value is interpreted as a class name ! The fix would be to write a RegressionLabelParser which converts target values to Double and plug it into the loadLibSVMFile routine. [1] http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/regression.html -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-2341) loadLibSVMFile doesn't handle regression datasets
[ https://issues.apache.org/jira/browse/SPARK-2341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14049765#comment-14049765 ] Xiangrui Meng commented on SPARK-2341: -- It is a little awkward to have both `regression` and `multiclass` as input arguments. I agree that a correct name should be `multiclassOrRegression`. But it is certainly too long. We tried to make this clear in the doc: {code} multiclass: whether the input labels contain more than two classes. If false, any label with value greater than 0.5 will be mapped to 1.0, or 0.0 otherwise. So it works for both +1/-1 and 1/0 cases. If true, the double value parsed directly from the label string will be used as the label value. {code} It would be good if we can improve the documentation to make it clearer. But for the API, I don't feel that it is necessary to change. loadLibSVMFile doesn't handle regression datasets - Key: SPARK-2341 URL: https://issues.apache.org/jira/browse/SPARK-2341 Project: Spark Issue Type: Bug Components: MLlib Affects Versions: 1.0.0 Reporter: Eustache Priority: Minor Labels: easyfix Many datasets exist in LibSVM format for regression tasks [1] but currently the loadLibSVMFile primitive doesn't handle regression datasets. More precisely, the LabelParser is either a MulticlassLabelParser or a BinaryLabelParser. What happens then is that the file is loaded but in multiclass mode : each target value is interpreted as a class name ! The fix would be to write a RegressionLabelParser which converts target values to Double and plug it into the loadLibSVMFile routine. [1] http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/regression.html -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Comment Edited] (SPARK-2341) loadLibSVMFile doesn't handle regression datasets
[ https://issues.apache.org/jira/browse/SPARK-2341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14049765#comment-14049765 ] Xiangrui Meng edited comment on SPARK-2341 at 7/2/14 9:09 AM: -- It is a little awkward to have both `regression` and `multiclass` as input arguments. I agree that a correct name should be `multiclassOrRegression` or `multiclassOrContinuous`. But it is certainly too long. We tried to make this clear in the doc: {code} multiclass: whether the input labels contain more than two classes. If false, any label with value greater than 0.5 will be mapped to 1.0, or 0.0 otherwise. So it works for both +1/-1 and 1/0 cases. If true, the double value parsed directly from the label string will be used as the label value. {code} It would be good if we can improve the documentation to make it clearer. But for the API, I don't feel that it is necessary to change. was (Author: mengxr): It is a little awkward to have both `regression` and `multiclass` as input arguments. I agree that a correct name should be `multiclassOrRegression`. But it is certainly too long. We tried to make this clear in the doc: {code} multiclass: whether the input labels contain more than two classes. If false, any label with value greater than 0.5 will be mapped to 1.0, or 0.0 otherwise. So it works for both +1/-1 and 1/0 cases. If true, the double value parsed directly from the label string will be used as the label value. {code} It would be good if we can improve the documentation to make it clearer. But for the API, I don't feel that it is necessary to change. loadLibSVMFile doesn't handle regression datasets - Key: SPARK-2341 URL: https://issues.apache.org/jira/browse/SPARK-2341 Project: Spark Issue Type: Bug Components: MLlib Affects Versions: 1.0.0 Reporter: Eustache Priority: Minor Labels: easyfix Many datasets exist in LibSVM format for regression tasks [1] but currently the loadLibSVMFile primitive doesn't handle regression datasets. More precisely, the LabelParser is either a MulticlassLabelParser or a BinaryLabelParser. What happens then is that the file is loaded but in multiclass mode : each target value is interpreted as a class name ! The fix would be to write a RegressionLabelParser which converts target values to Double and plug it into the loadLibSVMFile routine. [1] http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/regression.html -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-1681) Handle hive support correctly in ./make-distribution.sh
[ https://issues.apache.org/jira/browse/SPARK-1681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guoqiang Li updated SPARK-1681: --- Summary: Handle hive support correctly in ./make-distribution.sh (was: Handle hive support correctly in ./make-distribution) Handle hive support correctly in ./make-distribution.sh --- Key: SPARK-1681 URL: https://issues.apache.org/jira/browse/SPARK-1681 Project: Spark Issue Type: Bug Components: Build, SQL Reporter: Patrick Wendell Assignee: Patrick Wendell Priority: Blocker Fix For: 1.0.0 When Hive support is enabled we should copy the datanucleus jars to the packaged distribution. The simplest way would be to create a lib_managed folder in the final distribution so that the compute-classpath script searches in exactly the same way whether or not it's a release. A slightly nicer solution is to put the jars inside of `/lib` and have some fancier check for the jar location in the compute-classpath script. We should also document how to run Spark SQL on YARN when hive support is enabled. In particular how to add the necessary jars to spark-submit. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-2306) BoundedPriorityQueue is private and not registered with Kryo
[ https://issues.apache.org/jira/browse/SPARK-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14049818#comment-14049818 ] Daniel Darabos commented on SPARK-2306: --- You're the best, Ankit! Thanks! BoundedPriorityQueue is private and not registered with Kryo Key: SPARK-2306 URL: https://issues.apache.org/jira/browse/SPARK-2306 Project: Spark Issue Type: Bug Components: Spark Core Reporter: Daniel Darabos Because BoundedPriorityQueue is private and not registered with Kryo, RDD.top cannot be used when using Kryo (the recommended configuration). Curiously BoundedPriorityQueue is registered by GraphKryoRegistrator. But that's the wrong registrator. (Is there one for Spark Core?) -- This message was sent by Atlassian JIRA (v6.2#6252)
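Until the class is registered by default, one possible workaround (an assumption, not an official recommendation) is to register it reflectively in a custom KryoRegistrator:

{code}
import com.esotericsoftware.kryo.Kryo
import org.apache.spark.serializer.KryoRegistrator

// Registers the private[spark] BoundedPriorityQueue by name so RDD.top works under Kryo.
class TopWorkaroundRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo): Unit = {
    kryo.register(Class.forName("org.apache.spark.util.BoundedPriorityQueue"))
  }
}
// usage: sparkConf.set("spark.kryo.registrator", classOf[TopWorkaroundRegistrator].getName)
{code}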
[jira] [Commented] (SPARK-1884) Shark failed to start
[ https://issues.apache.org/jira/browse/SPARK-1884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14049877#comment-14049877 ] Pete MacKinnon commented on SPARK-1884: --- This is due to the version of protobuf-java provided by Shark being older (2.4.1) than what's needed by Hadoop 2.4 (2.5.0). See SPARK-2338. Shark failed to start - Key: SPARK-1884 URL: https://issues.apache.org/jira/browse/SPARK-1884 Project: Spark Issue Type: Bug Affects Versions: 0.9.1 Environment: ubuntu 14.04, spark 0.9.1, hive 0.13.0, hadoop 2.4.0 (stand alone), scala 2.11.0 Reporter: Wei Cui Priority: Blocker the hadoop, hive, spark works fine. when start the shark, it failed with the following messages: Starting the Shark Command Line Client 14/05/19 16:47:21 INFO Configuration.deprecation: mapred.input.dir.recursive is deprecated. Instead, use mapreduce.input.fileinputformat.input.dir.recursive 14/05/19 16:47:21 INFO Configuration.deprecation: mapred.max.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize 14/05/19 16:47:21 INFO Configuration.deprecation: mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize 14/05/19 16:47:21 INFO Configuration.deprecation: mapred.min.split.size.per.rack is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.rack 14/05/19 16:47:21 INFO Configuration.deprecation: mapred.min.split.size.per.node is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.node 14/05/19 16:47:21 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces 14/05/19 16:47:21 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative 14/05/19 16:47:22 WARN conf.Configuration: org.apache.hadoop.hive.conf.LoopingByteArrayInputStream@48c724c:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring. 14/05/19 16:47:22 WARN conf.Configuration: org.apache.hadoop.hive.conf.LoopingByteArrayInputStream@48c724c:an attempt to override final parameter: mapreduce.cluster.local.dir; Ignoring. 14/05/19 16:47:22 WARN conf.Configuration: org.apache.hadoop.hive.conf.LoopingByteArrayInputStream@48c724c:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring. 14/05/19 16:47:22 WARN conf.Configuration: org.apache.hadoop.hive.conf.LoopingByteArrayInputStream@48c724c:an attempt to override final parameter: mapreduce.cluster.temp.dir; Ignoring. Logging initialized using configuration in jar:file:/usr/local/shark/lib_managed/jars/edu.berkeley.cs.shark/hive-common/hive-common-0.11.0-shark-0.9.1.jar!/hive-log4j.properties Hive history file=/tmp/root/hive_job_log_root_14857@ubuntu_201405191647_897494215.txt 6.004: [GC 279616K-18440K(1013632K), 0.0438980 secs] 6.445: [Full GC 59125K-7949K(1013632K), 0.0685160 secs] Reloading cached RDDs from previous Shark sessions... 
(use -skipRddReload flag to skip reloading) 7.535: [Full GC 104136K-13059K(1013632K), 0.0885820 secs] 8.459: [Full GC 61237K-18031K(1013632K), 0.0820400 secs] 8.662: [Full GC 29832K-8958K(1013632K), 0.0869700 secs] 8.751: [Full GC 13433K-8998K(1013632K), 0.0856520 secs] 10.435: [Full GC 72246K-14140K(1013632K), 0.1797530 secs] Exception in thread main org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient at org.apache.hadoop.hive.ql.metadata.Hive.getAllDatabases(Hive.java:1072) at shark.memstore2.TableRecovery$.reloadRdds(TableRecovery.scala:49) at shark.SharkCliDriver.init(SharkCliDriver.scala:283) at shark.SharkCliDriver$.main(SharkCliDriver.scala:162) at shark.SharkCliDriver.main(SharkCliDriver.scala) Caused by: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1139) at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.init(RetryingMetaStoreClient.java:51) at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:61) at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:2288) at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:2299) at org.apache.hadoop.hive.ql.metadata.Hive.getAllDatabases(Hive.java:1070) ... 4 more Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at
[jira] [Commented] (SPARK-1850) Bad exception if multiple jars exist when running PySpark
[ https://issues.apache.org/jira/browse/SPARK-1850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14049895#comment-14049895 ] Matthew Farrellee commented on SPARK-1850: -- [~andrewor14] - i think this should be closed as resolved in SPARK-2242 the current output for the error is, {noformat} $ ./dist/bin/pyspark Python 2.7.5 (default, Feb 19 2014, 13:47:28) [GCC 4.8.2 20131212 (Red Hat 4.8.2-7)] on linux2 Type help, copyright, credits or license for more information. Traceback (most recent call last): File /home/matt/Documents/Repositories/spark/dist/python/pyspark/shell.py, line 43, in module sc = SparkContext(appName=PySparkShell, pyFiles=add_files) File /home/matt/Documents/Repositories/spark/dist/python/pyspark/context.py, line 95, in __init__ SparkContext._ensure_initialized(self, gateway=gateway) File /home/matt/Documents/Repositories/spark/dist/python/pyspark/context.py, line 191, in _ensure_initialized SparkContext._gateway = gateway or launch_gateway() File /home/matt/Documents/Repositories/spark/dist/python/pyspark/java_gateway.py, line 66, in launch_gateway raise Exception(error_msg) Exception: Launching GatewayServer failed with exit code 1!(Warning: unexpected output detected.) Found multiple Spark assembly jars in /home/matt/Documents/Repositories/spark/dist/lib: spark-assembly-1.1.0-SNAPSHOT-hadoop1.0.4-.jar spark-assembly-1.1.0-SNAPSHOT-hadoop1.0.4.jar Please remove all but one jar. {noformat} Bad exception if multiple jars exist when running PySpark - Key: SPARK-1850 URL: https://issues.apache.org/jira/browse/SPARK-1850 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 1.0.0 Reporter: Andrew Or Fix For: 1.0.1 {code} Found multiple Spark assembly jars in /Users/andrew/Documents/dev/andrew-spark/assembly/target/scala-2.10: Traceback (most recent call last): File /Users/andrew/Documents/dev/andrew-spark/python/pyspark/shell.py, line 43, in module sc = SparkContext(os.environ.get(MASTER, local[*]), PySparkShell, pyFiles=add_files) File /Users/andrew/Documents/dev/andrew-spark/python/pyspark/context.py, line 94, in __init__ SparkContext._ensure_initialized(self, gateway=gateway) File /Users/andrew/Documents/dev/andrew-spark/python/pyspark/context.py, line 180, in _ensure_initialized SparkContext._gateway = gateway or launch_gateway() File /Users/andrew/Documents/dev/andrew-spark/python/pyspark/java_gateway.py, line 49, in launch_gateway gateway_port = int(proc.stdout.readline()) ValueError: invalid literal for int() with base 10: 'spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4-deps.jar\n' {code} It's trying to read the Java gateway port as an int from the sub-process' STDOUT. However, what it read was an error message, which is clearly not an int. We should differentiate between these cases and just propagate the original message if it's not an int. Right now, this exception is not very helpful. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-1550) Successive creation of spark context fails in pyspark, if the previous initialization of spark context had failed.
[ https://issues.apache.org/jira/browse/SPARK-1550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14049918#comment-14049918 ] Matthew Farrellee commented on SPARK-1550: -- this issue as reported is no longer present in spark 1.0, where defaults are provided for app name and master. {code} $ SPARK_HOME=dist PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/lib/py4j-0.8.1-src.zip python Python 2.7.5 (default, Feb 19 2014, 13:47:28) [GCC 4.8.2 20131212 (Red Hat 4.8.2-7)] on linux2 Type help, copyright, credits or license for more information. from pyspark import SparkContext sc=SparkContext('local') [successful creation of context] {code} i believe this should be closed as resolved. /cc: [~pwendell] Successive creation of spark context fails in pyspark, if the previous initialization of spark context had failed. -- Key: SPARK-1550 URL: https://issues.apache.org/jira/browse/SPARK-1550 Project: Spark Issue Type: Improvement Components: PySpark Reporter: Prabin Banka Labels: pyspark, sparkcontext For example;- In PySpark, if we try to initialize spark context with insufficient arguments, sc=SparkContext('local') it fails with an exception Exception: An application name must be set in your configuration This is all fine. However, any successive creation of spark context with correct arguments, also fails, s1=SparkContext('local', 'test1') AttributeError: 'SparkContext' object has no attribute 'master' -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-1257) Endless running task when using pyspark with input file containing a long line
[ https://issues.apache.org/jira/browse/SPARK-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14049933#comment-14049933 ] Matthew Farrellee commented on SPARK-1257: -- recommend close as resolved w/ option for filer to reopen if the issue reproduces in 1.0 /cc: [~pwendell] [~joshrosen] Endless running task when using pyspark with input file containing a long line -- Key: SPARK-1257 URL: https://issues.apache.org/jira/browse/SPARK-1257 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 0.9.0 Reporter: Hanchen Su When launching any pyspark applications with an input file containing a very long line(about 7 characters), the job will be hanging and never stops. The application UI shows that there is a task running endlessly. There will be no problem using the scala version with the same input. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-1030) unneeded file required when running pyspark program using yarn-client
[ https://issues.apache.org/jira/browse/SPARK-1030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14049929#comment-14049929 ] Matthew Farrellee commented on SPARK-1030: -- using pyspark to submit is deprecated in spark 1.0 in favor of spark-submit. i think this should be closed as resolved/workfix. /cc: [~pwendell] [~joshrosen] unneeded file required when running pyspark program using yarn-client - Key: SPARK-1030 URL: https://issues.apache.org/jira/browse/SPARK-1030 Project: Spark Issue Type: Bug Components: Deploy, PySpark, YARN Affects Versions: 0.8.1 Reporter: Diana Carroll Assignee: Josh Rosen I can successfully run a pyspark program using the yarn-client master using the following command: {code} SPARK_JAR=$SPARK_HOME/assembly/target/scala-2.9.3/spark-assembly_2.9.3-0.8.1-incubating-hadoop2.2.0.jar \ SPARK_YARN_APP_JAR=~/testdata.txt pyspark \ test1.py {code} However, the SPARK_YARN_APP_JAR doesn't make any sense; it's a Python program, and therefore there's no JAR. If I don't set the value, or if I set the value to a non-existent file, Spark gives me an error message. {code} py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext. : org.apache.spark.SparkException: env SPARK_YARN_APP_JAR is not set at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:46) {code} or {code} py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext. : java.io.FileNotFoundException: File file:dummy.txt does not exist at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:520) {code} My program is very simple: {code} from pyspark import SparkContext def main(): sc = SparkContext("yarn-client", "Simple App") logData = sc.textFile("hdfs://localhost/user/training/weblogs/2013-09-15.log") numjpgs = logData.filter(lambda s: '.jpg' in s).count() print "Number of JPG requests: " + str(numjpgs) {code} Although it reads the SPARK_YARN_APP_JAR file, it doesn't use the file at all; I can point it at anything, as long as it's a valid, accessible file, and it works the same. Although there's an obvious workaround for this bug, it's high priority from my perspective because I'm working on a course to teach people how to do this, and it's really hard to explain why this variable is needed! -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-1284) pyspark hangs after IOError on Executor
[ https://issues.apache.org/jira/browse/SPARK-1284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14049937#comment-14049937 ] Matthew Farrellee commented on SPARK-1284: -- [~jblomo] - will you add a reproducer script to this issue? i did a simple test based on what you suggested w/ the tip of master and could not reproduce - {code} $ ./dist/bin/pyspark Python 2.7.5 (default, Feb 19 2014, 13:47:28) [GCC 4.8.2 20131212 (Red Hat 4.8.2-7)] on linux2 Type help, copyright, credits or license for more information. ... Welcome to __ / __/__ ___ _/ /__ _\ \/ _ \/ _ `/ __/ '_/ /__ / .__/\_,_/_/ /_/\_\ version 1.0.0-SNAPSHOT /_/ Using Python version 2.7.5 (default, Feb 19 2014 13:47:28) SparkContext available as sc. data = sc.textFile('/etc/passwd') 14/07/02 07:03:59 INFO MemoryStore: ensureFreeSpace(32816) called with curMem=0, maxMem=308910489 14/07/02 07:03:59 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 32.0 KB, free 294.6 MB) data.cache() /etc/passwd MappedRDD[1] at textFile at NativeMethodAccessorImpl.java:-2 data.take(10) ...[expected output]... data.flatMap(lambda line: line.split(':')).map(lambda word: (word, 1)).reduceByKey(lambda x, y: x + y).collect() ...[expected output, no hang]... {code} pyspark hangs after IOError on Executor --- Key: SPARK-1284 URL: https://issues.apache.org/jira/browse/SPARK-1284 Project: Spark Issue Type: Bug Components: PySpark Reporter: Jim Blomo When running a reduceByKey over a cached RDD, Python fails with an exception, but the failure is not detected by the task runner. Spark and the pyspark shell hang waiting for the task to finish. The error is: {code} PySpark worker failed with exception: Traceback (most recent call last): File /home/hadoop/spark/python/pyspark/worker.py, line 77, in main serializer.dump_stream(func(split_index, iterator), outfile) File /home/hadoop/spark/python/pyspark/serializers.py, line 182, in dump_stream self.serializer.dump_stream(self._batched(iterator), stream) File /home/hadoop/spark/python/pyspark/serializers.py, line 118, in dump_stream self._write_with_length(obj, stream) File /home/hadoop/spark/python/pyspark/serializers.py, line 130, in _write_with_length stream.write(serialized) IOError: [Errno 104] Connection reset by peer 14/03/19 22:48:15 INFO scheduler.TaskSetManager: Serialized task 4.0:0 as 4257 bytes in 47 ms Traceback (most recent call last): File /home/hadoop/spark/python/pyspark/daemon.py, line 117, in launch_worker worker(listen_sock) File /home/hadoop/spark/python/pyspark/daemon.py, line 107, in worker outfile.flush() IOError: [Errno 32] Broken pipe {code} I can reproduce the error by running take(10) on the cached RDD before running reduceByKey (which looks at the whole input file). Affects Version 1.0.0-SNAPSHOT (4d88030486) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-1473) Feature selection for high dimensional datasets
[ https://issues.apache.org/jira/browse/SPARK-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14049939#comment-14049939 ] Alexander Ulanov commented on SPARK-1473: - Is anybody working on this issue? Feature selection for high dimensional datasets --- Key: SPARK-1473 URL: https://issues.apache.org/jira/browse/SPARK-1473 Project: Spark Issue Type: New Feature Components: MLlib Reporter: Ignacio Zendejas Priority: Minor Labels: features Fix For: 1.1.0 For classification tasks involving large feature spaces on the order of tens of thousands of features or higher (e.g., text classification with n-grams, where n > 1), it is often useful to rank and filter out irrelevant features, thereby reducing the feature space by at least one or two orders of magnitude without impacting performance on key evaluation metrics (accuracy/precision/recall). A flexible feature evaluation interface needs to be designed, and at least two methods should be implemented, with Information Gain being a priority as it has been shown to be amongst the most reliable. Special consideration should be taken in the design to account for wrapper methods (see research papers below), which are more practical for lower dimensional data. Relevant research: * Brown, G., Pocock, A., Zhao, M. J., Luján, M. (2012). Conditional likelihood maximisation: a unifying framework for information theoretic feature selection. *The Journal of Machine Learning Research*, *13*, 27-66. * Forman, George. An extensive empirical study of feature selection metrics for text classification. *The Journal of Machine Learning Research*, 3 (2003), 1289-1305. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-2341) loadLibSVMFile doesn't handle regression datasets
[ https://issues.apache.org/jira/browse/SPARK-2341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14049942#comment-14049942 ] Sean Owen commented on SPARK-2341: -- I've been a bit uncomfortable with how the MLlib API conflates categorical values and numbers, since they aren't numbers in general. Treating them as numbers is a convenience in some cases, and common in papers, but feels like suboptimal software design -- should a user have to convert categoricals to some numeric representation? To me it invites confusion, and this is one symptom. So I am not sure multiclass should mean parse target as double to begin with? OK, it's not the issue here. But we're on the subject of an experimental API subject to change with an example of something related that could be improved along the way, and it's my #1 wish for MLlib at the moment. I'd really like to work on a change to try to accommodate classes as, say, strings at least, and not presume doubles. But I am trying to figure out if anyone agrees with that. loadLibSVMFile doesn't handle regression datasets - Key: SPARK-2341 URL: https://issues.apache.org/jira/browse/SPARK-2341 Project: Spark Issue Type: Bug Components: MLlib Affects Versions: 1.0.0 Reporter: Eustache Priority: Minor Labels: easyfix Many datasets exist in LibSVM format for regression tasks [1] but currently the loadLibSVMFile primitive doesn't handle regression datasets. More precisely, the LabelParser is either a MulticlassLabelParser or a BinaryLabelParser. What happens then is that the file is loaded but in multiclass mode : each target value is interpreted as a class name ! The fix would be to write a RegressionLabelParser which converts target values to Double and plug it into the loadLibSVMFile routine. [1] http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/regression.html -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-1989) Exit executors faster if they get into a cycle of heavy GC
[ https://issues.apache.org/jira/browse/SPARK-1989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14050005#comment-14050005 ] Guoqiang Li commented on SPARK-1989: In this case we should also trigger garbage collection on the driver. The related work: https://github.com/witgo/spark/compare/taskEvent Exit executors faster if they get into a cycle of heavy GC -- Key: SPARK-1989 URL: https://issues.apache.org/jira/browse/SPARK-1989 Project: Spark Issue Type: New Feature Components: Spark Core Reporter: Matei Zaharia Fix For: 1.1.0 I've seen situations where an application is allocating too much memory across its tasks + cache to proceed, but Java gets into a cycle where it repeatedly runs full GCs, frees up a bit of the heap, and continues instead of giving up. This then leads to timeouts and confusing error messages. It would be better to crash with OOM sooner. The JVM has options to support this: http://java.dzone.com/articles/tracking-excessive-garbage. The right solution would probably be: - Add some config options used by spark-submit to set -XX:GCTimeLimit and -XX:GCHeapFreeLimit, with more conservative values than the defaults (e.g. 90% time limit, 5% free limit) - Make sure we pass these into the Java options for executors in each deployment mode -- This message was sent by Atlassian JIRA (v6.2#6252)
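Pending a built-in default, one plausible way to pass the suggested limits today is through the executor Java options; the 90/5 values are taken directly from the description above and are not a tested recommendation.

{code}
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("gc-overhead-limits")
  // fail fast with OutOfMemoryError instead of looping in full GCs
  .set("spark.executor.extraJavaOptions",
       "-XX:+UseGCOverheadLimit -XX:GCTimeLimit=90 -XX:GCHeapFreeLimit=5")
{code}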
[jira] [Created] (SPARK-2343) QueueInputDStream with oneAtATime=false does not dequeue items
Manuel Laflamme created SPARK-2343: -- Summary: QueueInputDStream with oneAtATime=false does not dequeue items Key: SPARK-2343 URL: https://issues.apache.org/jira/browse/SPARK-2343 Project: Spark Issue Type: Bug Components: Streaming Affects Versions: 1.0.0, 0.9.1, 0.9.0 Reporter: Manuel Laflamme Priority: Minor QueueInputDStream does not dequeue items when used with the oneAtATime flag disabled. The same items are reprocessed for every batch. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-1850) Bad exception if multiple jars exist when running PySpark
[ https://issues.apache.org/jira/browse/SPARK-1850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14050318#comment-14050318 ] Andrew Or commented on SPARK-1850: -- Ye, I will change it. Bad exception if multiple jars exist when running PySpark - Key: SPARK-1850 URL: https://issues.apache.org/jira/browse/SPARK-1850 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 1.0.0 Reporter: Andrew Or Fix For: 1.0.1 {code} Found multiple Spark assembly jars in /Users/andrew/Documents/dev/andrew-spark/assembly/target/scala-2.10: Traceback (most recent call last): File /Users/andrew/Documents/dev/andrew-spark/python/pyspark/shell.py, line 43, in module sc = SparkContext(os.environ.get(MASTER, local[*]), PySparkShell, pyFiles=add_files) File /Users/andrew/Documents/dev/andrew-spark/python/pyspark/context.py, line 94, in __init__ SparkContext._ensure_initialized(self, gateway=gateway) File /Users/andrew/Documents/dev/andrew-spark/python/pyspark/context.py, line 180, in _ensure_initialized SparkContext._gateway = gateway or launch_gateway() File /Users/andrew/Documents/dev/andrew-spark/python/pyspark/java_gateway.py, line 49, in launch_gateway gateway_port = int(proc.stdout.readline()) ValueError: invalid literal for int() with base 10: 'spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4-deps.jar\n' {code} It's trying to read the Java gateway port as an int from the sub-process' STDOUT. However, what it read was an error message, which is clearly not an int. We should differentiate between these cases and just propagate the original message if it's not an int. Right now, this exception is not very helpful. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Closed] (SPARK-1850) Bad exception if multiple jars exist when running PySpark
[ https://issues.apache.org/jira/browse/SPARK-1850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or closed SPARK-1850. Resolution: Fixed Bad exception if multiple jars exist when running PySpark - Key: SPARK-1850 URL: https://issues.apache.org/jira/browse/SPARK-1850 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 1.0.0 Reporter: Andrew Or Fix For: 1.0.1 {code} Found multiple Spark assembly jars in /Users/andrew/Documents/dev/andrew-spark/assembly/target/scala-2.10: Traceback (most recent call last): File /Users/andrew/Documents/dev/andrew-spark/python/pyspark/shell.py, line 43, in module sc = SparkContext(os.environ.get(MASTER, local[*]), PySparkShell, pyFiles=add_files) File /Users/andrew/Documents/dev/andrew-spark/python/pyspark/context.py, line 94, in __init__ SparkContext._ensure_initialized(self, gateway=gateway) File /Users/andrew/Documents/dev/andrew-spark/python/pyspark/context.py, line 180, in _ensure_initialized SparkContext._gateway = gateway or launch_gateway() File /Users/andrew/Documents/dev/andrew-spark/python/pyspark/java_gateway.py, line 49, in launch_gateway gateway_port = int(proc.stdout.readline()) ValueError: invalid literal for int() with base 10: 'spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4-deps.jar\n' {code} It's trying to read the Java gateway port as an int from the sub-process' STDOUT. However, what it read was an error message, which is clearly not an int. We should differentiate between these cases and just propagate the original message if it's not an int. Right now, this exception is not very helpful. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (SPARK-2328) Add execution of `SHOW TABLES` before `TestHive.reset()`.
[ https://issues.apache.org/jira/browse/SPARK-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-2328. - Resolution: Fixed Fix Version/s: 1.1.0 1.0.1 Assignee: Takuya Ueshin Add execution of `SHOW TABLES` before `TestHive.reset()`. - Key: SPARK-2328 URL: https://issues.apache.org/jira/browse/SPARK-2328 Project: Spark Issue Type: Bug Components: SQL Reporter: Takuya Ueshin Assignee: Takuya Ueshin Fix For: 1.0.1, 1.1.0 {{PruningSuite}} is executed first of Hive tests unfortunately, {{TestHive.reset()}} breaks the test environment. To prevent this, we must run a query before calling reset the first time. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (SPARK-2186) Spark SQL DSL support for simple aggregations such as SUM and AVG
[ https://issues.apache.org/jira/browse/SPARK-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-2186. - Resolution: Fixed Fix Version/s: 1.1.0 1.0.1 Spark SQL DSL support for simple aggregations such as SUM and AVG - Key: SPARK-2186 URL: https://issues.apache.org/jira/browse/SPARK-2186 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 1.0.0 Reporter: Zongheng Yang Priority: Minor Fix For: 1.0.1, 1.1.0 Inspired by this thread (http://apache-spark-user-list.1001560.n3.nabble.com/Patterns-for-making-multiple-aggregations-in-one-pass-td7874.html): Spark SQL doesn't seem to have DSL support for simple aggregations such as AVG and SUM. It'd be nice if the user could avoid writing a SQL query and instead write something like: {code} data.select('country, 'age.avg, 'hits.sum).groupBy('country).collect() {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (SPARK-2287) Make ScalaReflection be able to handle Generic case classes.
[ https://issues.apache.org/jira/browse/SPARK-2287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-2287. - Resolution: Fixed Fix Version/s: 1.1.0 1.0.1 Assignee: Takuya Ueshin Make ScalaReflection be able to handle Generic case classes. Key: SPARK-2287 URL: https://issues.apache.org/jira/browse/SPARK-2287 Project: Spark Issue Type: Improvement Components: SQL Reporter: Takuya Ueshin Assignee: Takuya Ueshin Fix For: 1.0.1, 1.1.0 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-2342) Evaluation helper's output type doesn't conform to input type
[ https://issues.apache.org/jira/browse/SPARK-2342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14050347#comment-14050347 ] Michael Armbrust commented on SPARK-2342: - This does look like a typo (though maybe one that doesn't matter due to erasure?). That said, if you make a PR I'll certainly merge it. Thanks! Evaluation helper's output type doesn't conform to input type - Key: SPARK-2342 URL: https://issues.apache.org/jira/browse/SPARK-2342 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.0.0 Reporter: Yijie Shen Priority: Minor Labels: easyfix In sql/catalyst/org/apache/spark/sql/catalyst/expressions.scala {code}protected final def n2 ( i: Row, e1: Expression, e2: Expression, f: ((Numeric[Any], Any, Any) => Any)): Any {code} is intended to do computations for Numeric Add/Minus/Multiply. Just as the comment suggests: {quote}Those expressions are supposed to be in the same data type, and also the return type.{quote} But in the code, function f was cast to the function signature: {code}(Numeric[n.JvmType], n.JvmType, n.JvmType) => Int{code} I think this is a typo and the correct signature should be: {code}(Numeric[n.JvmType], n.JvmType, n.JvmType) => n.JvmType{code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-2277) Make TaskScheduler track whether there's host on a rack
[ https://issues.apache.org/jira/browse/SPARK-2277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14050381#comment-14050381 ] Chen He commented on SPARK-2277: This is interesting. I will take a look. Make TaskScheduler track whether there's host on a rack --- Key: SPARK-2277 URL: https://issues.apache.org/jira/browse/SPARK-2277 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 1.0.0 Reporter: Rui Li When TaskSetManager adds a pending task, it checks whether the tasks's preferred location is available. Regarding RACK_LOCAL task, we consider the preferred rack available if such a rack is defined for the preferred host. This is incorrect as there may be no alive hosts on that rack at all. Therefore, TaskScheduler should track the hosts on each rack, and provides an API for TaskSetManager to check if there's host alive on a specific rack. -- This message was sent by Atlassian JIRA (v6.2#6252)
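A self-contained sketch of the bookkeeping the description asks for (illustrative class and method names, not the actual TaskScheduler code):

{code}
import scala.collection.mutable

// Track which hosts are alive on each rack and answer the query TaskSetManager needs.
class RackTrackerSketch {
  private val hostsByRack = new mutable.HashMap[String, mutable.HashSet[String]]

  def addHost(rack: String, host: String): Unit =
    hostsByRack.getOrElseUpdate(rack, new mutable.HashSet[String]) += host

  def removeHost(rack: String, host: String): Unit =
    hostsByRack.get(rack).foreach { hosts =>
      hosts -= host
      if (hosts.isEmpty) hostsByRack -= rack // no alive host left on this rack
    }

  def hasHostAliveOnRack(rack: String): Boolean = hostsByRack.contains(rack)
}
{code}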
[jira] [Updated] (SPARK-1054) Get Cassandra support in Spark Core/Spark Cassandra Module
[ https://issues.apache.org/jira/browse/SPARK-1054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohit Rai updated SPARK-1054: - Summary: Get Cassandra support in Spark Core/Spark Cassandra Module (was: Contribute Calliope Core to Spark as spark-cassandra) Get Cassandra support in Spark Core/Spark Cassandra Module -- Key: SPARK-1054 URL: https://issues.apache.org/jira/browse/SPARK-1054 Project: Spark Issue Type: New Feature Components: Spark Core Reporter: Rohit Rai Labels: calliope, cassandra Calliope is a library providing an interface to consume data from Cassandra to spark and store RDDs from Spark to Cassandra. Building as wrapper over Cassandra's Hadoop I/O it provides a simplified and very generic API to consume and produces data from and to Cassandra. It allows you to consume data from Legacy as well as CQL3 Cassandra Storage. It can also harness C* to speed up your process by fetching only the relevant data from C* harnessing CQL3 and C*'s secondary indexes. Though it currently uses only the Hadoop I/O formats for Cassandra in near future we see the same API harnessing other means of consuming Cassandra data like using the StorageProxy or even reading from SSTables directly. Over the basic data fetch functionality, the Calliope API harnesses Scala and it's implicit parameters and conversions for you to work on a higher abstraction dealing with tuples/objects instead of Cassandra's Row/Columns in your MapRed jobs. Over past few months we have seen the combination of Spark+Cassandra gaining a lot of traction. And we feel Calliope provides the path of least friction for developers to start working with this combination. We have been using this ins production for over a year now and the Calliope early access repository has 30+ users. I am putting this issue to start a discussion around whether we would want Calliope to be a part of Spark and if yes, what will be involved in doing so. You can read more about Calliope here - http://tuplejump.github.io/calliope -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-1054) Get Cassandra support in Spark Core/Spark Cassandra Module
[ https://issues.apache.org/jira/browse/SPARK-1054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14050544#comment-14050544 ] Rohit Rai commented on SPARK-1054: -- With the https://github.com/datastax/cassandra-driver-spark from Datastax, we should work on getting a united standard API in Spark, getting good things from both worlds. Get Cassandra support in Spark Core/Spark Cassandra Module -- Key: SPARK-1054 URL: https://issues.apache.org/jira/browse/SPARK-1054 Project: Spark Issue Type: New Feature Components: Spark Core Reporter: Rohit Rai Labels: calliope, cassandra Calliope is a library providing an interface to consume data from Cassandra to spark and store RDDs from Spark to Cassandra. Building as wrapper over Cassandra's Hadoop I/O it provides a simplified and very generic API to consume and produces data from and to Cassandra. It allows you to consume data from Legacy as well as CQL3 Cassandra Storage. It can also harness C* to speed up your process by fetching only the relevant data from C* harnessing CQL3 and C*'s secondary indexes. Though it currently uses only the Hadoop I/O formats for Cassandra in near future we see the same API harnessing other means of consuming Cassandra data like using the StorageProxy or even reading from SSTables directly. Over the basic data fetch functionality, the Calliope API harnesses Scala and it's implicit parameters and conversions for you to work on a higher abstraction dealing with tuples/objects instead of Cassandra's Row/Columns in your MapRed jobs. Over past few months we have seen the combination of Spark+Cassandra gaining a lot of traction. And we feel Calliope provides the path of least friction for developers to start working with this combination. We have been using this ins production for over a year now and the Calliope early access repository has 30+ users. I am putting this issue to start a discussion around whether we would want Calliope to be a part of Spark and if yes, what will be involved in doing so. You can read more about Calliope here - http://tuplejump.github.io/calliope -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (SPARK-2345) ForEachDStream should have an option of running the foreachfunc on Spark
Hari Shreedharan created SPARK-2345: --- Summary: ForEachDStream should have an option of running the foreachfunc on Spark Key: SPARK-2345 URL: https://issues.apache.org/jira/browse/SPARK-2345 Project: Spark Issue Type: Bug Reporter: Hari Shreedharan Today the Job generated simply calls the foreachfunc, but does not run it on spark itself using the sparkContext.runJob method. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-2345) ForEachDStream should have an option of running the foreachfunc on Spark
[ https://issues.apache.org/jira/browse/SPARK-2345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14050659#comment-14050659 ] Hari Shreedharan commented on SPARK-2345: - Currently, the job (like saveAsTextFile or saveAsHadoopFile) on the DStream will cause the rdd.save calls to be executed on sparkContext.runJob, which in turn will call the foreachfunc which is passed to the ForEachDStream. So a case where this DStream is saved off works fine. But if you simply do a register and have the foreachfunc do some processing and custom writes may cause the application to be run locally. ForEachDStream should have an option of running the foreachfunc on Spark Key: SPARK-2345 URL: https://issues.apache.org/jira/browse/SPARK-2345 Project: Spark Issue Type: Bug Reporter: Hari Shreedharan Today the Job generated simply calls the foreachfunc, but does not run it on spark itself using the sparkContext.runJob method. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (SPARK-2346) Error parsing table names that starts from numbers
Alexander Albul created SPARK-2346: -- Summary: Error parsing table names that starts from numbers Key: SPARK-2346 URL: https://issues.apache.org/jira/browse/SPARK-2346 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.0.0 Reporter: Alexander Albul Looks like org.apache.spark.sql.catalyst.SqlParser cannot parse table names when they start from numbers. Steps to reproduce: {code:title=Test.scala|borderStyle=solid} case class Data(value: String) object Test { def main(args: Array[String]) { val sc = new SparkContext("local", "sql") val sqlSc = new SQLContext(sc) import sqlSc._ sc.parallelize(List(Data("one"), Data("two"))).registerAsTable("123_table") sql("SELECT * FROM '123_table'").collect().foreach(println) } } {code} And here is an exception: {quote} Exception in thread "main" java.lang.RuntimeException: [1.15] failure: ``('' expected but 123_table found SELECT * FROM '123_table' ^ at scala.sys.package$.error(package.scala:27) at org.apache.spark.sql.catalyst.SqlParser.apply(SqlParser.scala:47) at org.apache.spark.sql.SQLContext.parseSql(SQLContext.scala:70) at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:150) at io.ubix.spark.Test$.main(Test.scala:24) at io.ubix.spark.Test.main(Test.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at com.intellij.rt.execution.application.AppMain.main(AppMain.java:134) {quote} When I change 123_table to table_123 the problem disappears. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-2345) ForEachDStream should have an option of running the foreachfunc on Spark
[ https://issues.apache.org/jira/browse/SPARK-2345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14050670#comment-14050670 ] Hari Shreedharan commented on SPARK-2345: - Looks like we'd have to do this in a new DStream, since the ForEachDStream takes a (RDD[T], Time)= Unit, but to call runJob we'd have to pass in (Iterator[T], Time)=Unit. I am not sure how much value this adds, but it does seem like if we are not using one of the built-in save/collect methods, you'd have to specifically run this function in context.runJob(...) Do you think this makes sense, [~tdas], [~pwendell]? ForEachDStream should have an option of running the foreachfunc on Spark Key: SPARK-2345 URL: https://issues.apache.org/jira/browse/SPARK-2345 Project: Spark Issue Type: Bug Reporter: Hari Shreedharan Today the Job generated simply calls the foreachfunc, but does not run it on spark itself using the sparkContext.runJob method. -- This message was sent by Atlassian JIRA (v6.2#6252)
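A hedged sketch of the distinction discussed above: the built-in save operations go through sparkContext.runJob, while a hand-written foreachRDD function only runs on the cluster if the caller arranges it. writePartition here is a hypothetical user function, not a Spark API.

{code}
import org.apache.spark.streaming.dstream.DStream

// Hypothetical per-partition writer supplied by the user.
def writePartition(records: Iterator[String]): Unit = records.foreach(println)

def saveOnCluster(stream: DStream[String]): Unit =
  stream.foreachRDD { rdd =>
    // Explicitly submit a job so writePartition runs on the executors,
    // once per partition, instead of in the driver.
    rdd.context.runJob(rdd, writePartition _)
  }
{code}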
[jira] [Updated] (SPARK-2346) Error parsing table names that starts from numbers
[ https://issues.apache.org/jira/browse/SPARK-2346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Albul updated SPARK-2346: --- Description: Looks like *org.apache.spark.sql.catalyst.SqlParser* cannot parse table names when they start with numbers. Steps to reproduce:
{code:title=Test.scala|borderStyle=solid}
case class Data(value: String)

object Test {
  def main(args: Array[String]) {
    val sc = new SparkContext("local", "sql")
    val sqlSc = new SQLContext(sc)
    import sqlSc._

    sc.parallelize(List(Data("one"), Data("two"))).registerAsTable("123_table")
    sql("SELECT * FROM '123_table'").collect().foreach(println)
  }
}
{code}
And here is the exception:
{quote}
Exception in thread "main" java.lang.RuntimeException: [1.15] failure: ``('' expected but 123_table found

SELECT * FROM '123_table'
              ^
at scala.sys.package$.error(package.scala:27) at org.apache.spark.sql.catalyst.SqlParser.apply(SqlParser.scala:47) at org.apache.spark.sql.SQLContext.parseSql(SQLContext.scala:70) at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:150) at io.ubix.spark.Test$.main(Test.scala:24) at io.ubix.spark.Test.main(Test.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at com.intellij.rt.execution.application.AppMain.main(AppMain.java:134)
{quote}
When I change 123_table to table_123, the problem disappears.

was: Looks like org.apache.spark.sql.catalyst.SqlParser cannot parse table names when they start with numbers. Steps to reproduce:
{code:title=Test.scala|borderStyle=solid}
case class Data(value: String)

object Test {
  def main(args: Array[String]) {
    val sc = new SparkContext("local", "sql")
    val sqlSc = new SQLContext(sc)
    import sqlSc._

    sc.parallelize(List(Data("one"), Data("two"))).registerAsTable("123_table")
    sql("SELECT * FROM '123_table'").collect().foreach(println)
  }
}
{code}
And here is the exception:
{quote}
Exception in thread "main" java.lang.RuntimeException: [1.15] failure: ``('' expected but 123_table found

SELECT * FROM '123_table'
              ^
at scala.sys.package$.error(package.scala:27) at org.apache.spark.sql.catalyst.SqlParser.apply(SqlParser.scala:47) at org.apache.spark.sql.SQLContext.parseSql(SQLContext.scala:70) at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:150) at io.ubix.spark.Test$.main(Test.scala:24) at io.ubix.spark.Test.main(Test.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at com.intellij.rt.execution.application.AppMain.main(AppMain.java:134)
{quote}
When I change 123_table to table_123, the problem disappears.

Error parsing table names that starts from numbers -- Key: SPARK-2346 URL: https://issues.apache.org/jira/browse/SPARK-2346 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.0.0 Reporter: Alexander Albul Labels: Parser, SQL Looks like *org.apache.spark.sql.catalyst.SqlParser* cannot parse table names when they start with numbers.
Steps to reproduce:
{code:title=Test.scala|borderStyle=solid}
case class Data(value: String)

object Test {
  def main(args: Array[String]) {
    val sc = new SparkContext("local", "sql")
    val sqlSc = new SQLContext(sc)
    import sqlSc._

    sc.parallelize(List(Data("one"), Data("two"))).registerAsTable("123_table")
    sql("SELECT * FROM '123_table'").collect().foreach(println)
  }
}
{code}
And here is the exception:
{quote}
Exception in thread "main" java.lang.RuntimeException: [1.15] failure: ``('' expected but 123_table found

SELECT * FROM '123_table'
              ^
at scala.sys.package$.error(package.scala:27) at org.apache.spark.sql.catalyst.SqlParser.apply(SqlParser.scala:47) at org.apache.spark.sql.SQLContext.parseSql(SQLContext.scala:70) at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:150) at io.ubix.spark.Test$.main(Test.scala:24) at io.ubix.spark.Test.main(Test.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[jira] [Created] (SPARK-2347) Graph object can not be set to StorageLevel.MEMORY_ONLY_SER
Baoxu Shi created SPARK-2347: Summary: Graph object can not be set to StorageLevel.MEMORY_ONLY_SER Key: SPARK-2347 URL: https://issues.apache.org/jira/browse/SPARK-2347 Project: Spark Issue Type: Bug Components: GraphX Affects Versions: 1.0.0 Environment: Spark standalone with 5 workers and 1 driver Reporter: Baoxu Shi I'm creating a Graph object using Graph(vertices, edges, null, StorageLevel.MEMORY_ONLY, StorageLevel.MEMORY_ONLY), but that throws a NotSerializableException on both the workers and the driver. 14/07/02 16:30:26 ERROR BlockManagerWorker: Exception handling buffer message java.io.NotSerializableException: org.apache.spark.graphx.impl.VertexPartition at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1183) at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547) at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508) at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431) at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177) at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547) at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508) at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431) at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177) at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347) at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:42) at org.apache.spark.serializer.SerializationStream$class.writeAll(Serializer.scala:106) at org.apache.spark.serializer.JavaSerializationStream.writeAll(JavaSerializer.scala:30) at org.apache.spark.storage.BlockManager.dataSerializeStream(BlockManager.scala:988) at org.apache.spark.storage.BlockManager.dataSerialize(BlockManager.scala:997) at org.apache.spark.storage.MemoryStore.getBytes(MemoryStore.scala:102) at org.apache.spark.storage.BlockManager.doGetLocal(BlockManager.scala:392) at org.apache.spark.storage.BlockManager.getLocalBytes(BlockManager.scala:358) at org.apache.spark.storage.BlockManagerWorker.getBlock(BlockManagerWorker.scala:90) at org.apache.spark.storage.BlockManagerWorker.processBlockMessage(BlockManagerWorker.scala:69) at org.apache.spark.storage.BlockManagerWorker$$anonfun$2.apply(BlockManagerWorker.scala:44) at org.apache.spark.storage.BlockManagerWorker$$anonfun$2.apply(BlockManagerWorker.scala:44) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) at scala.collection.Iterator$class.foreach(Iterator.scala:727) at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) at scala.collection.IterableLike$class.foreach(IterableLike.scala:72) at org.apache.spark.storage.BlockMessageArray.foreach(BlockMessageArray.scala:28) at scala.collection.TraversableLike$class.map(TraversableLike.scala:244) at org.apache.spark.storage.BlockMessageArray.map(BlockMessageArray.scala:28) at org.apache.spark.storage.BlockManagerWorker.onBlockMessageReceive(BlockManagerWorker.scala:44) at org.apache.spark.storage.BlockManagerWorker$$anonfun$1.apply(BlockManagerWorker.scala:34) at org.apache.spark.storage.BlockManagerWorker$$anonfun$1.apply(BlockManagerWorker.scala:34) at org.apache.spark.network.ConnectionManager.org$apache$spark$network$ConnectionManager$$handleMessage(ConnectionManager.scala:662) at
org.apache.spark.network.ConnectionManager$$anon$9.run(ConnectionManager.scala:504) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) Even when the driver does not throw this exception, it throws java.io.FileNotFoundException: /tmp/spark-local-20140702151845-9620/2a/shuffle_2_25_3 (No such file or directory) I know that VertexPartition is not supposed to be serializable, so is there any workaround for this? -- This message was sent by Atlassian JIRA (v6.2#6252)
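Editor's note: a possible workaround sketch, not taken from the ticket and not verified against this exact failure. The idea is to let Kryo (via GraphX's GraphKryoRegistrator) handle the internal partition classes instead of Java serialization. The five-argument Graph(...) call mirrors the one in the report and may not be available in every 1.x release; the data and names are illustrative.
{code}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph, GraphKryoRegistrator}
import org.apache.spark.storage.StorageLevel

object GraphSerializationSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("graph-serialization-sketch")
      // Register GraphX's internal classes (including VertexPartition) with Kryo
      // so blocks can be serialized when shipped between nodes.
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .set("spark.kryo.registrator", classOf[GraphKryoRegistrator].getName)

    val sc = new SparkContext(conf)
    val vertices = sc.parallelize(Seq((1L, "a"), (2L, "b")))
    val edges = sc.parallelize(Seq(Edge(1L, 2L, 1.0)))

    // Mirrors the constructor call from the report, with serialized storage levels.
    val graph = Graph(vertices, edges, "default",
      StorageLevel.MEMORY_ONLY_SER, StorageLevel.MEMORY_ONLY_SER)
    println(graph.vertices.count())
  }
}
{code}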
[jira] [Commented] (SPARK-2339) SQL parser in sql-core is case sensitive, but a table alias is converted to lower case when we create Subquery
[ https://issues.apache.org/jira/browse/SPARK-2339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14050721#comment-14050721 ] Yin Huai commented on SPARK-2339: - Also, names of those registered tables are case sensitive, but names of Hive tables are case insensitive. This will cause confusion when a user is using HiveContext. I think it may be good to treat all identifiers as case insensitive when a user is using HiveContext and to make HiveContext.sql an alias of HiveContext.hql (basically, do not expose catalyst's SQLParser in HiveContext). SQL parser in sql-core is case sensitive, but a table alias is converted to lower case when we create Subquery -- Key: SPARK-2339 URL: https://issues.apache.org/jira/browse/SPARK-2339 Project: Spark Issue Type: Bug Affects Versions: 1.0.0 Reporter: Yin Huai Fix For: 1.1.0 Reported by http://apache-spark-user-list.1001560.n3.nabble.com/Spark-SQL-Join-throws-exception-td8599.html After we get the table from the catalog, because the table has an alias, we temporarily insert a Subquery. Then, we convert the table alias to lower case no matter whether the parser is case sensitive or not. To see the issue ...
{code}
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.createSchemaRDD
case class Person(name: String, age: Int)
val people = sc.textFile("examples/src/main/resources/people.txt").map(_.split(",")).map(p => Person(p(0), p(1).trim.toInt))
people.registerAsTable("people")
sqlContext.sql("select PEOPLE.name from people PEOPLE")
{code}
The plan is ...
{code}
== Query Plan ==
Project ['PEOPLE.name]
 ExistingRdd [name#0,age#1], MapPartitionsRDD[4] at mapPartitions at basicOperators.scala:176
{code}
You can find that PEOPLE.name is not resolved. -- This message was sent by Atlassian JIRA (v6.2#6252)
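Editor's note: an illustrative workaround only, not a fix, and not from the ticket. Because the Subquery alias is lowercased, writing the alias in lower case everywhere keeps the qualifier and the alias consistent; the setup below is a hypothetical stand-alone variant of the repro in the description.
{code}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

case class Person(name: String, age: Int)

object CaseSensitivitySketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("case-sketch").setMaster("local"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.createSchemaRDD

    val people = sc.parallelize(Seq(Person("Ann", 30), Person("Bob", 25)))
    people.registerAsTable("people")

    // Fails in 1.0.0: the alias PEOPLE is lowercased when the Subquery is
    // created, so 'PEOPLE.name stays unresolved.
    // sqlContext.sql("select PEOPLE.name from people PEOPLE").collect()

    // Works: a lower-case alias matches the lowercased Subquery alias.
    sqlContext.sql("select people.name from people people").collect().foreach(println)
  }
}
{code}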
[jira] [Created] (SPARK-2348) In Windows, having an environment variable named 'classpath' gives an error
Chirag Todarka created SPARK-2348: - Summary: In Windows, having an environment variable named 'classpath' gives an error Key: SPARK-2348 URL: https://issues.apache.org/jira/browse/SPARK-2348 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.0.0 Environment: Windows 7 Enterprise Reporter: Chirag Todarka Operating System: Windows 7 Enterprise. If an environment variable named 'classpath' is set, then starting 'spark-shell' gives the error below:
mydir\spark\bin>spark-shell
Failed to initialize compiler: object scala.runtime in compiler mirror not found.
** Note that as of 2.8 scala does not assume use of the java classpath.
** For the old behavior pass -usejavacp to scala, or if using a Settings
** object programatically, settings.usejavacp.value = true.
14/07/02 14:22:06 WARN SparkILoop$SparkILoopInterpreter: Warning: compiler accessed before init set up. Assuming no postInit code.
Failed to initialize compiler: object scala.runtime in compiler mirror not found.
** Note that as of 2.8 scala does not assume use of the java classpath.
** For the old behavior pass -usejavacp to scala, or if using a Settings
** object programatically, settings.usejavacp.value = true.
Exception in thread "main" java.lang.AssertionError: assertion failed: null
at scala.Predef$.assert(Predef.scala:179)
at org.apache.spark.repl.SparkIMain.initializeSynchronous(SparkIMain.scala:202)
at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply$mcZ$sp(SparkILoop.scala:929)
at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:884)
at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:884)
at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:884)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:982)
at org.apache.spark.repl.Main$.main(Main.scala:31)
at org.apache.spark.repl.Main.main(Main.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:292)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:55)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Issue Comment Deleted] (SPARK-1305) Support persisting RDD's directly to Tachyon
[ https://issues.apache.org/jira/browse/SPARK-1305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Henry Saputra updated SPARK-1305: - Comment: was deleted (was: Sorry to comment on old JIRA but does anyone have PR for this ticket?) Support persisting RDD's directly to Tachyon Key: SPARK-1305 URL: https://issues.apache.org/jira/browse/SPARK-1305 Project: Spark Issue Type: New Feature Components: Block Manager Reporter: Patrick Wendell Assignee: Haoyuan Li Priority: Blocker Fix For: 1.0.0 This is already an ongoing pull request - in a nutshell we want to support Tachyon as a storage level in Spark. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-2348) In Windows, having an environment variable named 'classpath' gives an error
[ https://issues.apache.org/jira/browse/SPARK-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14050757#comment-14050757 ] Chirag Todarka commented on SPARK-2348: --- [~pwendell] [~cheffpj] Hi Patrick/Pat, I am new to the project and want to contribute to this. I hope this will be a great starting point for me, so please assign it to me if possible. Regards, Chirag Todarka In Windows, having an environment variable named 'classpath' gives an error --- Key: SPARK-2348 URL: https://issues.apache.org/jira/browse/SPARK-2348 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.0.0 Environment: Windows 7 Enterprise Reporter: Chirag Todarka Operating System: Windows 7 Enterprise. If an environment variable named 'classpath' is set, then starting 'spark-shell' gives the error below:
mydir\spark\bin>spark-shell
Failed to initialize compiler: object scala.runtime in compiler mirror not found.
** Note that as of 2.8 scala does not assume use of the java classpath.
** For the old behavior pass -usejavacp to scala, or if using a Settings
** object programatically, settings.usejavacp.value = true.
14/07/02 14:22:06 WARN SparkILoop$SparkILoopInterpreter: Warning: compiler accessed before init set up. Assuming no postInit code.
Failed to initialize compiler: object scala.runtime in compiler mirror not found.
** Note that as of 2.8 scala does not assume use of the java classpath.
** For the old behavior pass -usejavacp to scala, or if using a Settings
** object programatically, settings.usejavacp.value = true.
Exception in thread "main" java.lang.AssertionError: assertion failed: null
at scala.Predef$.assert(Predef.scala:179)
at org.apache.spark.repl.SparkIMain.initializeSynchronous(SparkIMain.scala:202)
at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply$mcZ$sp(SparkILoop.scala:929)
at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:884)
at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:884)
at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:884)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:982)
at org.apache.spark.repl.Main$.main(Main.scala:31)
at org.apache.spark.repl.Main.main(Main.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:292)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:55)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-2346) Error parsing table names that starts with numbers
[ https://issues.apache.org/jira/browse/SPARK-2346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Albul updated SPARK-2346: --- Summary: Error parsing table names that starts with numbers (was: Error parsing table names that starts from numbers) Error parsing table names that starts with numbers -- Key: SPARK-2346 URL: https://issues.apache.org/jira/browse/SPARK-2346 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.0.0 Reporter: Alexander Albul Labels: Parser, SQL Looks like *org.apache.spark.sql.catalyst.SqlParser* cannot parse table names when they start with numbers. Steps to reproduce:
{code:title=Test.scala|borderStyle=solid}
case class Data(value: String)

object Test {
  def main(args: Array[String]) {
    val sc = new SparkContext("local", "sql")
    val sqlSc = new SQLContext(sc)
    import sqlSc._

    sc.parallelize(List(Data("one"), Data("two"))).registerAsTable("123_table")
    sql("SELECT * FROM '123_table'").collect().foreach(println)
  }
}
{code}
And here is the exception:
{quote}
Exception in thread "main" java.lang.RuntimeException: [1.15] failure: ``('' expected but 123_table found

SELECT * FROM '123_table'
              ^
at scala.sys.package$.error(package.scala:27) at org.apache.spark.sql.catalyst.SqlParser.apply(SqlParser.scala:47) at org.apache.spark.sql.SQLContext.parseSql(SQLContext.scala:70) at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:150) at io.ubix.spark.Test$.main(Test.scala:24) at io.ubix.spark.Test.main(Test.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at com.intellij.rt.execution.application.AppMain.main(AppMain.java:134)
{quote}
When I change 123_table to table_123, the problem disappears. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-1614) Move Mesos protobufs out of TaskState
[ https://issues.apache.org/jira/browse/SPARK-1614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14050804#comment-14050804 ] Martin Zapletal commented on SPARK-1614: I am considering moving the protobufs to a new object - something like object org.apache.spark.MesosTaskState. Is that an acceptable solution with regard to the requirements (to avoid the conflicts)? If not, can you please suggest which place would be best for it? Move Mesos protobufs out of TaskState - Key: SPARK-1614 URL: https://issues.apache.org/jira/browse/SPARK-1614 Project: Spark Issue Type: Bug Components: Mesos Affects Versions: 0.9.1 Reporter: Shivaram Venkataraman Priority: Minor Labels: Starter To isolate usage of Mesos protobufs it would be good to move them out of TaskState into either a new class (MesosUtils?) or CoarseGrainedMesos{Executor, Backend}. This would allow applications to build Spark without including protobuf from Mesos in their shaded jars. This is one way to avoid protobuf conflicts between Mesos and Hadoop (https://issues.apache.org/jira/browse/MESOS-1203) -- This message was sent by Atlassian JIRA (v6.2#6252)
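Editor's note: a rough sketch of the kind of separation being proposed. The object, package, and method names below are invented for illustration (this is not the actual patch); it assumes the Mesos Protos classes are on the classpath and simply keeps all protobuf references in one place so org.apache.spark.TaskState no longer needs them.
{code}
package org.apache.spark.scheduler.cluster.mesos

import org.apache.mesos.Protos.{TaskState => MesosTaskState}
import org.apache.spark.TaskState
import org.apache.spark.TaskState.TaskState

// Illustrative only: conversions between Spark's TaskState enum and the
// Mesos protobuf TaskState, isolated from the rest of core.
private[spark] object MesosTaskStateUtil {

  def toMesos(state: TaskState): MesosTaskState = state match {
    case TaskState.LAUNCHING => MesosTaskState.TASK_STARTING
    case TaskState.RUNNING   => MesosTaskState.TASK_RUNNING
    case TaskState.FINISHED  => MesosTaskState.TASK_FINISHED
    case TaskState.FAILED    => MesosTaskState.TASK_FAILED
    case TaskState.KILLED    => MesosTaskState.TASK_KILLED
    case TaskState.LOST      => MesosTaskState.TASK_LOST
  }

  def fromMesos(state: MesosTaskState): TaskState = state match {
    case MesosTaskState.TASK_STAGING | MesosTaskState.TASK_STARTING => TaskState.LAUNCHING
    case MesosTaskState.TASK_RUNNING                                => TaskState.RUNNING
    case MesosTaskState.TASK_FINISHED                               => TaskState.FINISHED
    case MesosTaskState.TASK_FAILED                                 => TaskState.FAILED
    case MesosTaskState.TASK_KILLED                                 => TaskState.KILLED
    case MesosTaskState.TASK_LOST                                   => TaskState.LOST
  }
}
{code}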
[jira] [Created] (SPARK-2349) Fix NPE in ExternalAppendOnlyMap
Andrew Or created SPARK-2349: Summary: Fix NPE in ExternalAppendOnlyMap Key: SPARK-2349 URL: https://issues.apache.org/jira/browse/SPARK-2349 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.1.0 Reporter: Andrew Or It throws an NPE on null keys. -- This message was sent by Atlassian JIRA (v6.2#6252)
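Editor's note: a hypothetical reproduction, not taken from the ticket. ExternalAppendOnlyMap is used for map-side aggregation when spilling is enabled, and hashing a null key there is the likely source of the NPE; whether this exact snippet triggers it has not been verified.
{code}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._  // PairRDDFunctions (reduceByKey) in 1.x

object NullKeySketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("null-key-sketch")
      .setMaster("local")
      .set("spark.shuffle.spill", "true") // route aggregation through ExternalAppendOnlyMap
    val sc = new SparkContext(conf)

    val pairs = sc.parallelize(Seq((null.asInstanceOf[String], 1), ("a", 1), ("a", 2)))
    // Expected to fail with an NPE before the fix for SPARK-2349.
    pairs.reduceByKey(_ + _).collect().foreach(println)
  }
}
{code}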
[jira] [Commented] (SPARK-2277) Make TaskScheduler track whether there's host on a rack
[ https://issues.apache.org/jira/browse/SPARK-2277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14050886#comment-14050886 ] Mridul Muralidharan commented on SPARK-2277: I am not sure I follow this requirement. For preferred locations, we populate their corresponding racks (if available) as the preferred rack. For available executor hosts, we look up the rack they belong to - and then see whether that rack is preferred or not. This, of course, assumes a host is only on a single rack. What exactly is the behavior you are expecting from the scheduler? Make TaskScheduler track whether there's host on a rack --- Key: SPARK-2277 URL: https://issues.apache.org/jira/browse/SPARK-2277 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 1.0.0 Reporter: Rui Li When TaskSetManager adds a pending task, it checks whether the task's preferred location is available. Regarding RACK_LOCAL tasks, we consider the preferred rack available if such a rack is defined for the preferred host. This is incorrect as there may be no alive hosts on that rack at all. Therefore, TaskScheduler should track the hosts on each rack, and provide an API for TaskSetManager to check if there's a host alive on a specific rack. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (SPARK-2350) Master throws NPE
Andrew Or created SPARK-2350: Summary: Master throws NPE Key: SPARK-2350 URL: https://issues.apache.org/jira/browse/SPARK-2350 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.1.0 Reporter: Andrew Or Fix For: 1.1.0 ... if we launch a driver and there are more waiting drivers to be launched. This is because we remove from a list while iterating through it. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-2350) Master throws NPE
[ https://issues.apache.org/jira/browse/SPARK-2350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-2350: - Description: ... if we launch a driver and there are more waiting drivers to be launched. This is because we remove from a list while iterating through it.
{code}
for (driver <- waitingDrivers) {
  if (worker.memoryFree >= driver.desc.mem && worker.coresFree >= driver.desc.cores) {
    launchDriver(worker, driver)
    waitingDrivers -= driver
  }
}
{code}
was: ... if we launch a driver and there are more waiting drivers to be launched. This is because we remove from a list while iterating through it.
{code}
for (driver <- waitingDrivers) {
  if (worker.memoryFree >= driver.desc.mem && worker.coresFree >= driver.desc.cores) {
    launchDriver(worker, driver)
    waitingDrivers -= driver
  }
}
{code}
Master throws NPE - Key: SPARK-2350 URL: https://issues.apache.org/jira/browse/SPARK-2350 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.1.0 Reporter: Andrew Or Fix For: 1.1.0 ... if we launch a driver and there are more waiting drivers to be launched. This is because we remove from a list while iterating through it.
{code}
for (driver <- waitingDrivers) {
  if (worker.memoryFree >= driver.desc.mem && worker.coresFree >= driver.desc.cores) {
    launchDriver(worker, driver)
    waitingDrivers -= driver
  }
}
{code}
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-2350) Master throws NPE
[ https://issues.apache.org/jira/browse/SPARK-2350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-2350: - Description: ... if we launch a driver and there are more waiting drivers to be launched. This is because we remove from a list while iterating through it.
{code}
for (driver <- waitingDrivers) {
  if (worker.memoryFree >= driver.desc.mem && worker.coresFree >= driver.desc.cores) {
    launchDriver(worker, driver)
    waitingDrivers -= driver
  }
}
{code}
was: ... if we launch a driver and there are more waiting drivers to be launched. This is because we remove from a list while iterating through it. Master throws NPE - Key: SPARK-2350 URL: https://issues.apache.org/jira/browse/SPARK-2350 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.1.0 Reporter: Andrew Or Fix For: 1.1.0 ... if we launch a driver and there are more waiting drivers to be launched. This is because we remove from a list while iterating through it.
{code}
for (driver <- waitingDrivers) {
  if (worker.memoryFree >= driver.desc.mem && worker.coresFree >= driver.desc.cores) {
    launchDriver(worker, driver)
    waitingDrivers -= driver
  }
}
{code}
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-2350) Master throws NPE
[ https://issues.apache.org/jira/browse/SPARK-2350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-2350: - Description: ... if we launch a driver and there are more waiting drivers to be launched. This is because we remove from a list while iterating through it. Here is the culprit from Master.scala (L487 as of the creation of this JIRA, commit bc7041a42dfa84312492ea8cae6fdeaeac4f6d1c).
{code}
for (driver <- waitingDrivers) {
  if (worker.memoryFree >= driver.desc.mem && worker.coresFree >= driver.desc.cores) {
    launchDriver(worker, driver)
    waitingDrivers -= driver
  }
}
{code}
was: ... if we launch a driver and there are more waiting drivers to be launched. This is because we remove from a list while iterating through it.
{code}
for (driver <- waitingDrivers) {
  if (worker.memoryFree >= driver.desc.mem && worker.coresFree >= driver.desc.cores) {
    launchDriver(worker, driver)
    waitingDrivers -= driver
  }
}
{code}
Master throws NPE - Key: SPARK-2350 URL: https://issues.apache.org/jira/browse/SPARK-2350 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.1.0 Reporter: Andrew Or Fix For: 1.1.0 ... if we launch a driver and there are more waiting drivers to be launched. This is because we remove from a list while iterating through it. Here is the culprit from Master.scala (L487 as of the creation of this JIRA, commit bc7041a42dfa84312492ea8cae6fdeaeac4f6d1c).
{code}
for (driver <- waitingDrivers) {
  if (worker.memoryFree >= driver.desc.mem && worker.coresFree >= driver.desc.cores) {
    launchDriver(worker, driver)
    waitingDrivers -= driver
  }
}
{code}
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-2350) Master throws NPE
[ https://issues.apache.org/jira/browse/SPARK-2350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14050891#comment-14050891 ] Andrew Or commented on SPARK-2350: -- In general, if the Master dies because of an exception, it automatically restarts and the exception message is hidden in the logs. It took a while for [~ilikerps] and me to find the exception as we are scrolling through the logs. Master throws NPE - Key: SPARK-2350 URL: https://issues.apache.org/jira/browse/SPARK-2350 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.1.0 Reporter: Andrew Or Fix For: 1.1.0 ... if we launch a driver and there are more waiting drivers to be launched. This is because we remove from a list while iterating through it. Here is the culprit from Master.scala (L487 as of the creation of this JIRA, commit bc7041a42dfa84312492ea8cae6fdeaeac4f6d1c).
{code}
for (driver <- waitingDrivers) {
  if (worker.memoryFree >= driver.desc.mem && worker.coresFree >= driver.desc.cores) {
    launchDriver(worker, driver)
    waitingDrivers -= driver
  }
}
{code}
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Comment Edited] (SPARK-2350) Master throws NPE
[ https://issues.apache.org/jira/browse/SPARK-2350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14050891#comment-14050891 ] Andrew Or edited comment on SPARK-2350 at 7/3/14 12:07 AM: --- In general, if the Master dies because of an exception, it automatically restarts and the exception message is hidden in the logs. In the meantime, the symptoms are not indicative of a Master having thrown an exception and restarted. It took a while for [~ilikerps] and me to find the exception as we were scrolling through the logs. was (Author: andrewor): In general, if the Master dies because of an exception, it automatically restarts and the exception message is hidden in the logs. It took a while for [~ilikerps] and me to find the exception as we are scrolling through the logs. Master throws NPE - Key: SPARK-2350 URL: https://issues.apache.org/jira/browse/SPARK-2350 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.1.0 Reporter: Andrew Or Fix For: 1.1.0 ... if we launch a driver and there are more waiting drivers to be launched. This is because we remove from a list while iterating through it. Here is the culprit from Master.scala (L487 as of the creation of this JIRA, commit bc7041a42dfa84312492ea8cae6fdeaeac4f6d1c).
{code}
for (driver <- waitingDrivers) {
  if (worker.memoryFree >= driver.desc.mem && worker.coresFree >= driver.desc.cores) {
    launchDriver(worker, driver)
    waitingDrivers -= driver
  }
}
{code}
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-2350) Master throws NPE
[ https://issues.apache.org/jira/browse/SPARK-2350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14050894#comment-14050894 ] Andrew Or commented on SPARK-2350: -- This is the root cause of SPARK-2154. Master throws NPE - Key: SPARK-2350 URL: https://issues.apache.org/jira/browse/SPARK-2350 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.1.0 Reporter: Andrew Or Fix For: 1.1.0 ... if we launch a driver and there are more waiting drivers to be launched. This is because we remove from a list while iterating through it. Here is the culprit from Master.scala (L487 as of the creation of this JIRA, commit bc7041a42dfa84312492ea8cae6fdeaeac4f6d1c).
{code}
for (driver <- waitingDrivers) {
  if (worker.memoryFree >= driver.desc.mem && worker.coresFree >= driver.desc.cores) {
    launchDriver(worker, driver)
    waitingDrivers -= driver
  }
}
{code}
-- This message was sent by Atlassian JIRA (v6.2#6252)
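Editor's note: a small self-contained illustration of the underlying pitfall (mutating a collection while a for-comprehension iterates over it) and one possible remedy. The names are generic and this is not the Master.scala patch itself.
{code}
import scala.collection.mutable.ArrayBuffer

object RemoveWhileIterating {
  def main(args: Array[String]): Unit = {
    val waiting = ArrayBuffer(1, 2, 3, 4)

    // Unsafe: removing elements from the buffer while iterating over it can
    // skip elements or fail unpredictably, which is what the Master hit.
    // for (x <- waiting) if (x % 2 == 0) waiting -= x

    // Safer: iterate over an immutable snapshot, mutate the original buffer.
    for (x <- waiting.toList) {
      if (x % 2 == 0) waiting -= x
    }
    println(waiting.mkString(", ")) // 1, 3
  }
}
{code}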
[jira] [Commented] (SPARK-2277) Make TaskScheduler track whether there's host on a rack
[ https://issues.apache.org/jira/browse/SPARK-2277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14050951#comment-14050951 ] Rui Li commented on SPARK-2277: --- Suppose task1 prefers node1 but node1 is not available at the moment. However, we know node1 is on rack1, which makes task1 prefer rack1 for RACK_LOCAL locality. The problem is, we don't know if there's an alive host on rack1, so we cannot check the availability of this preference. Please let me know if I misunderstand anything :) Make TaskScheduler track whether there's host on a rack --- Key: SPARK-2277 URL: https://issues.apache.org/jira/browse/SPARK-2277 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 1.0.0 Reporter: Rui Li When TaskSetManager adds a pending task, it checks whether the task's preferred location is available. Regarding RACK_LOCAL tasks, we consider the preferred rack available if such a rack is defined for the preferred host. This is incorrect as there may be no alive hosts on that rack at all. Therefore, TaskScheduler should track the hosts on each rack, and provide an API for TaskSetManager to check if there's a host alive on a specific rack. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-2277) Make TaskScheduler track whether there's host on a rack
[ https://issues.apache.org/jira/browse/SPARK-2277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14050952#comment-14050952 ] Rui Li commented on SPARK-2277: --- PR created at: https://github.com/apache/spark/pull/1212 Make TaskScheduler track whether there's host on a rack --- Key: SPARK-2277 URL: https://issues.apache.org/jira/browse/SPARK-2277 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 1.0.0 Reporter: Rui Li When TaskSetManager adds a pending task, it checks whether the task's preferred location is available. Regarding RACK_LOCAL tasks, we consider the preferred rack available if such a rack is defined for the preferred host. This is incorrect as there may be no alive hosts on that rack at all. Therefore, TaskScheduler should track the hosts on each rack, and provide an API for TaskSetManager to check if there's a host alive on a specific rack. -- This message was sent by Atlassian JIRA (v6.2#6252)
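Editor's note: a minimal, self-contained sketch of the bookkeeping the ticket asks for (a rack-to-alive-hosts index kept up to date as executors come and go). All class, field, and method names here are hypothetical and are not the actual TaskSchedulerImpl code or the linked PR.
{code}
import scala.collection.mutable

// Hypothetical bookkeeping for "is there any alive host on this rack?".
class RackIndex(rackOf: String => Option[String]) {
  private val hostsByRack = new mutable.HashMap[String, mutable.HashSet[String]]

  def addExecutorHost(host: String): Unit =
    for (rack <- rackOf(host))
      hostsByRack.getOrElseUpdate(rack, new mutable.HashSet[String]) += host

  def removeExecutorHost(host: String): Unit =
    for (rack <- rackOf(host); hosts <- hostsByRack.get(rack)) {
      hosts -= host
      if (hosts.isEmpty) hostsByRack -= rack // rack no longer has alive hosts
    }

  // What a TaskSetManager would consult before treating a rack as a valid
  // RACK_LOCAL preference.
  def hasAliveHostOnRack(rack: String): Boolean = hostsByRack.contains(rack)
}

object RackIndexDemo extends App {
  val index = new RackIndex(host => Some(if (host.startsWith("a")) "rack1" else "rack2"))
  index.addExecutorHost("a-host-1")
  println(index.hasAliveHostOnRack("rack1")) // true
  index.removeExecutorHost("a-host-1")
  println(index.hasAliveHostOnRack("rack1")) // false
}
{code}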
[jira] [Comment Edited] (SPARK-2342) Evaluation helper's output type doesn't conform to input type
[ https://issues.apache.org/jira/browse/SPARK-2342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14050982#comment-14050982 ] Yijie Shen edited comment on SPARK-2342 at 7/3/14 1:52 AM: --- [~marmbrus] I fixed the typo in PR: https://github.com/apache/spark/pull/1283. Please check it, thanks. was (Author: yijieshen): [~marmbrus] Fixed the typo in PR: https://github.com/apache/spark/pull/1283. Please check it, thanks. Evaluation helper's output type doesn't conform to input type - Key: SPARK-2342 URL: https://issues.apache.org/jira/browse/SPARK-2342 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.0.0 Reporter: Yijie Shen Priority: Minor Labels: easyfix In sql/catalyst/org/apache/spark/sql/catalyst/expressions.scala, {code}protected final def n2(i: Row, e1: Expression, e2: Expression, f: ((Numeric[Any], Any, Any) => Any)): Any{code} is intended to do computations for numeric Add/Minus/Multiply. Just as the comment suggests: {quote}Those expressions are supposed to be in the same data type, and also the return type.{quote} But in the code, function f is cast to the function signature: {code}(Numeric[n.JvmType], n.JvmType, n.JvmType) => Int{code} I think this is a typo and the correct signature should be: {code}(Numeric[n.JvmType], n.JvmType, n.JvmType) => n.JvmType{code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Comment Edited] (SPARK-2342) Evaluation helper's output type doesn't conform to input type
[ https://issues.apache.org/jira/browse/SPARK-2342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14050982#comment-14050982 ] Yijie Shen edited comment on SPARK-2342 at 7/3/14 1:51 AM: --- [~marmbrus] Fixed the typo in PR: https://github.com/apache/spark/pull/1283. Please check it, thanks. was (Author: yijieshen): Fixed the typo in PR: https://github.com/apache/spark/pull/1283. Please check it, thanks. Evaluation helper's output type doesn't conform to input type - Key: SPARK-2342 URL: https://issues.apache.org/jira/browse/SPARK-2342 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.0.0 Reporter: Yijie Shen Priority: Minor Labels: easyfix In sql/catalyst/org/apache/spark/sql/catalyst/expressions.scala, {code}protected final def n2(i: Row, e1: Expression, e2: Expression, f: ((Numeric[Any], Any, Any) => Any)): Any{code} is intended to do computations for numeric Add/Minus/Multiply. Just as the comment suggests: {quote}Those expressions are supposed to be in the same data type, and also the return type.{quote} But in the code, function f is cast to the function signature: {code}(Numeric[n.JvmType], n.JvmType, n.JvmType) => Int{code} I think this is a typo and the correct signature should be: {code}(Numeric[n.JvmType], n.JvmType, n.JvmType) => n.JvmType{code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Comment Edited] (SPARK-2342) Evaluation helper's output type doesn't conform to input type
[ https://issues.apache.org/jira/browse/SPARK-2342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14050982#comment-14050982 ] Yijie Shen edited comment on SPARK-2342 at 7/3/14 1:52 AM: --- [~marmbrus], I fixed the typo in PR: https://github.com/apache/spark/pull/1283. Please check it, thanks. was (Author: yijieshen): [~marmbrus] I fixed the typo in PR: https://github.com/apache/spark/pull/1283. Please check it, thanks. Evaluation helper's output type doesn't conform to input type - Key: SPARK-2342 URL: https://issues.apache.org/jira/browse/SPARK-2342 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.0.0 Reporter: Yijie Shen Priority: Minor Labels: easyfix In sql/catalyst/org/apache/spark/sql/catalyst/expressions.scala, {code}protected final def n2(i: Row, e1: Expression, e2: Expression, f: ((Numeric[Any], Any, Any) => Any)): Any{code} is intended to do computations for numeric Add/Minus/Multiply. Just as the comment suggests: {quote}Those expressions are supposed to be in the same data type, and also the return type.{quote} But in the code, function f is cast to the function signature: {code}(Numeric[n.JvmType], n.JvmType, n.JvmType) => Int{code} I think this is a typo and the correct signature should be: {code}(Numeric[n.JvmType], n.JvmType, n.JvmType) => n.JvmType{code} -- This message was sent by Atlassian JIRA (v6.2#6252)
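Editor's note: a stand-alone illustration (not the catalyst code) of why the cast's return type matters; the names below are invented and only mimic the shape of the n2 helper described above.
{code}
// Invented mini-example mimicking catalyst's n2 helper. Forcing the combining
// function's result to Int would silently lose non-integer results; keeping
// the numeric type preserves them.
object NumericHelperSketch {
  def combine[T](x: T, y: T, f: (Numeric[T], T, T) => T)(implicit num: Numeric[T]): T =
    f(num, x, y)

  def main(args: Array[String]): Unit = {
    val sum = combine[Double](1.5, 2.25, (num, a, b) => num.plus(a, b))
    println(sum) // 3.75 -- would be truncated if the result type were forced to Int
  }
}
{code}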
[jira] [Created] (SPARK-2351) Add Artificial Neural Network (ANN) to Spark
Bert Greevenbosch created SPARK-2351: Summary: Add Artificial Neural Network (ANN) to Spark Key: SPARK-2351 URL: https://issues.apache.org/jira/browse/SPARK-2351 Project: Spark Issue Type: New Feature Components: MLlib Environment: MLLIB code Reporter: Bert Greevenbosch It would be good if the Machine Learning Library contained Artificial Neural Networks (ANNs). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (SPARK-2352) Add Artificial Neural Network (ANN) to Spark
Bert Greevenbosch created SPARK-2352: Summary: Add Artificial Neural Network (ANN) to Spark Key: SPARK-2352 URL: https://issues.apache.org/jira/browse/SPARK-2352 Project: Spark Issue Type: New Feature Components: MLlib Environment: MLLIB code Reporter: Bert Greevenbosch It would be good if the Machine Learning Library contained Artificial Neural Networks (ANNs). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Closed] (SPARK-2351) Add Artificial Neural Network (ANN) to Spark
[ https://issues.apache.org/jira/browse/SPARK-2351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bert Greevenbosch closed SPARK-2351. Resolution: Duplicate Duplicate of SPARK-2352. Add Artificial Neural Network (ANN) to Spark Key: SPARK-2351 URL: https://issues.apache.org/jira/browse/SPARK-2351 Project: Spark Issue Type: New Feature Components: MLlib Environment: MLLIB code Reporter: Bert Greevenbosch It would be good if the Machine Learning Library contained Artificial Neural Networks (ANNs). -- This message was sent by Atlassian JIRA (v6.2#6252)