[jira] [Created] (SPARK-1420) The maven build error for Spark Catalyst
witgo created SPARK-1420:
----------------------------

             Summary: The maven build error for Spark Catalyst
                 Key: SPARK-1420
                 URL: https://issues.apache.org/jira/browse/SPARK-1420
             Project: Spark
          Issue Type: Bug
          Components: Build
            Reporter: witgo

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Commented] (SPARK-1420) The maven build error for Spark Catalyst
[ https://issues.apache.org/jira/browse/SPARK-1420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13961126#comment-13961126 ]

witgo commented on SPARK-1420:
------------------------------

{code}
mvn -Pyarn -Dhadoop.version=2.3.0 -Dyarn.version=2.3.0 -DskipTests install
{code}
=>
{code}
[ERROR] /Users/witgo/work/code/java/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala:31: object runtime is not a member of package reflect
[ERROR] import scala.reflect.runtime.universe._
{code}
[jira] [Resolved] (SPARK-1366) The sql function should be consistent between different types of SQLContext
[ https://issues.apache.org/jira/browse/SPARK-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Armbrust resolved SPARK-1366.
-------------------------------------
    Resolution: Fixed

> The sql function should be consistent between different types of SQLContext
> ---------------------------------------------------------------------------
>
>                 Key: SPARK-1366
>                 URL: https://issues.apache.org/jira/browse/SPARK-1366
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>            Reporter: Michael Armbrust
>            Assignee: Michael Armbrust
>            Priority: Blocker
>             Fix For: 1.0.0
>
> Right now calling `context.sql` will cause things to be parsed with different
> parsers, which is kinda confusing. Instead HiveContext should have a
> specialized `hiveql` method that uses the HiveQL parser.
> Also need to update the documentation.
[jira] [Created] (SPARK-1421) Make MLlib work on Python 2.6 and NumPy < 1.7
Matei Zaharia created SPARK-1421:
---------------------------------

             Summary: Make MLlib work on Python 2.6 and NumPy < 1.7
                 Key: SPARK-1421
                 URL: https://issues.apache.org/jira/browse/SPARK-1421
             Project: Spark
          Issue Type: Bug
            Reporter: Matei Zaharia

Currently it requires Python 2.7 and newer NumPy because it uses some new APIs, but they should not be essential for running our code.
[jira] [Updated] (SPARK-1421) Make MLlib work on Python 2.6 and NumPy < 1.7
[ https://issues.apache.org/jira/browse/SPARK-1421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matei Zaharia updated SPARK-1421:
---------------------------------
    Affects Version/s: 0.9.1
                       0.9.0
[jira] [Updated] (SPARK-1421) Make MLlib work on Python 2.6 and NumPy < 1.7
[ https://issues.apache.org/jira/browse/SPARK-1421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matei Zaharia updated SPARK-1421:
---------------------------------
    Component/s: PySpark
                 MLlib
[jira] [Created] (SPARK-1423) Add scripts for launching Spark on Windows Azure
Matei Zaharia created SPARK-1423:
---------------------------------

             Summary: Add scripts for launching Spark on Windows Azure
                 Key: SPARK-1423
                 URL: https://issues.apache.org/jira/browse/SPARK-1423
             Project: Spark
          Issue Type: Improvement
            Reporter: Matei Zaharia
[jira] [Created] (SPARK-1422) Add scripts for launching Spark on Google Compute Engine
Matei Zaharia created SPARK-1422:
---------------------------------

             Summary: Add scripts for launching Spark on Google Compute Engine
                 Key: SPARK-1422
                 URL: https://issues.apache.org/jira/browse/SPARK-1422
             Project: Spark
          Issue Type: Improvement
          Components: EC2
            Reporter: Matei Zaharia
[jira] [Assigned] (SPARK-1309) sbt assemble-deps no longer works
[ https://issues.apache.org/jira/browse/SPARK-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aaron Davidson reassigned SPARK-1309:
-------------------------------------
    Assignee: Aaron Davidson

> sbt assemble-deps no longer works
> ---------------------------------
>
>                 Key: SPARK-1309
>                 URL: https://issues.apache.org/jira/browse/SPARK-1309
>             Project: Spark
>          Issue Type: New Feature
>          Components: Build
>    Affects Versions: 1.0.0
>            Reporter: Shivaram Venkataraman
>            Assignee: Aaron Davidson
>            Priority: Blocker
>             Fix For: 1.0.0
>
> After the Catalyst merge the sbt assemble-deps workflow no longer works. Here
> are the steps to reproduce:
>
> sbt/sbt clean
> sbt/sbt assemble-deps
> ./bin/spark-shell
>
> Error: Could not find or load main class org.apache.spark.repl.Main
>
> The error comes from the fact that compute-classpath.sh does not include the
> class files if the hive assembly jar is found.
> One fix would be to not build the hive assembly jar when assemble-deps is
> called.
[jira] [Commented] (SPARK-1393) fix computePreferredLocations signature to not depend on underlying implementation
[ https://issues.apache.org/jira/browse/SPARK-1393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13961278#comment-13961278 ]

Mridul Muralidharan commented on SPARK-1393:
--------------------------------------------

Merged https://github.com/apache/spark/pull/302

> fix computePreferredLocations signature to not depend on underlying
> implementation
> -------------------------------------------------------------------
>
>                 Key: SPARK-1393
>                 URL: https://issues.apache.org/jira/browse/SPARK-1393
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>         Environment: All
>            Reporter: Mridul Muralidharan
>             Fix For: 1.0.0
>
> computePreferredLocations in
> core/src/main/scala/org/apache/spark/scheduler/InputFormatInfo.scala: change
> from using mutable HashMap/HashSet to Map/Set
[jira] [Resolved] (SPARK-1393) fix computePreferredLocations signature to not depend on underlying implementation
[ https://issues.apache.org/jira/browse/SPARK-1393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mridul Muralidharan resolved SPARK-1393.
----------------------------------------
    Resolution: Fixed
[jira] [Created] (SPARK-1424) InsertInto should work on JavaSchemaRDD as well.
Michael Armbrust created SPARK-1424:
-----------------------------------

             Summary: InsertInto should work on JavaSchemaRDD as well.
                 Key: SPARK-1424
                 URL: https://issues.apache.org/jira/browse/SPARK-1424
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 1.0.0
            Reporter: Michael Armbrust
            Assignee: Michael Armbrust
            Priority: Blocker
[jira] [Commented] (SPARK-1424) InsertInto should work on JavaSchemaRDD as well.
[ https://issues.apache.org/jira/browse/SPARK-1424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13961283#comment-13961283 ]

Matei Zaharia commented on SPARK-1424:
--------------------------------------

More generally we should have flags to support the following:
* Inserting data into an existing table
* Creating a new table, only if it does not exist
* Overwriting an existing table
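The three behaviours Matei lists are the classic write modes for a table sink. As a minimal illustration (names are hypothetical; this is not Spark's actual API), a save-mode flag could dispatch like this:

```python
from enum import Enum

class SaveMode(Enum):
    APPEND = "append"        # insert data into an existing table
    IGNORE = "ignore"        # create a new table, only if it does not exist
    OVERWRITE = "overwrite"  # overwrite an existing table

def write_table(catalog, name, rows, mode):
    """Apply `rows` to `catalog[name]` according to the save mode.

    `catalog` is a plain dict standing in for a metastore."""
    if mode is SaveMode.APPEND:
        catalog.setdefault(name, []).extend(rows)
    elif mode is SaveMode.IGNORE:
        if name not in catalog:
            catalog[name] = list(rows)
    elif mode is SaveMode.OVERWRITE:
        catalog[name] = list(rows)
    return catalog
```

The point of making the mode an explicit parameter is that the caller, not the sink, decides what happens when the target table already exists.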
[jira] [Created] (SPARK-1425) PySpark can crash Executors if worker.py fails while serializing data
Matei Zaharia created SPARK-1425:
---------------------------------

             Summary: PySpark can crash Executors if worker.py fails while serializing data
                 Key: SPARK-1425
                 URL: https://issues.apache.org/jira/browse/SPARK-1425
             Project: Spark
          Issue Type: Bug
    Affects Versions: 0.9.0
            Reporter: Matei Zaharia

The PythonRDD code that talks to the worker will keep calling stream.readInt() and allocating an array of that size. Unfortunately, if the worker gives it corrupted data, it will attempt to allocate a huge array and get an OutOfMemoryError. It would be better to use a different stream to give feedback, *or* only write an object out to the stream once it's been properly pickled to bytes or to a string.
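The failure mode described here is a length-prefixed protocol whose reader trusts the length field blindly. A hedged sketch of the safer pattern (cap the frame size and fail with a clear error instead of allocating whatever the corrupted header says; the names and the 64 MB cap are illustrative, not PythonRDD's actual code):

```python
import io
import struct

MAX_FRAME_BYTES = 64 * 1024 * 1024  # reject absurd lengths instead of OOM-ing

def read_frame(stream):
    """Read one length-prefixed frame, validating the length first."""
    header = stream.read(4)
    if len(header) < 4:
        raise EOFError("stream closed mid-header")
    (length,) = struct.unpack(">i", header)  # big-endian, like Java's writeInt
    if length < 0 or length > MAX_FRAME_BYTES:
        raise IOError("corrupt frame length: %d" % length)
    payload = stream.read(length)
    if len(payload) < length:
        raise EOFError("stream closed mid-frame")
    return payload
```

A corrupted header now surfaces as a descriptive IOError on the reading side rather than an OutOfMemoryError in the Executor.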
[jira] [Updated] (SPARK-1421) Make MLlib work on Python 2.6
[ https://issues.apache.org/jira/browse/SPARK-1421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matei Zaharia updated SPARK-1421:
---------------------------------
    Description: Currently it requires Python 2.7 because it uses some new APIs, but they should not be essential for running our code.
    (was: Currently it requires Python 2.7 and newer NumPy because it uses some new APIs, but they should not be essential for running our code.)
[jira] [Updated] (SPARK-1421) Make MLlib work on Python 2.6
[ https://issues.apache.org/jira/browse/SPARK-1421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matei Zaharia updated SPARK-1421:
---------------------------------
    Summary: Make MLlib work on Python 2.6  (was: Make MLlib work on Python 2.6 and NumPy < 1.7)
[jira] [Created] (SPARK-1426) Make MLlib work with NumPy versions older than 1.7
Matei Zaharia created SPARK-1426:
---------------------------------

             Summary: Make MLlib work with NumPy versions older than 1.7
                 Key: SPARK-1426
                 URL: https://issues.apache.org/jira/browse/SPARK-1426
             Project: Spark
          Issue Type: Improvement
          Components: MLlib, PySpark
            Reporter: Matei Zaharia

Currently it requires NumPy 1.7 due to using the copyto method (http://docs.scipy.org/doc/numpy/reference/generated/numpy.copyto.html) for extracting data out of an array, but we could add a fallback for older versions.
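A fallback of the kind suggested here is small: `numpy.copyto` was added in NumPy 1.7, so its absence can be feature-detected and replaced with slice assignment, which works on older versions. A minimal sketch (the helper name is made up for illustration):

```python
import numpy as np

def copy_into(dst, src):
    """Copy src into dst, using np.copyto on NumPy >= 1.7 and
    falling back to slice assignment on older versions."""
    if hasattr(np, "copyto"):
        np.copyto(dst, src)
    else:
        dst[...] = src  # works on pre-1.7 NumPy
    return dst
```

Feature detection via hasattr is preferable to parsing the version string, since it tests for exactly the capability that is needed.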
[jira] [Assigned] (SPARK-1421) Make MLlib work on Python 2.6
[ https://issues.apache.org/jira/browse/SPARK-1421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matei Zaharia reassigned SPARK-1421:
------------------------------------
    Assignee: Matei Zaharia
[jira] [Resolved] (SPARK-1421) Make MLlib work on Python 2.6
[ https://issues.apache.org/jira/browse/SPARK-1421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matei Zaharia resolved SPARK-1421.
----------------------------------
       Resolution: Fixed
    Fix Version/s: 0.9.2
                   1.0.0
[jira] [Created] (SPARK-1427) HQL Examples Don't Work
Patrick Wendell created SPARK-1427:
-----------------------------------

             Summary: HQL Examples Don't Work
                 Key: SPARK-1427
                 URL: https://issues.apache.org/jira/browse/SPARK-1427
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 1.0.0
            Reporter: Patrick Wendell
            Assignee: Michael Armbrust
             Fix For: 1.0.0

{code}
scala> hql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)")
14/04/05 22:40:29 INFO ParseDriver: Parsing command: CREATE TABLE IF NOT EXISTS src (key INT, value STRING)
14/04/05 22:40:30 INFO ParseDriver: Parse Completed
14/04/05 22:40:30 INFO Driver:
14/04/05 22:40:30 INFO Driver:
14/04/05 22:40:30 INFO Driver:
14/04/05 22:40:30 INFO Driver:
14/04/05 22:40:30 INFO ParseDriver: Parsing command: CREATE TABLE IF NOT EXISTS src (key INT, value STRING)
14/04/05 22:40:30 INFO ParseDriver: Parse Completed
14/04/05 22:40:30 INFO Driver:
14/04/05 22:40:30 INFO Driver:
14/04/05 22:40:30 INFO SemanticAnalyzer: Starting Semantic Analysis
14/04/05 22:40:30 INFO SemanticAnalyzer: Creating table src position=27
14/04/05 22:40:30 INFO HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
14/04/05 22:40:30 INFO ObjectStore: ObjectStore, initialize called
14/04/05 22:40:30 INFO Persistence: Property datanucleus.cache.level2 unknown - will be ignored
14/04/05 22:40:30 WARN BoneCPConfig: Max Connections < 1. Setting to 20
14/04/05 22:40:32 INFO ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
14/04/05 22:40:32 INFO ObjectStore: Initialized ObjectStore
14/04/05 22:40:33 WARN BoneCPConfig: Max Connections < 1. Setting to 20
14/04/05 22:40:33 INFO HiveMetaStore: 0: get_table : db=default tbl=src
14/04/05 22:40:33 INFO audit: ugi=patrick ip=unknown-ip-addr cmd=get_table : db=default tbl=src
14/04/05 22:40:33 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
14/04/05 22:40:33 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
14/04/05 22:40:34 INFO Driver: Semantic Analysis Completed
14/04/05 22:40:34 INFO Driver:
14/04/05 22:40:34 INFO Driver: Returning Hive schema: Schema(fieldSchemas:null, properties:null)
14/04/05 22:40:34 INFO Driver:
14/04/05 22:40:34 INFO Driver:
14/04/05 22:40:34 INFO Driver: Starting command: CREATE TABLE IF NOT EXISTS src (key INT, value STRING)
14/04/05 22:40:34 INFO Driver:
14/04/05 22:40:34 INFO Driver:
14/04/05 22:40:34 INFO Driver:
14/04/05 22:40:34 INFO Driver:
14/04/05 22:40:34 INFO Driver: OK
14/04/05 22:40:34 INFO Driver:
14/04/05 22:40:34 INFO Driver:
14/04/05 22:40:34 INFO Driver:
14/04/05 22:40:34 INFO Driver:
14/04/05 22:40:34 INFO Driver:
java.lang.AssertionError: assertion failed: No plan for NativeCommand
CREATE TABLE IF NOT EXISTS src (key INT, value STRING)
    at scala.Predef$.assert(Predef.scala:179)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner.apply(QueryPlanner.scala:59)
    at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:218)
    at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:218)
    at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:219)
    at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(SQLContext.scala:219)
    at org.apache.spark.sql.SchemaRDDLike$class.toString(SchemaRDDLike.scala:44)
    at org.apache.spark.sql.SchemaRDD.toString(SchemaRDD.scala:93)
    at java.lang.String.valueOf(String.java:2854)
    at scala.runtime.ScalaRunTime$.stringOf(ScalaRunTime.scala:331)
    at scala.runtime.ScalaRunTime$.replStringOf(ScalaRunTime.scala:337)
    at .(:10)
    at .()
    at $print()
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
{code}

{code}
scala> hql("select count(*) from src")
14/04/05 22:47:13 INFO ParseDriver: Parsing command: select count(*) from src
14/04/05 22:47:13 INFO ParseDriver: Parse Completed
14/04/05 22:47:13 INFO HiveMetaStore: 0: get_table : db=default tbl=src
14/04/05 22:47:13 INFO audit: ugi=patrick ip=unknown-ip-addr cmd=get_table : db=default tbl=src
14/04/05 22:47:13 INFO MemoryStore: ensureFreeSpace(147107) called with curMem=0, maxMem=308713881
14/04/05 22:47:13 INFO MemoryStore: Block broadcast_0 stored as values to
{code}
[jira] [Created] (SPARK-1428) MLlib should convert non-float64 NumPy arrays to float64 instead of complaining
Matei Zaharia created SPARK-1428:
---------------------------------

             Summary: MLlib should convert non-float64 NumPy arrays to float64 instead of complaining
                 Key: SPARK-1428
                 URL: https://issues.apache.org/jira/browse/SPARK-1428
             Project: Spark
          Issue Type: Improvement
          Components: MLlib, PySpark
            Reporter: Matei Zaharia
            Priority: Minor

Pretty easy to fix, it would avoid spewing some scary task-failed errors. The place to fix this is _serialize_double_vector in _common.py.
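The coercion being asked for is a one-liner at the top of the serializer: accept whatever numeric dtype the user passes and convert it to float64 rather than raising. A minimal sketch of the idea (the function name is illustrative, not the actual body of _serialize_double_vector):

```python
import numpy as np

def as_float64_vector(v):
    """Accept a NumPy array (or any sequence) of numbers and return
    a float64 array, converting instead of complaining."""
    arr = np.asarray(v)
    if arr.dtype != np.float64:
        arr = arr.astype(np.float64)  # copies; float64 input passes through
    return arr
```

Using np.asarray first means plain Python lists are accepted too, and float64 inputs avoid an unnecessary copy.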
[jira] [Updated] (SPARK-1351) Documentation Improvements for Spark 1.0
[ https://issues.apache.org/jira/browse/SPARK-1351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Wendell updated SPARK-1351:
-----------------------------------
    Description:
Umbrella to track necessary doc improvements. We can break these out into other JIRA's over time.
- Use grouping in the RDD and SparkContext scaladocs. See Schema RDD: http://people.apache.org/~pwendell/catalyst-docs/api/sql/core/index.html#org.apache.spark.sql.SchemaRDD
- Use spark-submit script wherever possible in docs.
- Have package-level documentation in Scaladoc.

    was:
Umbrella to track necessary doc improvements. We can break these out into other JIRA's over time.
- Use spark-submit script wherever possible in docs.
- Have package-level documentation in Scaladoc.