[jira] [Updated] (SPARK-5479) PySpark on yarn mode need to support non-local python files
[ https://issues.apache.org/jira/browse/SPARK-5479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-5479: - Component/s: YARN PySpark on yarn mode need to support non-local python files --- Key: SPARK-5479 URL: https://issues.apache.org/jira/browse/SPARK-5479 Project: Spark Issue Type: Bug Components: PySpark, YARN Affects Versions: 1.4.0 Reporter: Lianhui Wang In SPARK-5162 [~vgrigor] reports this: currently, the following code cannot work: aws emr add-steps --cluster-id j-XYWIXMD234 \ --steps Name=SparkPi,Jar=s3://eu-west-1.elasticmapreduce/libs/script-runner/script-runner.jar,Args=[/home/hadoop/spark/bin/spark-submit,--deploy-mode,cluster,--master,yarn-cluster,--py-files,s3://mybucketat.amazonaws.com/tasks/main.py,main.py,param1],ActionOnFailure=CONTINUE So we need to support non-local Python files in both YARN client and cluster mode. Before submitting the application to YARN, we need to download non-local files to a local or HDFS path; alternatively, spark.yarn.dist.files needs to support non-local files. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
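As a rough sketch of the pre-submit download step described above (illustrative only — the helper name is made up, and shelling out to the hadoop CLI is an assumption, not the actual fix inside Spark's YARN client):

{code:python}
import os.path
import subprocess

def localize(uri, workdir="/tmp"):
    """Copy a non-local file (e.g. hdfs:// or s3://) to local disk;
    local paths are returned unchanged."""
    if "://" not in uri:
        return uri
    local = os.path.join(workdir, os.path.basename(uri))
    # 'hadoop fs -get' resolves hdfs:// URIs and, with the S3 filesystem
    # jars on the classpath, s3:// URIs as well
    subprocess.check_call(["hadoop", "fs", "-get", uri, local])
    return local

# the localized paths can then be handed to spark-submit --py-files
py_files = ",".join(localize(f) for f in
                    ["s3://mybucketat.amazonaws.com/tasks/main.py"])
{code}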
[jira] [Assigned] (SPARK-8114) Remove wildcard import on TestSQLContext._
[ https://issues.apache.org/jira/browse/SPARK-8114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-8114: --- Assignee: Apache Spark Remove wildcard import on TestSQLContext._ -- Key: SPARK-8114 URL: https://issues.apache.org/jira/browse/SPARK-8114 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin Assignee: Apache Spark We import TestSQLContext._ in almost all test suites. This import introduces a lot of methods and should be avoided. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-8114) Remove wildcard import on TestSQLContext._
[ https://issues.apache.org/jira/browse/SPARK-8114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14573829#comment-14573829 ] Apache Spark commented on SPARK-8114: - User 'rxin' has created a pull request for this issue: https://github.com/apache/spark/pull/6661 Remove wildcard import on TestSQLContext._ -- Key: SPARK-8114 URL: https://issues.apache.org/jira/browse/SPARK-8114 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin We import TestSQLContext._ in almost all test suites. This import introduces a lot of methods and should be avoided. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-8071) Run PySpark dataframe.rollup/cube test failed
[ https://issues.apache.org/jira/browse/SPARK-8071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14573828#comment-14573828 ] Reynold Xin commented on SPARK-8071: [~chenghao] can you take a look at this? Cube is not supposed to appear in the physical planner. Run PySpark dataframe.rollup/cube test failed - Key: SPARK-8071 URL: https://issues.apache.org/jira/browse/SPARK-8071 Project: Spark Issue Type: Bug Components: PySpark Environment: OS: SUSE 11 SP3; JDK: 1.8.0_40; Python: 2.6.8; Hadoop: 2.7.0; Spark: master branch Reporter: Weizhong Priority: Minor I run test for Spark, and failed on PySpark, details are: {code} File /xxx/Spark/python/pyspark/sql/dataframe.py, line 837, in pyspark.sql.dataframe.DataFrame.cube Failed example: * df.cube('name', df.age).count().show() Exception raised: * Traceback (most recent call last): ** File /usr/lib64/python2.6/doctest.py, line 1253, in __run *** compileflags, 1) in test.globs ** File doctest pyspark.sql.dataframe.DataFrame.cube\[0], line 1, in module *** df.cube('name', df.age).count().show() ** File /xxx/Spark/python/pyspark/sql/dataframe.py, line 291, in show *** print(self._jdf.showString\(n)) ** File /xxx/Spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py, line 538, in \_\_call\_\_ *** self.target_id, self.name) ** File /xxx/Spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py, line 300, in get_return_value *** format(target_id, '.', name), value) * Py4JJavaError: An error occurred while calling o212.showString. * : java.lang.AssertionError: assertion failed: No plan for Cube [name#1,age#0], [name#1,age#0,COUNT(1) AS count#27L], grouping__id#28 ** LogicalRDD [age#0,name#1], MapPartitionsRDD\[7] at applySchemaToPythonRDD at NativeMethodAccessorImpl.java:-2 *** at scala.Predef$.assert(Predef.scala:179) *** at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:59) *** at org.apache.spark.sql.catalyst.planning.QueryPlanner.planLater(QueryPlanner.scala:54) *** at org.apache.spark.sql.execution.SparkStrategies$BasicOperators$.apply(SparkStrategies.scala:312) *** at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58) *** at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58) *** at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371) *** at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:59) *** at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:913) *** at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:911) *** at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:917) *** at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(SQLContext.scala:917) *** at org.apache.spark.sql.DataFrame.collect(DataFrame.scala:1255) *** at org.apache.spark.sql.DataFrame.head(DataFrame.scala:1189) *** at org.apache.spark.sql.DataFrame.take(DataFrame.scala:1248) *** at org.apache.spark.sql.DataFrame.showString(DataFrame.scala:176) *** at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) *** at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) *** at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) *** at java.lang.reflect.Method.invoke(Method.java:606) *** at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231) *** at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379) *** at 
py4j.Gateway.invoke(Gateway.java:259) *** at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133) *** at py4j.commands.CallCommand.execute(CallCommand.java:79) *** at py4j.GatewayConnection.run(GatewayConnection.java:207) *** at java.lang.Thread.run(Thread.java:745) ** 1 of 1 in pyspark.sql.dataframe.DataFrame.cube 1 of 1 in pyspark.sql.dataframe.DataFrame.rollup ***Test Failed*** 2 failures. {code} cc [~davies] -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
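For reference, the failing doctests boil down to this PySpark snippet (a minimal reproduction distilled from the traceback above; assumes a running {{sqlContext}}):

{code:python}
df = sqlContext.createDataFrame([("Alice", 2), ("Bob", 5)], ["name", "age"])
df.cube("name", df.age).count().show()    # raises "No plan for Cube" on the affected build
df.rollup("name", df.age).count().show()  # rollup fails the same way
{code}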
[jira] [Assigned] (SPARK-8114) Remove wildcard import on TestSQLContext._
[ https://issues.apache.org/jira/browse/SPARK-8114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-8114: --- Assignee: (was: Apache Spark) Remove wildcard import on TestSQLContext._ -- Key: SPARK-8114 URL: https://issues.apache.org/jira/browse/SPARK-8114 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin We import TestSQLContext._ in almost all test suites. This import introduces a lot of methods and should be avoided. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Issue Comment Deleted] (SPARK-7119) ScriptTransform doesn't consider the output data type
[ https://issues.apache.org/jira/browse/SPARK-7119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhichao-li updated SPARK-7119: -- Comment: was deleted (was: This workaround query can be executed correctly and there's a simple fix for this issue by the way :)) ScriptTransform doesn't consider the output data type - Key: SPARK-7119 URL: https://issues.apache.org/jira/browse/SPARK-7119 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.3.0, 1.3.1, 1.4.0 Reporter: Cheng Hao {code:sql} from (from src select transform(key, value) using 'cat' as (thing1 int, thing2 string)) t select thing1 + 2; {code} {noformat} 15/04/24 00:58:55 ERROR CliDriver: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost): java.lang.ClassCastException: org.apache.spark.sql.types.UTF8String cannot be cast to java.lang.Integer at scala.runtime.BoxesRunTime.unboxToInt(BoxesRunTime.java:106) at scala.math.Numeric$IntIsIntegral$.plus(Numeric.scala:57) at org.apache.spark.sql.catalyst.expressions.Add.eval(arithmetic.scala:127) at org.apache.spark.sql.catalyst.expressions.Alias.eval(namedExpressions.scala:118) at org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection.apply(Projection.scala:68) at org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection.apply(Projection.scala:52) at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) at scala.collection.Iterator$class.foreach(Iterator.scala:727) at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48) at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103) at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47) at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273) at scala.collection.AbstractIterator.to(Iterator.scala:1157) at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265) at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157) at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252) at scala.collection.AbstractIterator.toArray(Iterator.scala:1157) at org.apache.spark.rdd.RDD$$anonfun$17.apply(RDD.scala:819) at org.apache.spark.rdd.RDD$$anonfun$17.apply(RDD.scala:819) at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1618) at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1618) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63) at org.apache.spark.scheduler.Task.run(Task.scala:64) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:209) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:722) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-8096) how to convert dataframe field to label and features
[ https://issues.apache.org/jira/browse/SPARK-8096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] bofei.xiao updated SPARK-8096: -- Description: How to convert the dataframe to RDD[LabeledPoint]? The dataframe has fields target, age, sex, height; I want to cast target as the label and age, sex, height as the features vector. I faced this problem in the following circumstance: -- Given I have a csv file data.csv: target,age,sex,height 1,18,1,170 0,25,1,165 ... Now I want to build a decision tree model. Step 1: load the csv data as a dataframe: val data = sqlContext.load("com.databricks.spark.csv", Map("path" -> "data.csv", "header" -> "true")) Step 2: build a decision tree model, but DecisionTree needs an RDD[LabeledPoint] input. Thanks! was: Given I have a csv file data.csv: target,age,sex,height 1,18,1,170 0,25,1,165 ... Now I want to build a decision tree model. Step 1: load the csv data as a dataframe: val data = sqlContext.load("com.databricks.spark.csv", Map("path" -> "data.csv", "header" -> "true")) Step 2: build a decision tree model, but DecisionTree needs an RDD[LabeledPoint] input. Q: how to convert the dataframe to RDD[LabeledPoint]? Thanks! Summary: how to convert dataframe field to label and features (was: use csv data to build a classification model,how to convert dataframe field to label and features) how to convert dataframe field to label and features Key: SPARK-8096 URL: https://issues.apache.org/jira/browse/SPARK-8096 Project: Spark Issue Type: Bug Reporter: bofei.xiao How to convert the dataframe to RDD[LabeledPoint]? The dataframe has fields target, age, sex, height; I want to cast target as the label and age, sex, height as the features vector. I faced this problem in the following circumstance: -- Given I have a csv file data.csv: target,age,sex,height 1,18,1,170 0,25,1,165 ... Now I want to build a decision tree model. Step 1: load the csv data as a dataframe: val data = sqlContext.load("com.databricks.spark.csv", Map("path" -> "data.csv", "header" -> "true")) Step 2: build a decision tree model, but DecisionTree needs an RDD[LabeledPoint] input. Thanks! -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-8096) how to convert dataframe field to LabelPoint
[ https://issues.apache.org/jira/browse/SPARK-8096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] bofei.xiao updated SPARK-8096: -- Summary: how to convert dataframe field to LabelPoint (was: how to convert dataframe field to label and features) how to convert dataframe field to LabelPoint Key: SPARK-8096 URL: https://issues.apache.org/jira/browse/SPARK-8096 Project: Spark Issue Type: Bug Reporter: bofei.xiao How to convert the dataframe to RDD[LabeledPoint]? The dataframe has fields target, age, sex, height; I want to cast target as the label and age, sex, height as the features vector. I faced this problem in the following circumstance: -- Given I have a csv file data.csv: target,age,sex,height 1,18,1,170 0,25,1,165 ... Now I want to build a decision tree model. Step 1: load the csv data as a dataframe: val data = sqlContext.load("com.databricks.spark.csv", Map("path" -> "data.csv", "header" -> "true")) Step 2: build a decision tree model, but DecisionTree needs an RDD[LabeledPoint] input. Thanks! -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
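For what it's worth, the conversion being asked about looks roughly like this in PySpark (a sketch, not code from the issue — the reporter's snippet is Scala, and note the MLlib class is spelled LabeledPoint):

{code:python}
from pyspark.mllib.regression import LabeledPoint

# df has columns target, age, sex, height
labeled = df.rdd.map(lambda row: LabeledPoint(
    float(row.target),                                   # label
    [float(row.age), float(row.sex), float(row.height)]  # feature vector
))
# labeled is an RDD[LabeledPoint], which DecisionTree.trainClassifier accepts
{code}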
[jira] [Commented] (SPARK-8071) Run PySpark dataframe.rollup/cube test failed
[ https://issues.apache.org/jira/browse/SPARK-8071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14573952#comment-14573952 ] Cheng Hao commented on SPARK-8071: -- I couldn't reproduce that with scala API, and also it seems the failure code is from the PySpark unit test (for cube), [~davies] can you reproduce that exception? JDK / Python version issues? Run PySpark dataframe.rollup/cube test failed - Key: SPARK-8071 URL: https://issues.apache.org/jira/browse/SPARK-8071 Project: Spark Issue Type: Bug Components: PySpark Environment: OS: SUSE 11 SP3; JDK: 1.8.0_40; Python: 2.6.8; Hadoop: 2.7.0; Spark: master branch Reporter: Weizhong Priority: Minor I run test for Spark, and failed on PySpark, details are: {code} File /xxx/Spark/python/pyspark/sql/dataframe.py, line 837, in pyspark.sql.dataframe.DataFrame.cube Failed example: * df.cube('name', df.age).count().show() Exception raised: * Traceback (most recent call last): ** File /usr/lib64/python2.6/doctest.py, line 1253, in __run *** compileflags, 1) in test.globs ** File doctest pyspark.sql.dataframe.DataFrame.cube\[0], line 1, in module *** df.cube('name', df.age).count().show() ** File /xxx/Spark/python/pyspark/sql/dataframe.py, line 291, in show *** print(self._jdf.showString\(n)) ** File /xxx/Spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py, line 538, in \_\_call\_\_ *** self.target_id, self.name) ** File /xxx/Spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py, line 300, in get_return_value *** format(target_id, '.', name), value) * Py4JJavaError: An error occurred while calling o212.showString. * : java.lang.AssertionError: assertion failed: No plan for Cube [name#1,age#0], [name#1,age#0,COUNT(1) AS count#27L], grouping__id#28 ** LogicalRDD [age#0,name#1], MapPartitionsRDD\[7] at applySchemaToPythonRDD at NativeMethodAccessorImpl.java:-2 *** at scala.Predef$.assert(Predef.scala:179) *** at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:59) *** at org.apache.spark.sql.catalyst.planning.QueryPlanner.planLater(QueryPlanner.scala:54) *** at org.apache.spark.sql.execution.SparkStrategies$BasicOperators$.apply(SparkStrategies.scala:312) *** at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58) *** at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58) *** at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371) *** at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:59) *** at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:913) *** at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:911) *** at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:917) *** at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(SQLContext.scala:917) *** at org.apache.spark.sql.DataFrame.collect(DataFrame.scala:1255) *** at org.apache.spark.sql.DataFrame.head(DataFrame.scala:1189) *** at org.apache.spark.sql.DataFrame.take(DataFrame.scala:1248) *** at org.apache.spark.sql.DataFrame.showString(DataFrame.scala:176) *** at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) *** at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) *** at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) *** at java.lang.reflect.Method.invoke(Method.java:606) *** at 
py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231) *** at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379) *** at py4j.Gateway.invoke(Gateway.java:259) *** at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133) *** at py4j.commands.CallCommand.execute(CallCommand.java:79) *** at py4j.GatewayConnection.run(GatewayConnection.java:207) *** at java.lang.Thread.run(Thread.java:745) ** 1 of 1 in pyspark.sql.dataframe.DataFrame.cube 1 of 1 in pyspark.sql.dataframe.DataFrame.rollup ***Test Failed*** 2 failures. {code} cc [~davies] -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-8118) Turn off noisy log output produced by Parquet 1.7.0
[ https://issues.apache.org/jira/browse/SPARK-8118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-8118: --- Issue Type: Sub-task (was: Bug) Parent: SPARK-5463 Turn off noisy log output produced by Parquet 1.7.0 --- Key: SPARK-8118 URL: https://issues.apache.org/jira/browse/SPARK-8118 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 1.4.1, 1.5.0 Reporter: Cheng Lian Assignee: Cheng Lian Priority: Minor Parquet 1.7.0 renames its package to org.apache.parquet; {{ParquetRelation.enableLogForwarding}} needs to be adjusted accordingly to avoid noisy log output. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-7819) Isolated Hive Client Loader appears to cause Native Library libMapRClient.4.0.2-mapr.so already loaded in another classloader error
[ https://issues.apache.org/jira/browse/SPARK-7819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14573929#comment-14573929 ] Nathan McCarthy commented on SPARK-7819: @Yin - looks like my build was just a little out of date! RC4 is running well! Thanks! Isolated Hive Client Loader appears to cause Native Library libMapRClient.4.0.2-mapr.so already loaded in another classloader error --- Key: SPARK-7819 URL: https://issues.apache.org/jira/browse/SPARK-7819 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.4.0 Reporter: Fi Priority: Critical Attachments: invalidClassException.log, stacktrace.txt, test.py In reference to the pull request: https://github.com/apache/spark/pull/5876 I have been running the Spark 1.3 branch for some time with no major hiccups, and recently switched to the Spark 1.4 branch. I build my spark distribution with the following build command: {noformat} make-distribution.sh --tgz --skip-java-test --with-tachyon -Phive -Phive-0.13.1 -Pmapr4 -Pspark-ganglia-lgpl -Pkinesis-asl -Phive-thriftserver {noformat} When running a python script containing a series of smoke tests I use to validate the build, I encountered an error under the following conditions: * start a spark context * start a hive context * run any hive query * stop the spark context * start a second spark context * run any hive query ** ERROR From what I can tell, the Isolated Class Loader is hitting a MapR class that is loading its native library (presumably as part of a static initializer). Unfortunately, the JVM prohibits this the second time around. I would think that shutting down the SparkContext would clear out any vestiges in the JVM, so I'm surprised that this would even be a problem. Note: all other smoke tests we are running pass fine. I will attach the stacktrace and a python script reproducing the issue (at least for my environment and build). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
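The failing sequence from the description, condensed into a runnable sketch (assumes a Hive-enabled build; this mirrors the attached test.py only in spirit, not line for line):

{code:python}
from pyspark import SparkContext
from pyspark.sql import HiveContext

sc = SparkContext(appName="smoke-1")
HiveContext(sc).sql("SHOW TABLES").collect()  # first Hive query: works
sc.stop()

sc = SparkContext(appName="smoke-2")
HiveContext(sc).sql("SHOW TABLES").collect()  # second context: native-library error
{code}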
[jira] [Commented] (SPARK-7819) Isolated Hive Client Loader appears to cause Native Library libMapRClient.4.0.2-mapr.so already loaded in another classloader error
[ https://issues.apache.org/jira/browse/SPARK-7819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14573944#comment-14573944 ] Yin Huai commented on SPARK-7819: - [~nemccarthy] Thank you for the update! Glad to hear that :) Isolated Hive Client Loader appears to cause Native Library libMapRClient.4.0.2-mapr.so already loaded in another classloader error --- Key: SPARK-7819 URL: https://issues.apache.org/jira/browse/SPARK-7819 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.4.0 Reporter: Fi Priority: Critical Attachments: invalidClassException.log, stacktrace.txt, test.py In reference to the pull request: https://github.com/apache/spark/pull/5876 I have been running the Spark 1.3 branch for some time with no major hiccups, and recently switched to the Spark 1.4 branch. I build my spark distribution with the following build command: {noformat} make-distribution.sh --tgz --skip-java-test --with-tachyon -Phive -Phive-0.13.1 -Pmapr4 -Pspark-ganglia-lgpl -Pkinesis-asl -Phive-thriftserver {noformat} When running a python script containing a series of smoke tests I use to validate the build, I encountered an error under the following conditions: * start a spark context * start a hive context * run any hive query * stop the spark context * start a second spark context * run any hive query ** ERROR From what I can tell, the Isolated Class Loader is hitting a MapR class that is loading its native library (presumably as part of a static initializer). Unfortunately, the JVM prohibits this the second time around. I would think that shutting down the SparkContext would clear out any vestiges in the JVM, so I'm surprised that this would even be a problem. Note: all other smoke tests we are running pass fine. I will attach the stacktrace and a python script reproducing the issue (at least for my environment and build). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-8119) Spark will set total executor when some executors fail.
[ https://issues.apache.org/jira/browse/SPARK-8119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14573946#comment-14573946 ] Apache Spark commented on SPARK-8119: - User 'SaintBacchus' has created a pull request for this issue: https://github.com/apache/spark/pull/6662 Spark will set total executor when some executors fail. --- Key: SPARK-8119 URL: https://issues.apache.org/jira/browse/SPARK-8119 Project: Spark Issue Type: Bug Components: Scheduler Affects Versions: 1.4.0 Reporter: SaintBacchus Fix For: 1.4.0 DynamicAllocation will set the total executor count to a small number when it wants to kill some executors. But in the no-DynamicAllocation scenario, Spark will also set the total executor count. This causes the following problem: when an executor fails, no replacement executor will be brought up by Spark. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-8056) Design an easier way to construct schema for both Scala and Python
[ https://issues.apache.org/jira/browse/SPARK-8056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-8056: --- Assignee: (was: Reynold Xin) Design an easier way to construct schema for both Scala and Python -- Key: SPARK-8056 URL: https://issues.apache.org/jira/browse/SPARK-8056 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin StructType is fairly hard to construct, especially in Python. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-8056) Design an easier way to construct schema for both Scala and Python
[ https://issues.apache.org/jira/browse/SPARK-8056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14573908#comment-14573908 ] Reynold Xin commented on SPARK-8056: I'm not actively working on this. Feel free to take over. If you have this early enough, we can even put it into 1.4.1. I like that idea. I think we should have the 2nd argument of the add method accept both a string for simple types and a DataType object. Design an easier way to construct schema for both Scala and Python -- Key: SPARK-8056 URL: https://issues.apache.org/jira/browse/SPARK-8056 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin StructType is fairly hard to construct, especially in Python. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
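Sketching the interface under discussion (hypothetical at this point — the exact signature is still open):

{code:python}
from pyspark.sql.types import StructType, IntegerType

# builder-style construction, mirroring how a SparkConf is built up
schema = (StructType()
          .add("name", "string")        # simple types as plain strings
          .add("age", IntegerType()))   # or as DataType objects
{code}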
[jira] [Updated] (SPARK-8116) sc.range() doesn't match python range()
[ https://issues.apache.org/jira/browse/SPARK-8116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-8116: --- Target Version/s: 1.4.1 (was: 1.4.0, 1.4.1) sc.range() doesn't match python range() --- Key: SPARK-8116 URL: https://issues.apache.org/jira/browse/SPARK-8116 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 1.4.0, 1.4.1 Reporter: Ted Blackman Priority: Minor Labels: easyfix Python's built-in range() and xrange() functions can take 1, 2, or 3 arguments. Ranges with just 1 argument are probably used the most frequently, e.g.: for i in range(len(myList)): ... However, in pyspark, the SparkContext range() method throws an error when called with a single argument, due to the way its arguments get passed into python's range function. There's no good reason that I can think of not to support the same syntax as the built-in function. To fix this, we can set the default of the sc.range() method's `stop` argument to None, and then inside the method, if it is None, replace `stop` with `start` and set `start` to 0, which is what the c implementation of range() does: https://github.com/python/cpython/blob/master/Objects/rangeobject.c#L87 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
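The proposed normalization, sketched as a standalone helper (the helper name is made up; in the real patch this logic would live inside sc.range() itself):

{code:python}
def _normalize_range_args(start, stop=None, step=1):
    # Mirror CPython's range(): a single argument is the stop value
    if stop is None:
        start, stop = 0, start
    return start, stop, step

assert _normalize_range_args(5) == (0, 5, 1)          # sc.range(5) -> 0, 1, ..., 4
assert _normalize_range_args(2, 10, 2) == (2, 10, 2)  # full form unchanged
{code}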
[jira] [Created] (SPARK-8122) A few problems in ParquetRelation.enableLogForwarding()
Konstantin Shaposhnikov created SPARK-8122: -- Summary: A few problems in ParquetRelation.enableLogForwarding() Key: SPARK-8122 URL: https://issues.apache.org/jira/browse/SPARK-8122 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.5.0 Reporter: Konstantin Shaposhnikov _enableLogForwarding()_ should be updated after the Parquet 1.7.0 upgrade, because the name of the logger has changed to `org.apache.parquet`. From parquet-mr Log: {code} // add a default handler in case there is none Logger logger = Logger.getLogger(Log.class.getPackage().getName()); {code} Another problem with _enableLogForwarding()_ is that it doesn't hold on to the created loggers, so they can be garbage collected and all configuration changes will be lost. From https://docs.oracle.com/javase/6/docs/api/java/util/logging/Logger.html javadocs: _It is important to note that the Logger returned by one of the getLogger factory methods may be garbage collected at any time if a strong reference to the Logger is not kept._ -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-8106) Set derby.system.durability=test in order to speed up Hive compatibility tests
[ https://issues.apache.org/jira/browse/SPARK-8106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-8106: --- Component/s: Tests Build Set derby.system.durability=test in order to speed up Hive compatibility tests -- Key: SPARK-8106 URL: https://issues.apache.org/jira/browse/SPARK-8106 Project: Spark Issue Type: Improvement Components: Build, SQL, Tests Reporter: Josh Rosen Assignee: Josh Rosen Fix For: 1.5.0 Derby has a configuration property named {{derby.system.durability}} that disables I/O synchronization calls for many writes. This sacrifices durability but can result in large performance gains, which is appropriate for tests. We should enable this in our test system properties in order to speed up the Hive compatibility tests. I saw 2-3x speedups locally with this change. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
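For anyone wanting the same speedup in their own Derby-backed test runs, the property is an ordinary JVM system property (an illustration, not the committed change, which sets it in the build's test system properties):

{code:python}
from pyspark import SparkConf

# must be in place before the driver JVM starts; in client mode pass it
# on the spark-submit command line instead (--driver-java-options)
conf = (SparkConf()
        .set("spark.driver.extraJavaOptions",
             "-Dderby.system.durability=test"))
# Derby then skips I/O synchronization on writes: faster, but not
# crash-durable -- acceptable for tests only
{code}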
[jira] [Commented] (SPARK-8056) Design an easier way to construct schema for both Scala and Python
[ https://issues.apache.org/jira/browse/SPARK-8056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14573874#comment-14573874 ] Ilya Ganelin commented on SPARK-8056: - [~rxin] Are you actively working on this? I think this could be readily solved by providing interface to construct StructType the way we construct SparkConf, e.g. new StructType().add(f1,v1).add(f1,v2) etc Design an easier way to construct schema for both Scala and Python -- Key: SPARK-8056 URL: https://issues.apache.org/jira/browse/SPARK-8056 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin Assignee: Reynold Xin StructType is fairly hard to construct, especially in Python. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-8106) Set derby.system.durability=test in order to speed up Hive compatibility tests
[ https://issues.apache.org/jira/browse/SPARK-8106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin resolved SPARK-8106. Resolution: Fixed Fix Version/s: 1.5.0 Set derby.system.durability=test in order to speed up Hive compatibility tests -- Key: SPARK-8106 URL: https://issues.apache.org/jira/browse/SPARK-8106 Project: Spark Issue Type: Improvement Components: Build, SQL, Tests Reporter: Josh Rosen Assignee: Josh Rosen Fix For: 1.5.0 Derby has a configuration property named {{derby.system.durability}} that disables I/O synchronization calls for many writes. This sacrifices durability but can result in large performance gains, which is appropriate for tests. We should enable this in our test system properties in order to speed up the Hive compatibility tests. I saw 2-3x speedups locally with this change. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-7119) ScriptTransform doesn't consider the output data type
[ https://issues.apache.org/jira/browse/SPARK-7119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14573889#comment-14573889 ] zhichao-li commented on SPARK-7119: --- This workaround query can be executed correctly and there's a simple fix for this issue by the way :) ScriptTransform doesn't consider the output data type - Key: SPARK-7119 URL: https://issues.apache.org/jira/browse/SPARK-7119 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.3.0, 1.3.1, 1.4.0 Reporter: Cheng Hao {code:sql} from (from src select transform(key, value) using 'cat' as (thing1 int, thing2 string)) t select thing1 + 2; {code} {noformat} 15/04/24 00:58:55 ERROR CliDriver: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost): java.lang.ClassCastException: org.apache.spark.sql.types.UTF8String cannot be cast to java.lang.Integer at scala.runtime.BoxesRunTime.unboxToInt(BoxesRunTime.java:106) at scala.math.Numeric$IntIsIntegral$.plus(Numeric.scala:57) at org.apache.spark.sql.catalyst.expressions.Add.eval(arithmetic.scala:127) at org.apache.spark.sql.catalyst.expressions.Alias.eval(namedExpressions.scala:118) at org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection.apply(Projection.scala:68) at org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection.apply(Projection.scala:52) at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) at scala.collection.Iterator$class.foreach(Iterator.scala:727) at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48) at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103) at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47) at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273) at scala.collection.AbstractIterator.to(Iterator.scala:1157) at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265) at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157) at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252) at scala.collection.AbstractIterator.toArray(Iterator.scala:1157) at org.apache.spark.rdd.RDD$$anonfun$17.apply(RDD.scala:819) at org.apache.spark.rdd.RDD$$anonfun$17.apply(RDD.scala:819) at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1618) at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1618) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63) at org.apache.spark.scheduler.Task.run(Task.scala:64) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:209) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:722) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
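The workaround query referred to isn't preserved in this thread; one plausible reconstruction (an assumption on my part, not the commenter's actual query) is to declare the transform outputs as strings and cast explicitly, so the analyzer inserts a real string-to-int cast instead of trusting the declared-but-ignored output type (assumes a HiveContext-backed sqlContext):

{code:python}
sqlContext.sql("""
  from (from src select transform(key, value) using 'cat'
        as (thing1 string, thing2 string)) t
  select cast(thing1 as int) + 2
""").collect()
{code}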
[jira] [Commented] (SPARK-7119) ScriptTransform doesn't consider the output data type
[ https://issues.apache.org/jira/browse/SPARK-7119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14573888#comment-14573888 ] zhichao-li commented on SPARK-7119: --- This workaround query can be executed correctly and there's a simple fix for this issue by the way :) ScriptTransform doesn't consider the output data type - Key: SPARK-7119 URL: https://issues.apache.org/jira/browse/SPARK-7119 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.3.0, 1.3.1, 1.4.0 Reporter: Cheng Hao {code:sql} from (from src select transform(key, value) using 'cat' as (thing1 int, thing2 string)) t select thing1 + 2; {code} {noformat} 15/04/24 00:58:55 ERROR CliDriver: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost): java.lang.ClassCastException: org.apache.spark.sql.types.UTF8String cannot be cast to java.lang.Integer at scala.runtime.BoxesRunTime.unboxToInt(BoxesRunTime.java:106) at scala.math.Numeric$IntIsIntegral$.plus(Numeric.scala:57) at org.apache.spark.sql.catalyst.expressions.Add.eval(arithmetic.scala:127) at org.apache.spark.sql.catalyst.expressions.Alias.eval(namedExpressions.scala:118) at org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection.apply(Projection.scala:68) at org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection.apply(Projection.scala:52) at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) at scala.collection.Iterator$class.foreach(Iterator.scala:727) at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48) at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103) at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47) at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273) at scala.collection.AbstractIterator.to(Iterator.scala:1157) at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265) at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157) at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252) at scala.collection.AbstractIterator.toArray(Iterator.scala:1157) at org.apache.spark.rdd.RDD$$anonfun$17.apply(RDD.scala:819) at org.apache.spark.rdd.RDD$$anonfun$17.apply(RDD.scala:819) at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1618) at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1618) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63) at org.apache.spark.scheduler.Task.run(Task.scala:64) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:209) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:722) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-8119) Spark will set total executor when some executors fail.
[ https://issues.apache.org/jira/browse/SPARK-8119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] SaintBacchus updated SPARK-8119: Description: DynamicAllocation will set the total executor count to a small number when it wants to kill some executors. But in the no-DynamicAllocation scenario, Spark will also set the total executor count. This causes the following problem: when an executor fails, no replacement executor will be brought up by Spark. was: DynamicAllocation will set the total executor to a little number when it wants to kill some executors. But in no-DynamicAllocation scenario, Spark will also set the total executor. So it will cause thus problem: sometimes an executor fails down, there is no more executor which will be pull up by spark. Spark will set total executor when some executors fail. --- Key: SPARK-8119 URL: https://issues.apache.org/jira/browse/SPARK-8119 Project: Spark Issue Type: Bug Components: Scheduler Affects Versions: 1.4.0 Reporter: SaintBacchus Fix For: 1.4.0 DynamicAllocation will set the total executor count to a small number when it wants to kill some executors. But in the no-DynamicAllocation scenario, Spark will also set the total executor count. This causes the following problem: when an executor fails, no replacement executor will be brought up by Spark. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Reopened] (SPARK-8096) how to convert dataframe field to LabelPoint
[ https://issues.apache.org/jira/browse/SPARK-8096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] bofei.xiao reopened SPARK-8096: --- I'm sorry, I hadn't expressed my question clearly! how to convert dataframe field to LabelPoint Key: SPARK-8096 URL: https://issues.apache.org/jira/browse/SPARK-8096 Project: Spark Issue Type: Bug Reporter: bofei.xiao How to convert the dataframe to RDD[LabeledPoint]? The dataframe has fields target, age, sex, height; I want to cast target as the label and age, sex, height as the features vector. I faced this problem in the following circumstance: -- Given I have a csv file data.csv: target,age,sex,height 1,18,1,170 0,25,1,165 ... Now I want to build a decision tree model. Step 1: load the csv data as a dataframe: val data = sqlContext.load("com.databricks.spark.csv", Map("path" -> "data.csv", "header" -> "true")) Step 2: build a decision tree model, but DecisionTree needs an RDD[LabeledPoint] input. Thanks! -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-8120) Typos in warning message in sql/types.py
Joseph K. Bradley created SPARK-8120: Summary: Typos in warning message in sql/types.py Key: SPARK-8120 URL: https://issues.apache.org/jira/browse/SPARK-8120 Project: Spark Issue Type: Bug Components: PySpark, SQL Affects Versions: 1.4.0 Reporter: Joseph K. Bradley Priority: Trivial See [https://github.com/apache/spark/blob/3ba6fc515d6ea45c281bb81f648a38523be06383/python/pyspark/sql/types.py#L1093] Need to fix string concat + use of % -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
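The bug class being pointed at, illustrated (this shows the general string concat + % pattern, not the actual line from types.py):

{code:python}
field, dtype = "ts", "DateType"

# broken: % binds tighter than +, so only the second literal is formatted
# and the first %s survives into the output
msg = "field %s: " + "can not merge type %s" % dtype
# -> 'field %s: can not merge type DateType'

# fixed: one format string, all arguments supplied together
msg = "field %s: can not merge type %s" % (field, dtype)
{code}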
[jira] [Commented] (SPARK-8071) Run PySpark dataframe.rollup/cube test failed
[ https://issues.apache.org/jira/browse/SPARK-8071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14573935#comment-14573935 ] Cheng Hao commented on SPARK-8071: -- Can you try `df.cube('name', 'age').count().show()`? Run PySpark dataframe.rollup/cube test failed - Key: SPARK-8071 URL: https://issues.apache.org/jira/browse/SPARK-8071 Project: Spark Issue Type: Bug Components: PySpark Environment: OS: SUSE 11 SP3; JDK: 1.8.0_40; Python: 2.6.8; Hadoop: 2.7.0; Spark: master branch Reporter: Weizhong Priority: Minor I run test for Spark, and failed on PySpark, details are: {code} File /xxx/Spark/python/pyspark/sql/dataframe.py, line 837, in pyspark.sql.dataframe.DataFrame.cube Failed example: * df.cube('name', df.age).count().show() Exception raised: * Traceback (most recent call last): ** File /usr/lib64/python2.6/doctest.py, line 1253, in __run *** compileflags, 1) in test.globs ** File doctest pyspark.sql.dataframe.DataFrame.cube\[0], line 1, in module *** df.cube('name', df.age).count().show() ** File /xxx/Spark/python/pyspark/sql/dataframe.py, line 291, in show *** print(self._jdf.showString\(n)) ** File /xxx/Spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py, line 538, in \_\_call\_\_ *** self.target_id, self.name) ** File /xxx/Spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py, line 300, in get_return_value *** format(target_id, '.', name), value) * Py4JJavaError: An error occurred while calling o212.showString. * : java.lang.AssertionError: assertion failed: No plan for Cube [name#1,age#0], [name#1,age#0,COUNT(1) AS count#27L], grouping__id#28 ** LogicalRDD [age#0,name#1], MapPartitionsRDD\[7] at applySchemaToPythonRDD at NativeMethodAccessorImpl.java:-2 *** at scala.Predef$.assert(Predef.scala:179) *** at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:59) *** at org.apache.spark.sql.catalyst.planning.QueryPlanner.planLater(QueryPlanner.scala:54) *** at org.apache.spark.sql.execution.SparkStrategies$BasicOperators$.apply(SparkStrategies.scala:312) *** at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58) *** at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58) *** at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371) *** at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:59) *** at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:913) *** at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:911) *** at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:917) *** at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(SQLContext.scala:917) *** at org.apache.spark.sql.DataFrame.collect(DataFrame.scala:1255) *** at org.apache.spark.sql.DataFrame.head(DataFrame.scala:1189) *** at org.apache.spark.sql.DataFrame.take(DataFrame.scala:1248) *** at org.apache.spark.sql.DataFrame.showString(DataFrame.scala:176) *** at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) *** at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) *** at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) *** at java.lang.reflect.Method.invoke(Method.java:606) *** at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231) *** at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379) *** at py4j.Gateway.invoke(Gateway.java:259) *** at 
py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133) *** at py4j.commands.CallCommand.execute(CallCommand.java:79) *** at py4j.GatewayConnection.run(GatewayConnection.java:207) *** at java.lang.Thread.run(Thread.java:745) ** 1 of 1 in pyspark.sql.dataframe.DataFrame.cube 1 of 1 in pyspark.sql.dataframe.DataFrame.rollup ***Test Failed*** 2 failures. {code} cc [~davies] -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-8121) spark.sql.parquet.output.committer.class is overriden by spark.sql.sources.outputCommitterClass
Cheng Lian created SPARK-8121: - Summary: spark.sql.parquet.output.committer.class is overriden by spark.sql.sources.outputCommitterClass Key: SPARK-8121 URL: https://issues.apache.org/jira/browse/SPARK-8121 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.4.0 Reporter: Cheng Lian Assignee: Cheng Lian When spark.sql.sources.outputCommitterClass is configured, spark.sql.parquet.output.committer.class will be overridden. For example, if spark.sql.parquet.output.committer.class is set to FileOutputCommitter, while spark.sql.sources.outputCommitterClass is set to DirectParquetOutputCommitter, neither _metadata nor _common_metadata will be written because FileOutputCommitter overrides DirectParquetOutputCommitter. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-8122) ParquetRelation.enableLogForwarding() may fail to configure loggers
[ https://issues.apache.org/jira/browse/SPARK-8122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shaposhnikov updated SPARK-8122: --- Description: _enableLogForwarding()_ doesn't hold on to the created loggers, so they can be garbage collected and all configuration changes will be lost. From https://docs.oracle.com/javase/6/docs/api/java/util/logging/Logger.html javadocs: _It is important to note that the Logger returned by one of the getLogger factory methods may be garbage collected at any time if a strong reference to the Logger is not kept._ All created logger references need to be kept, e.g. in static variables. was: Another problem with _enableLogForwarding()_ is that it doesn't hold to the created loggers that can be garbage collected and all configuration changes will be gone. From https://docs.oracle.com/javase/6/docs/api/java/util/logging/Logger.html javadocs: _It is important to note that the Logger returned by one of the getLogger factory methods may be garbage collected at any time if a strong reference to the Logger is not kept._ ParquetRelation.enableLogForwarding() may fail to configure loggers --- Key: SPARK-8122 URL: https://issues.apache.org/jira/browse/SPARK-8122 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 1.5.0 Reporter: Konstantin Shaposhnikov Priority: Minor _enableLogForwarding()_ doesn't hold on to the created loggers, so they can be garbage collected and all configuration changes will be lost. From https://docs.oracle.com/javase/6/docs/api/java/util/logging/Logger.html javadocs: _It is important to note that the Logger returned by one of the getLogger factory methods may be garbage collected at any time if a strong reference to the Logger is not kept._ All created logger references need to be kept, e.g. in static variables. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
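The retention fix, sketched (in Python for consistency with the other examples here; the real change would be on the JVM side, where java.util.logging holds loggers only weakly):

{code:python}
import logging

# keep a module-level strong reference so the configured logger cannot be
# reclaimed before its settings take effect -- the java.util.logging
# pitfall quoted above; CPython's logging manager already retains loggers
_parquet_logger = logging.getLogger("org.apache.parquet")
_parquet_logger.setLevel(logging.WARNING)
{code}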
[jira] [Commented] (SPARK-8118) Turn off noisy log output produced by Parquet 1.7.0
[ https://issues.apache.org/jira/browse/SPARK-8118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14573981#comment-14573981 ] Konstantin Shaposhnikov commented on SPARK-8118: The name of the logger has been changed to _org.apache.parquet_. From parquet-mr Log: {code} // add a default handler in case there is none Logger logger = Logger.getLogger(Log.class.getPackage().getName()); {code} Turn off noisy log output produced by Parquet 1.7.0 --- Key: SPARK-8118 URL: https://issues.apache.org/jira/browse/SPARK-8118 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 1.4.1, 1.5.0 Reporter: Cheng Lian Assignee: Cheng Lian Priority: Minor Parquet 1.7.0 renames its package to org.apache.parquet; {{ParquetRelation.enableLogForwarding}} needs to be adjusted accordingly to avoid noisy log output. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-8122) A few problems in ParquetRelation.enableLogForwarding()
[ https://issues.apache.org/jira/browse/SPARK-8122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14573973#comment-14573973 ] Konstantin Shaposhnikov commented on SPARK-8122: I believe that currently `ParquetRelation.enableLogForwarding` doesn't do anything, as it configures the wrong logger (parquet instead of org.apache.parquet). I haven't tested it though. A few problems in ParquetRelation.enableLogForwarding() --- Key: SPARK-8122 URL: https://issues.apache.org/jira/browse/SPARK-8122 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 1.5.0 Reporter: Konstantin Shaposhnikov _enableLogForwarding()_ should be updated after the Parquet 1.7.0 upgrade, because the name of the logger has changed to `org.apache.parquet`. From parquet-mr Log: {code} // add a default handler in case there is none Logger logger = Logger.getLogger(Log.class.getPackage().getName()); {code} Another problem with _enableLogForwarding()_ is that it doesn't hold on to the created loggers, so they can be garbage collected and all configuration changes will be lost. From https://docs.oracle.com/javase/6/docs/api/java/util/logging/Logger.html javadocs: _It is important to note that the Logger returned by one of the getLogger factory methods may be garbage collected at any time if a strong reference to the Logger is not kept._ -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-8056) Design an easier way to construct schema for both Scala and Python
[ https://issues.apache.org/jira/browse/SPARK-8056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14573874#comment-14573874 ] Ilya Ganelin edited comment on SPARK-8056 at 6/5/15 12:35 AM: -- [~rxin] Are you actively working on this? I think this could be readily solved by providing an interface to construct StructType the way we construct SparkConf, e.g. new StructType().add(f1,v1).add(f1,v2) etc was (Author: ilganeli): [~rxin] Are you actively working on this? I think this could be readily solved by providing interface to construct StructType the way we construct SparkConf, e.g. new StructType().add(f1,v1).add(f1,v2) etc Design an easier way to construct schema for both Scala and Python -- Key: SPARK-8056 URL: https://issues.apache.org/jira/browse/SPARK-8056 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin Assignee: Reynold Xin StructType is fairly hard to construct, especially in Python. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-8118) Turn off noisy log output produced by Parquet 1.7.0
Cheng Lian created SPARK-8118: - Summary: Turn off noisy log output produced by Parquet 1.7.0 Key: SPARK-8118 URL: https://issues.apache.org/jira/browse/SPARK-8118 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.4.1, 1.5.0 Reporter: Cheng Lian Assignee: Cheng Lian Priority: Minor Parquet 1.7.0 renames its package to org.apache.parquet; {{ParquetRelation.enableLogForwarding}} needs to be adjusted accordingly to avoid noisy log output. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-8119) Spark will set total executor when some executors fail.
SaintBacchus created SPARK-8119: --- Summary: Spark will set total executor when some executors fail. Key: SPARK-8119 URL: https://issues.apache.org/jira/browse/SPARK-8119 Project: Spark Issue Type: Bug Components: Scheduler Affects Versions: 1.4.0 Reporter: SaintBacchus Fix For: 1.4.0 DynamicAllocation will set the total executor count to a small number when it wants to kill some executors. But in the no-DynamicAllocation scenario, Spark will also set the total executor count. This causes the following problem: when an executor fails, no replacement executor will be brought up by Spark. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-8119) Spark will set total executor when some executors fail.
[ https://issues.apache.org/jira/browse/SPARK-8119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-8119: --- Assignee: (was: Apache Spark) Spark will set total executor when some executors fail. --- Key: SPARK-8119 URL: https://issues.apache.org/jira/browse/SPARK-8119 Project: Spark Issue Type: Bug Components: Scheduler Affects Versions: 1.4.0 Reporter: SaintBacchus Fix For: 1.4.0 DynamicAllocation will set the total executor count to a small number when it wants to kill some executors. But in the no-DynamicAllocation scenario, Spark will also set the total executor count. This causes the following problem: when an executor fails, no replacement executor will be brought up by Spark. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-8119) Spark will set total executor when some executors fail.
[ https://issues.apache.org/jira/browse/SPARK-8119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-8119: --- Assignee: Apache Spark Spark will set total executor when some executors fail. --- Key: SPARK-8119 URL: https://issues.apache.org/jira/browse/SPARK-8119 Project: Spark Issue Type: Bug Components: Scheduler Affects Versions: 1.4.0 Reporter: SaintBacchus Assignee: Apache Spark Fix For: 1.4.0 DynamicAllocation will set the total executor count to a small number when it wants to kill some executors. But in the non-DynamicAllocation scenario, Spark will also set the total executor count. This causes the following problem: when an executor fails, no new executor will be brought up by Spark. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-8122) A few problems in ParquetRelation.enableLogForwarding()
[ https://issues.apache.org/jira/browse/SPARK-8122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14573966#comment-14573966 ] Reynold Xin commented on SPARK-8122: Thanks for filing. What's the relationship between this one and SPARK-8118? A few problems in ParquetRelation.enableLogForwarding() --- Key: SPARK-8122 URL: https://issues.apache.org/jira/browse/SPARK-8122 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 1.5.0 Reporter: Konstantin Shaposhnikov _enableLogForwarding()_ should be updated after the parquet 1.7.0 update, because the name of the logger has changed to `org.apache.parquet`. From parquet-mr Log: {code} // add a default handler in case there is none Logger logger = Logger.getLogger(Log.class.getPackage().getName()); {code} Another problem with _enableLogForwarding()_ is that it doesn't hold a reference to the created loggers, so they can be garbage collected and all configuration changes will be lost. From the https://docs.oracle.com/javase/6/docs/api/java/util/logging/Logger.html javadocs: _It is important to note that the Logger returned by one of the getLogger factory methods may be garbage collected at any time if a strong reference to the Logger is not kept._ -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-8122) A few problems in ParquetRelation.enableLogForwarding()
[ https://issues.apache.org/jira/browse/SPARK-8122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-8122: --- Issue Type: Sub-task (was: Bug) Parent: SPARK-5463 A few problems in ParquetRelation.enableLogForwarding() --- Key: SPARK-8122 URL: https://issues.apache.org/jira/browse/SPARK-8122 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 1.5.0 Reporter: Konstantin Shaposhnikov _enableLogForwarding()_ should be updated after the parquet 1.7.0 update, because the name of the logger has changed to `org.apache.parquet`. From parquet-mr Log: {code} // add a default handler in case there is none Logger logger = Logger.getLogger(Log.class.getPackage().getName()); {code} Another problem with _enableLogForwarding()_ is that it doesn't hold a reference to the created loggers, so they can be garbage collected and all configuration changes will be lost. From the https://docs.oracle.com/javase/6/docs/api/java/util/logging/Logger.html javadocs: _It is important to note that the Logger returned by one of the getLogger factory methods may be garbage collected at any time if a strong reference to the Logger is not kept._ -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-8122) ParquetRelation.enableLogForwarding() may fail to configure loggers
[ https://issues.apache.org/jira/browse/SPARK-8122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shaposhnikov updated SPARK-8122: --- Priority: Minor (was: Major) ParquetRelation.enableLogForwarding() may fail to configure loggers --- Key: SPARK-8122 URL: https://issues.apache.org/jira/browse/SPARK-8122 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 1.5.0 Reporter: Konstantin Shaposhnikov Priority: Minor _enableLogForwarding()_ should be updated after the parquet 1.7.0 update, because the name of the logger has changed to `org.apache.parquet`. From parquet-mr Log: {code} // add a default handler in case there is none Logger logger = Logger.getLogger(Log.class.getPackage().getName()); {code} Another problem with _enableLogForwarding()_ is that it doesn't hold a reference to the created loggers, so they can be garbage collected and all configuration changes will be lost. From the https://docs.oracle.com/javase/6/docs/api/java/util/logging/Logger.html javadocs: _It is important to note that the Logger returned by one of the getLogger factory methods may be garbage collected at any time if a strong reference to the Logger is not kept._ -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-8122) ParquetRelation.enableLogForwarding() may fail to configure loggers
[ https://issues.apache.org/jira/browse/SPARK-8122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shaposhnikov updated SPARK-8122: --- Summary: ParquetRelation.enableLogForwarding() may fail to configure loggers (was: A few problems in ParquetRelation.enableLogForwarding()) ParquetRelation.enableLogForwarding() may fail to configure loggers --- Key: SPARK-8122 URL: https://issues.apache.org/jira/browse/SPARK-8122 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 1.5.0 Reporter: Konstantin Shaposhnikov _enableLogForwarding()_ should be updated after the parquet 1.7.0 update, because the name of the logger has changed to `org.apache.parquet`. From parquet-mr Log: {code} // add a default handler in case there is none Logger logger = Logger.getLogger(Log.class.getPackage().getName()); {code} Another problem with _enableLogForwarding()_ is that it doesn't hold a reference to the created loggers, so they can be garbage collected and all configuration changes will be lost. From the https://docs.oracle.com/javase/6/docs/api/java/util/logging/Logger.html javadocs: _It is important to note that the Logger returned by one of the getLogger factory methods may be garbage collected at any time if a strong reference to the Logger is not kept._ -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-8122) ParquetRelation.enableLogForwarding() may fail to configure loggers
[ https://issues.apache.org/jira/browse/SPARK-8122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shaposhnikov updated SPARK-8122: --- Description: _enableLogForwarding()_ doesn't hold a reference to the created loggers, so they can be garbage collected and all configuration changes will be lost. From the https://docs.oracle.com/javase/6/docs/api/java/util/logging/Logger.html javadocs: _It is important to note that the Logger returned by one of the getLogger factory methods may be garbage collected at any time if a strong reference to the Logger is not kept._ was: _enableLogForwarding()_ should be updated after the parquet 1.7.0 update, because the name of the logger has changed to `org.apache.parquet`. From parquet-mr Log: {code} // add a default handler in case there is none Logger logger = Logger.getLogger(Log.class.getPackage().getName()); {code} Another problem with _enableLogForwarding()_ is that it doesn't hold a reference to the created loggers, so they can be garbage collected and all configuration changes will be lost. From the https://docs.oracle.com/javase/6/docs/api/java/util/logging/Logger.html javadocs: _It is important to note that the Logger returned by one of the getLogger factory methods may be garbage collected at any time if a strong reference to the Logger is not kept._ ParquetRelation.enableLogForwarding() may fail to configure loggers --- Key: SPARK-8122 URL: https://issues.apache.org/jira/browse/SPARK-8122 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 1.5.0 Reporter: Konstantin Shaposhnikov Priority: Minor _enableLogForwarding()_ doesn't hold a reference to the created loggers, so they can be garbage collected and all configuration changes will be lost. From the https://docs.oracle.com/javase/6/docs/api/java/util/logging/Logger.html javadocs: _It is important to note that the Logger returned by one of the getLogger factory methods may be garbage collected at any time if a strong reference to the Logger is not kept._ -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
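A short sketch of one fix direction for the garbage-collection problem (an assumption, not the merged change): retain a strong reference to every logger whose configuration is mutated, so java.util.logging's internal weak references cannot discard it.
{code}
import java.util.logging.Logger
import scala.collection.mutable.ArrayBuffer

object ConfiguredLoggers {
  // Loggers configured through this helper are pinned here for the JVM's
  // lifetime; otherwise the LogManager's weak reference lets them be
  // collected together with their handler/level configuration.
  private val pinned = ArrayBuffer.empty[Logger]

  def configure(name: String)(setup: Logger => Unit): Logger = {
    val logger = Logger.getLogger(name)
    setup(logger)
    pinned.synchronized { pinned += logger }
    logger
  }
}
{code}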
[jira] [Commented] (SPARK-8122) A few problems in ParquetRelation.enableLogForwarding()
[ https://issues.apache.org/jira/browse/SPARK-8122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14573978#comment-14573978 ] Konstantin Shaposhnikov commented on SPARK-8122: SPARK-8118 is for the first problem described in this issue. The second problem (the loggers can be garbage collected) is another issue and should be fixed separately. I will update the JIRA. A few problems in ParquetRelation.enableLogForwarding() --- Key: SPARK-8122 URL: https://issues.apache.org/jira/browse/SPARK-8122 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 1.5.0 Reporter: Konstantin Shaposhnikov _enableLogForwarding()_ should be updated after the parquet 1.7.0 update, because the name of the logger has changed to `org.apache.parquet`. From parquet-mr Log: {code} // add a default handler in case there is none Logger logger = Logger.getLogger(Log.class.getPackage().getName()); {code} Another problem with _enableLogForwarding()_ is that it doesn't hold a reference to the created loggers, so they can be garbage collected and all configuration changes will be lost. From the https://docs.oracle.com/javase/6/docs/api/java/util/logging/Logger.html javadocs: _It is important to note that the Logger returned by one of the getLogger factory methods may be garbage collected at any time if a strong reference to the Logger is not kept._ -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-7536) Audit MLlib Python API for 1.4
[ https://issues.apache.org/jira/browse/SPARK-7536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14573534#comment-14573534 ] Yanbo Liang edited comment on SPARK-7536 at 6/4/15 8:42 PM: [~josephkb] Sorry, I'm on a business trip from 1st June to 10th June, so there will be no updates during this period. was (Author: yanboliang): [~josephkb] I'm on a business trip from 1st June to 10th June, so there will be no updates during this period. Audit MLlib Python API for 1.4 -- Key: SPARK-7536 URL: https://issues.apache.org/jira/browse/SPARK-7536 Project: Spark Issue Type: Sub-task Components: MLlib, PySpark Reporter: Joseph K. Bradley Assignee: Yanbo Liang For new public APIs added to MLlib, we need to check the generated HTML doc and compare the Scala and Python versions. We need to track: * Inconsistency: Do class/method/parameter names match? SPARK-7667 * Docs: Is the Python doc missing or just a stub? We want the Python doc to be as complete as the Scala doc. SPARK-7666 * API breaking changes: These should be very rare but are occasionally either necessary (intentional) or accidental. These must be recorded and added in the Migration Guide for this release. SPARK-7665 ** Note: If the API change is for an Alpha/Experimental/DeveloperApi component, please note that as well. * Missing classes/methods/parameters: We should create to-do JIRAs for functionality missing from Python. ** classification *** StreamingLogisticRegressionWithSGD SPARK-7633 ** clustering *** GaussianMixture SPARK-6258 *** LDA SPARK-6259 *** Power Iteration Clustering SPARK-5962 *** StreamingKMeans SPARK-4118 ** evaluation *** MultilabelMetrics SPARK-6094 ** feature *** ElementwiseProduct SPARK-7605 *** PCA SPARK-7604 ** linalg *** Distributed linear algebra SPARK-6100 ** pmml.export SPARK-7638 ** regression *** StreamingLinearRegressionWithSGD SPARK-4127 ** stat *** KernelDensity SPARK-7639 ** util *** MLUtils SPARK-6263 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-8114) Remove wildcard import on TestSQLContext._
Reynold Xin created SPARK-8114: -- Summary: Remove wildcard import on TestSQLContext._ Key: SPARK-8114 URL: https://issues.apache.org/jira/browse/SPARK-8114 Project: Spark Issue Type: Sub-task Reporter: Reynold Xin We import TestSQLContext._ in almost all test suites. This import introduces a lot of methods and should be avoided. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
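A hedged sketch of what this cleanup could look like in a suite (names illustrative; the actual PR may differ): keep an explicit reference to the shared test context and import only its implicits, instead of wildcarding every member.
{code}
import org.apache.spark.sql.test.TestSQLContext
import org.scalatest.FunSuite

class ExampleQuerySuite extends FunSuite {
  // Explicit reference instead of `import TestSQLContext._`.
  private val ctx = TestSQLContext
  import ctx.implicits._ // only the implicit conversions, nothing else

  test("query through an explicit context") {
    val df = ctx.sparkContext.parallelize(Seq(1, 2, 3)).toDF("value")
    assert(df.count() === 3L)
  }
}
{code}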
[jira] [Created] (SPARK-8116) sc.range() doesn't match python range()
Ted Blackman created SPARK-8116: --- Summary: sc.range() doesn't match python range() Key: SPARK-8116 URL: https://issues.apache.org/jira/browse/SPARK-8116 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 1.4.0, 1.4.1 Reporter: Ted Blackman Priority: Minor Python's built-in range() and xrange() functions can take 1, 2, or 3 arguments. Ranges with just 1 argument are probably used the most frequently, e.g.: for i in range(len(myList)): ... However, in pyspark, the SparkContext range() method throws an error when called with a single argument, due to the way its arguments get passed into python's range function. There's no good reason that I can think of not to support the same syntax as the built-in function. To fix this, we can set the default of the sc.range() method's `stop` argument to None, and then inside the method, if it is None, replace `stop` with `start` and set `start` to 0, which is what the c implementation of range() does: https://github.com/python/cpython/blob/master/Objects/rangeobject.c#L87 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
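The argument handling described above, sketched in Scala for illustration (the real change belongs in PySpark's context.py; the Option-based signature here is an assumption): a single argument is treated as the exclusive upper bound, exactly as CPython does.
{code}
// range(n) should mean range(0, n), matching Python's built-in range().
def range(start: Long, stop: Option[Long] = None, step: Long = 1): Seq[Long] = {
  val (begin, end) = stop match {
    case None    => (0L, start) // one-argument call: `start` is really the stop
    case Some(s) => (start, s)
  }
  begin until end by step
}

// range(5L)                -> 0, 1, 2, 3, 4
// range(2L, Some(10L), 2L) -> 2, 4, 6, 8
{code}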
[jira] [Assigned] (SPARK-8116) sc.range() doesn't match python range()
[ https://issues.apache.org/jira/browse/SPARK-8116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-8116: --- Assignee: Apache Spark sc.range() doesn't match python range() --- Key: SPARK-8116 URL: https://issues.apache.org/jira/browse/SPARK-8116 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 1.4.0, 1.4.1 Reporter: Ted Blackman Assignee: Apache Spark Priority: Minor Labels: easyfix Python's built-in range() and xrange() functions can take 1, 2, or 3 arguments. Ranges with just 1 argument are probably used the most frequently, e.g.: for i in range(len(myList)): ... However, in pyspark, the SparkContext range() method throws an error when called with a single argument, due to the way its arguments get passed into python's range function. There's no good reason that I can think of not to support the same syntax as the built-in function. To fix this, we can set the default of the sc.range() method's `stop` argument to None, and then inside the method, if it is None, replace `stop` with `start` and set `start` to 0, which is what the c implementation of range() does: https://github.com/python/cpython/blob/master/Objects/rangeobject.c#L87 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-8116) sc.range() doesn't match python range()
[ https://issues.apache.org/jira/browse/SPARK-8116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14573752#comment-14573752 ] Apache Spark commented on SPARK-8116: - User 'belisarius222' has created a pull request for this issue: https://github.com/apache/spark/pull/6656 sc.range() doesn't match python range() --- Key: SPARK-8116 URL: https://issues.apache.org/jira/browse/SPARK-8116 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 1.4.0, 1.4.1 Reporter: Ted Blackman Priority: Minor Labels: easyfix Python's built-in range() and xrange() functions can take 1, 2, or 3 arguments. Ranges with just 1 argument are probably used the most frequently, e.g.: for i in range(len(myList)): ... However, in pyspark, the SparkContext range() method throws an error when called with a single argument, due to the way its arguments get passed into python's range function. There's no good reason that I can think of not to support the same syntax as the built-in function. To fix this, we can set the default of the sc.range() method's `stop` argument to None, and then inside the method, if it is None, replace `stop` with `start` and set `start` to 0, which is what the c implementation of range() does: https://github.com/python/cpython/blob/master/Objects/rangeobject.c#L87 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-8116) sc.range() doesn't match python range()
[ https://issues.apache.org/jira/browse/SPARK-8116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-8116: --- Assignee: (was: Apache Spark) sc.range() doesn't match python range() --- Key: SPARK-8116 URL: https://issues.apache.org/jira/browse/SPARK-8116 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 1.4.0, 1.4.1 Reporter: Ted Blackman Priority: Minor Labels: easyfix Python's built-in range() and xrange() functions can take 1, 2, or 3 arguments. Ranges with just 1 argument are probably used the most frequently, e.g.: for i in range(len(myList)): ... However, in pyspark, the SparkContext range() method throws an error when called with a single argument, due to the way its arguments get passed into python's range function. There's no good reason that I can think of not to support the same syntax as the built-in function. To fix this, we can set the default of the sc.range() method's `stop` argument to None, and then inside the method, if it is None, replace `stop` with `start` and set `start` to 0, which is what the c implementation of range() does: https://github.com/python/cpython/blob/master/Objects/rangeobject.c#L87 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-8095) Spark package dependencies not resolved when package is in local-ivy-cache
[ https://issues.apache.org/jira/browse/SPARK-8095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14573784#comment-14573784 ] Apache Spark commented on SPARK-8095: - User 'andrewor14' has created a pull request for this issue: https://github.com/apache/spark/pull/6658 Spark package dependencies not resolved when package is in local-ivy-cache -- Key: SPARK-8095 URL: https://issues.apache.org/jira/browse/SPARK-8095 Project: Spark Issue Type: Bug Components: Spark Submit Affects Versions: 1.4.0 Reporter: Eron Wright Given a dependency expressed with '--packages', the transitive dependencies are supposed to be automatically included. This is true for most repository types including local-m2-cache, Spark Packages, and central. For ivy-local-cache, it is not. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-8095) Spark package dependencies not resolved when package is in local-ivy-cache
[ https://issues.apache.org/jira/browse/SPARK-8095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-8095: --- Assignee: (was: Apache Spark) Spark package dependencies not resolved when package is in local-ivy-cache -- Key: SPARK-8095 URL: https://issues.apache.org/jira/browse/SPARK-8095 Project: Spark Issue Type: Bug Components: Spark Submit Affects Versions: 1.4.0 Reporter: Eron Wright Given a dependency expressed with '--packages', the transitive dependencies are supposed to be automatically included. This is true for most repository types including local-m2-cache, Spark Packages, and central. For ivy-local-cache, it is not. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-8071) Run PySpark dataframe.rollup/cube test failed
[ https://issues.apache.org/jira/browse/SPARK-8071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-8071: --- Description: I run test for Spark, and failed on PySpark, details are: {code} File /xxx/Spark/python/pyspark/sql/dataframe.py, line 837, in pyspark.sql.dataframe.DataFrame.cube Failed example: * df.cube('name', df.age).count().show() Exception raised: * Traceback (most recent call last): ** File /usr/lib64/python2.6/doctest.py, line 1253, in __run *** compileflags, 1) in test.globs ** File doctest pyspark.sql.dataframe.DataFrame.cube\[0], line 1, in module *** df.cube('name', df.age).count().show() ** File /xxx/Spark/python/pyspark/sql/dataframe.py, line 291, in show *** print(self._jdf.showString\(n)) ** File /xxx/Spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py, line 538, in \_\_call\_\_ *** self.target_id, self.name) ** File /xxx/Spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py, line 300, in get_return_value *** format(target_id, '.', name), value) * Py4JJavaError: An error occurred while calling o212.showString. * : java.lang.AssertionError: assertion failed: No plan for Cube [name#1,age#0], [name#1,age#0,COUNT(1) AS count#27L], grouping__id#28 ** LogicalRDD [age#0,name#1], MapPartitionsRDD\[7] at applySchemaToPythonRDD at NativeMethodAccessorImpl.java:-2 *** at scala.Predef$.assert(Predef.scala:179) *** at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:59) *** at org.apache.spark.sql.catalyst.planning.QueryPlanner.planLater(QueryPlanner.scala:54) *** at org.apache.spark.sql.execution.SparkStrategies$BasicOperators$.apply(SparkStrategies.scala:312) *** at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58) *** at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58) *** at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371) *** at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:59) *** at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:913) *** at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:911) *** at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:917) *** at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(SQLContext.scala:917) *** at org.apache.spark.sql.DataFrame.collect(DataFrame.scala:1255) *** at org.apache.spark.sql.DataFrame.head(DataFrame.scala:1189) *** at org.apache.spark.sql.DataFrame.take(DataFrame.scala:1248) *** at org.apache.spark.sql.DataFrame.showString(DataFrame.scala:176) *** at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) *** at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) *** at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) *** at java.lang.reflect.Method.invoke(Method.java:606) *** at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231) *** at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379) *** at py4j.Gateway.invoke(Gateway.java:259) *** at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133) *** at py4j.commands.CallCommand.execute(CallCommand.java:79) *** at py4j.GatewayConnection.run(GatewayConnection.java:207) *** at java.lang.Thread.run(Thread.java:745) ** 1 of 1 in pyspark.sql.dataframe.DataFrame.cube 1 of 1 in pyspark.sql.dataframe.DataFrame.rollup ***Test Failed*** 2 failures. 
{code} cc [~davies] was: I run test for Spark, and failed on PySpark, details are: File /xxx/Spark/python/pyspark/sql/dataframe.py, line 837, in pyspark.sql.dataframe.DataFrame.cube Failed example: * df.cube('name', df.age).count().show() Exception raised: * Traceback (most recent call last): ** File /usr/lib64/python2.6/doctest.py, line 1253, in __run *** compileflags, 1) in test.globs ** File doctest pyspark.sql.dataframe.DataFrame.cube\[0], line 1, in module *** df.cube('name', df.age).count().show() ** File /xxx/Spark/python/pyspark/sql/dataframe.py, line 291, in show *** print(self._jdf.showString\(n)) ** File /xxx/Spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py, line 538, in \_\_call\_\_ *** self.target_id, self.name) ** File /xxx/Spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py, line 300, in get_return_value *** format(target_id, '.', name), value) * Py4JJavaError: An error occurred while calling o212.showString. * : java.lang.AssertionError: assertion failed: No plan for Cube [name#1,age#0], [name#1,age#0,COUNT(1) AS count#27L], grouping__id#28 ** LogicalRDD [age#0,name#1], MapPartitionsRDD\[7] at applySchemaToPythonRDD at NativeMethodAccessorImpl.java:-2 *** at scala.Predef$.assert(Predef.scala:179) *** at
[jira] [Commented] (SPARK-7008) An implementation of Factorization Machine (LibFM)
[ https://issues.apache.org/jira/browse/SPARK-7008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14573820#comment-14573820 ] DB Tsai commented on SPARK-7008: Do you see a better convergence rate when LBFGS is used? An implementation of Factorization Machine (LibFM) -- Key: SPARK-7008 URL: https://issues.apache.org/jira/browse/SPARK-7008 Project: Spark Issue Type: New Feature Components: MLlib Affects Versions: 1.3.0, 1.3.1, 1.3.2 Reporter: zhengruifeng Labels: features, patch Attachments: FM_CR.xlsx, FM_convergence_rate.xlsx, QQ20150421-1.png, QQ20150421-2.png An implementation of Factorization Machines based on Scala and Spark MLlib. FM is a machine learning algorithm for multi-linear regression and is widely used for recommendation. FM has performed well in recent years' recommendation competitions. Ref: http://libfm.org/ http://doi.acm.org/10.1145/2168752.2168771 http://www.inf.uni-konstanz.de/~rendle/pdf/Rendle2010FM.pdf -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-7440) Remove physical Distinct operator in favor of Aggregate
[ https://issues.apache.org/jira/browse/SPARK-7440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin reassigned SPARK-7440: -- Assignee: Reynold Xin Remove physical Distinct operator in favor of Aggregate --- Key: SPARK-7440 URL: https://issues.apache.org/jira/browse/SPARK-7440 Project: Spark Issue Type: New Feature Components: SQL Reporter: Reynold Xin Assignee: Reynold Xin Fix For: 1.5.0 We can just rewrite distinct using groupby (i.e. aggregate operator). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
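A sketch of such a rewrite as a catalyst rule (close to what the change would presumably look like; treat the rule name and its exact placement in the optimizer as assumptions): Distinct(child) becomes an Aggregate that groups on all of the child's output columns.
{code}
import org.apache.spark.sql.catalyst.plans.logical.{Aggregate, Distinct, LogicalPlan}
import org.apache.spark.sql.catalyst.rules.Rule

// SELECT DISTINCT a, b FROM t  is equivalent to  SELECT a, b FROM t GROUP BY a, b
object ReplaceDistinctWithAggregate extends Rule[LogicalPlan] {
  override def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    case Distinct(child) =>
      // Group by every output column and project those same columns back out.
      Aggregate(child.output, child.output, child)
  }
}
{code}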
[jira] [Resolved] (SPARK-7440) Remove physical Distinct operator in favor of Aggregate
[ https://issues.apache.org/jira/browse/SPARK-7440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin resolved SPARK-7440. Resolution: Fixed Fix Version/s: 1.5.0 Remove physical Distinct operator in favor of Aggregate --- Key: SPARK-7440 URL: https://issues.apache.org/jira/browse/SPARK-7440 Project: Spark Issue Type: New Feature Components: SQL Reporter: Reynold Xin Fix For: 1.5.0 We can just rewrite distinct using groupby (i.e. aggregate operator). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-8111) SparkR shell should display Spark logo and version banner on startup
[ https://issues.apache.org/jira/browse/SPARK-8111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shivaram Venkataraman updated SPARK-8111: - Labels: Starter (was: ) SparkR shell should display Spark logo and version banner on startup Key: SPARK-8111 URL: https://issues.apache.org/jira/browse/SPARK-8111 Project: Spark Issue Type: Improvement Components: SparkR Affects Versions: 1.4.0 Reporter: Matei Zaharia Priority: Trivial Labels: Starter -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-8112) Received block event count through the StreamingListener can be negative
[ https://issues.apache.org/jira/browse/SPARK-8112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14573573#comment-14573573 ] Tathagata Das commented on SPARK-8112: -- Take a look at SPARK-8080 Received block event count through the StreamingListener can be negative Key: SPARK-8112 URL: https://issues.apache.org/jira/browse/SPARK-8112 Project: Spark Issue Type: Bug Components: Streaming Affects Versions: 1.4.0 Reporter: Tathagata Das Assignee: Shixiong Zhu Priority: Minor -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-8111) SparkR shell should display Spark logo and version banner on startup
[ https://issues.apache.org/jira/browse/SPARK-8111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14573574#comment-14573574 ] Shivaram Venkataraman commented on SPARK-8111: -- The code will need to go in https://github.com/apache/spark/blob/2bcdf8c239d2ba79f64fb8878da83d4c2ec28b30/R/pkg/inst/profile/shell.R#L31 SparkR shell should display Spark logo and version banner on startup Key: SPARK-8111 URL: https://issues.apache.org/jira/browse/SPARK-8111 Project: Spark Issue Type: Improvement Components: SparkR Affects Versions: 1.4.0 Reporter: Matei Zaharia Priority: Trivial Labels: Starter -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-7991) Python DataFrame: support passing a list into describe
[ https://issues.apache.org/jira/browse/SPARK-7991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-7991: --- Assignee: Apache Spark Python DataFrame: support passing a list into describe -- Key: SPARK-7991 URL: https://issues.apache.org/jira/browse/SPARK-7991 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin Assignee: Apache Spark Labels: starter DataFrame.describe in Python takes a vararg, i.e. it can be invoked this way: {code} df.describe('col1', 'col2', 'col3') {code} Most of our DataFrame functions accept a list in addition to varargs. describe should do the same, i.e. it should also accept a Python list: {code} df.describe(['col1', 'col2', 'col3']) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-7991) Python DataFrame: support passing a list into describe
[ https://issues.apache.org/jira/browse/SPARK-7991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-7991: --- Assignee: (was: Apache Spark) Python DataFrame: support passing a list into describe -- Key: SPARK-7991 URL: https://issues.apache.org/jira/browse/SPARK-7991 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin Labels: starter DataFrame.describe in Python takes a vararg, i.e. it can be invoked this way: {code} df.describe('col1', 'col2', 'col3') {code} Most of our DataFrame functions accept a list in addition to varargs. describe should do the same, i.e. it should also accept a Python list: {code} df.describe(['col1', 'col2', 'col3']) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-7991) Python DataFrame: support passing a list into describe
[ https://issues.apache.org/jira/browse/SPARK-7991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14573678#comment-14573678 ] Apache Spark commented on SPARK-7991: - User 'ameyc' has created a pull request for this issue: https://github.com/apache/spark/pull/6655 Python DataFrame: support passing a list into describe -- Key: SPARK-7991 URL: https://issues.apache.org/jira/browse/SPARK-7991 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin Labels: starter DataFrame.describe in Python takes a vararg, i.e. it can be invoked this way: {code} df.describe('col1', 'col2', 'col3') {code} Most of our DataFrame functions accept a list in addition to varargs. describe should do the same, i.e. it should also accept a Python list: {code} df.describe(['col1', 'col2', 'col3']) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-8109) TestSQLContext's static initialization is run during MiMa tests, causing SparkContexts to be created
[ https://issues.apache.org/jira/browse/SPARK-8109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-8109: --- Issue Type: Sub-task (was: Improvement) Parent: SPARK-8113 TestSQLContext's static initialization is run during MiMa tests, causing SparkContexts to be created Key: SPARK-8109 URL: https://issues.apache.org/jira/browse/SPARK-8109 Project: Spark Issue Type: Sub-task Components: SQL, Tests Reporter: Josh Rosen Check out this stacktrace which occurred during MiMa tests in the pull request builder: {code} java.net.BindException: Address already in use at sun.nio.ch.Net.bind0(Native Method) at sun.nio.ch.Net.bind(Net.java:444) at sun.nio.ch.Net.bind(Net.java:436) at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214) at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74) at org.eclipse.jetty.server.nio.SelectChannelConnector.open(SelectChannelConnector.java:187) at org.eclipse.jetty.server.AbstractConnector.doStart(AbstractConnector.java:316) at org.eclipse.jetty.server.nio.SelectChannelConnector.doStart(SelectChannelConnector.java:265) at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:64) at org.eclipse.jetty.server.Server.doStart(Server.java:293) at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:64) at org.apache.spark.ui.JettyUtils$.org$apache$spark$ui$JettyUtils$$connect$1(JettyUtils.scala:228) at org.apache.spark.ui.JettyUtils$$anonfun$2.apply(JettyUtils.scala:238) at org.apache.spark.ui.JettyUtils$$anonfun$2.apply(JettyUtils.scala:238) at org.apache.spark.util.Utils$$anonfun$startServiceOnPort$1.apply$mcVI$sp(Utils.scala:1991) at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141) at org.apache.spark.util.Utils$.startServiceOnPort(Utils.scala:1982) at org.apache.spark.ui.JettyUtils$.startJettyServer(JettyUtils.scala:238) at org.apache.spark.ui.WebUI.bind(WebUI.scala:117) at org.apache.spark.SparkContext$$anonfun$13.apply(SparkContext.scala:448) at org.apache.spark.SparkContext$$anonfun$13.apply(SparkContext.scala:448) at scala.Option.foreach(Option.scala:236) at org.apache.spark.SparkContext.init(SparkContext.scala:448) at org.apache.spark.SparkContext.init(SparkContext.scala:135) at org.apache.spark.sql.test.LocalSQLContext.init(TestSQLContext.scala:29) at org.apache.spark.sql.test.TestSQLContext$.init(TestSQLContext.scala:55) at org.apache.spark.sql.test.TestSQLContext$.clinit(TestSQLContext.scala) at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:274) at scala.reflect.runtime.JavaMirrors$JavaMirror.javaClass(JavaMirrors.scala:500) at scala.reflect.runtime.JavaMirrors$JavaMirror.tryJavaClass(JavaMirrors.scala:505) at scala.reflect.runtime.SymbolLoaders$PackageScope.lookupEntry(SymbolLoaders.scala:109) at scala.reflect.internal.Types$Type.findMember(Types.scala:1185) at scala.reflect.internal.Types$Type.memberBasedOnName(Types.scala:722) at scala.reflect.internal.Types$Type.member(Types.scala:680) at scala.reflect.internal.Mirrors$RootsBase.getModuleOrClass(Mirrors.scala:43) at scala.reflect.internal.Mirrors$RootsBase.getModuleOrClass(Mirrors.scala:61) at scala.reflect.internal.Mirrors$RootsBase.staticModuleOrClass(Mirrors.scala:72) at scala.reflect.internal.Mirrors$RootsBase.staticModule(Mirrors.scala:161) at scala.reflect.internal.Mirrors$RootsBase.staticModule(Mirrors.scala:21) at org.apache.spark.tools.GenerateMIMAIgnore$$anonfun$privateWithin$1.apply(GenerateMIMAIgnore.scala:72) at 
org.apache.spark.tools.GenerateMIMAIgnore$$anonfun$privateWithin$1.apply(GenerateMIMAIgnore.scala:69) at scala.collection.immutable.HashSet$HashSet1.foreach(HashSet.scala:153) at scala.collection.immutable.HashSet$HashTrieSet.foreach(HashSet.scala:306) at scala.collection.immutable.HashSet$HashTrieSet.foreach(HashSet.scala:306) at scala.collection.immutable.HashSet$HashTrieSet.foreach(HashSet.scala:306) at org.apache.spark.tools.GenerateMIMAIgnore$.privateWithin(GenerateMIMAIgnore.scala:69) at org.apache.spark.tools.GenerateMIMAIgnore$.main(GenerateMIMAIgnore.scala:126) at org.apache.spark.tools.GenerateMIMAIgnore.main(GenerateMIMAIgnore.scala) {code} Here, TestSQLContext's static initialization code is being run during MiMa checks and that initialization creates a SparkContext. Because
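One way to avoid this class of failure, sketched under the assumption that GenerateMIMAIgnore only needs class metadata (an assumption, not necessarily the merged fix): load classes reflectively without running their static initializers, so merely naming TestSQLContext$ can no longer construct a SparkContext.
{code}
// Class.forName's three-argument overload takes an `initialize` flag; passing
// false loads the class without executing its static initializer blocks.
val clazz = Class.forName(
  "org.apache.spark.sql.test.TestSQLContext$",
  false, // do not run static initialization
  Thread.currentThread().getContextClassLoader)
println(clazz.getName) // metadata is available, and no SparkContext was created
{code}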
[jira] [Commented] (SPARK-8071) Run PySpark dataframe.rollup/cube test failed
[ https://issues.apache.org/jira/browse/SPARK-8071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14573771#comment-14573771 ] Davies Liu commented on SPARK-8071: --- It failed in Scala side, cc [~rxin] Run PySpark dataframe.rollup/cube test failed - Key: SPARK-8071 URL: https://issues.apache.org/jira/browse/SPARK-8071 Project: Spark Issue Type: Bug Components: PySpark Environment: OS: SUSE 11 SP3; JDK: 1.8.0_40; Python: 2.6.8; Hadoop: 2.7.0; Spark: master branch Reporter: Weizhong Priority: Minor I run test for Spark, and failed on PySpark, details are: File /xxx/Spark/python/pyspark/sql/dataframe.py, line 837, in pyspark.sql.dataframe.DataFrame.cube Failed example: * df.cube('name', df.age).count().show() Exception raised: * Traceback (most recent call last): ** File /usr/lib64/python2.6/doctest.py, line 1253, in __run *** compileflags, 1) in test.globs ** File doctest pyspark.sql.dataframe.DataFrame.cube\[0], line 1, in module *** df.cube('name', df.age).count().show() ** File /xxx/Spark/python/pyspark/sql/dataframe.py, line 291, in show *** print(self._jdf.showString\(n)) ** File /xxx/Spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py, line 538, in \_\_call\_\_ *** self.target_id, self.name) ** File /xxx/Spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py, line 300, in get_return_value *** format(target_id, '.', name), value) * Py4JJavaError: An error occurred while calling o212.showString. * : java.lang.AssertionError: assertion failed: No plan for Cube [name#1,age#0], [name#1,age#0,COUNT(1) AS count#27L], grouping__id#28 ** LogicalRDD [age#0,name#1], MapPartitionsRDD\[7] at applySchemaToPythonRDD at NativeMethodAccessorImpl.java:-2 *** at scala.Predef$.assert(Predef.scala:179) *** at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:59) *** at org.apache.spark.sql.catalyst.planning.QueryPlanner.planLater(QueryPlanner.scala:54) *** at org.apache.spark.sql.execution.SparkStrategies$BasicOperators$.apply(SparkStrategies.scala:312) *** at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58) *** at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58) *** at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371) *** at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:59) *** at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:913) *** at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:911) *** at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:917) *** at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(SQLContext.scala:917) *** at org.apache.spark.sql.DataFrame.collect(DataFrame.scala:1255) *** at org.apache.spark.sql.DataFrame.head(DataFrame.scala:1189) *** at org.apache.spark.sql.DataFrame.take(DataFrame.scala:1248) *** at org.apache.spark.sql.DataFrame.showString(DataFrame.scala:176) *** at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) *** at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) *** at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) *** at java.lang.reflect.Method.invoke(Method.java:606) *** at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231) *** at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379) *** at py4j.Gateway.invoke(Gateway.java:259) *** at 
py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133) *** at py4j.commands.CallCommand.execute(CallCommand.java:79) *** at py4j.GatewayConnection.run(GatewayConnection.java:207) *** at java.lang.Thread.run(Thread.java:745) ** 1 of 1 in pyspark.sql.dataframe.DataFrame.cube 1 of 1 in pyspark.sql.dataframe.DataFrame.rollup ***Test Failed*** 2 failures. cc [~davies] -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-8113) SQL module test cleanup
[ https://issues.apache.org/jira/browse/SPARK-8113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14573822#comment-14573822 ] Apache Spark commented on SPARK-8113: - User 'rxin' has created a pull request for this issue: https://github.com/apache/spark/pull/6661 SQL module test cleanup --- Key: SPARK-8113 URL: https://issues.apache.org/jira/browse/SPARK-8113 Project: Spark Issue Type: Umbrella Components: SQL Reporter: Reynold Xin Assignee: Reynold Xin Priority: Minor Some cleanup tasks to track here. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-8113) SQL module test cleanup
[ https://issues.apache.org/jira/browse/SPARK-8113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-8113: --- Assignee: Reynold Xin (was: Apache Spark) SQL module test cleanup --- Key: SPARK-8113 URL: https://issues.apache.org/jira/browse/SPARK-8113 Project: Spark Issue Type: Umbrella Components: SQL Reporter: Reynold Xin Assignee: Reynold Xin Priority: Minor Some cleanup tasks to track here. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-8113) SQL module test cleanup
[ https://issues.apache.org/jira/browse/SPARK-8113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-8113: --- Assignee: Apache Spark (was: Reynold Xin) SQL module test cleanup --- Key: SPARK-8113 URL: https://issues.apache.org/jira/browse/SPARK-8113 Project: Spark Issue Type: Umbrella Components: SQL Reporter: Reynold Xin Assignee: Apache Spark Priority: Minor Some cleanup tasks to track here. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-7536) Audit MLlib Python API for 1.4
[ https://issues.apache.org/jira/browse/SPARK-7536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14573536#comment-14573536 ] Yanbo Liang commented on SPARK-7536: [~josephkb] I'm on a business trip from 1st June to 10th June, so there will be no updates during this period. Audit MLlib Python API for 1.4 -- Key: SPARK-7536 URL: https://issues.apache.org/jira/browse/SPARK-7536 Project: Spark Issue Type: Sub-task Components: MLlib, PySpark Reporter: Joseph K. Bradley Assignee: Yanbo Liang For new public APIs added to MLlib, we need to check the generated HTML doc and compare the Scala and Python versions. We need to track: * Inconsistency: Do class/method/parameter names match? SPARK-7667 * Docs: Is the Python doc missing or just a stub? We want the Python doc to be as complete as the Scala doc. SPARK-7666 * API breaking changes: These should be very rare but are occasionally either necessary (intentional) or accidental. These must be recorded and added in the Migration Guide for this release. SPARK-7665 ** Note: If the API change is for an Alpha/Experimental/DeveloperApi component, please note that as well. * Missing classes/methods/parameters: We should create to-do JIRAs for functionality missing from Python. ** classification *** StreamingLogisticRegressionWithSGD SPARK-7633 ** clustering *** GaussianMixture SPARK-6258 *** LDA SPARK-6259 *** Power Iteration Clustering SPARK-5962 *** StreamingKMeans SPARK-4118 ** evaluation *** MultilabelMetrics SPARK-6094 ** feature *** ElementwiseProduct SPARK-7605 *** PCA SPARK-7604 ** linalg *** Distributed linear algebra SPARK-6100 ** pmml.export SPARK-7638 ** regression *** StreamingLinearRegressionWithSGD SPARK-4127 ** stat *** KernelDensity SPARK-7639 ** util *** MLUtils SPARK-6263 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-7536) Audit MLlib Python API for 1.4
[ https://issues.apache.org/jira/browse/SPARK-7536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14573534#comment-14573534 ] Yanbo Liang commented on SPARK-7536: [~josephkb] I'm on a business trip from 1st June to 10th June, so there will be no updates during this period. Audit MLlib Python API for 1.4 -- Key: SPARK-7536 URL: https://issues.apache.org/jira/browse/SPARK-7536 Project: Spark Issue Type: Sub-task Components: MLlib, PySpark Reporter: Joseph K. Bradley Assignee: Yanbo Liang For new public APIs added to MLlib, we need to check the generated HTML doc and compare the Scala and Python versions. We need to track: * Inconsistency: Do class/method/parameter names match? SPARK-7667 * Docs: Is the Python doc missing or just a stub? We want the Python doc to be as complete as the Scala doc. SPARK-7666 * API breaking changes: These should be very rare but are occasionally either necessary (intentional) or accidental. These must be recorded and added in the Migration Guide for this release. SPARK-7665 ** Note: If the API change is for an Alpha/Experimental/DeveloperApi component, please note that as well. * Missing classes/methods/parameters: We should create to-do JIRAs for functionality missing from Python. ** classification *** StreamingLogisticRegressionWithSGD SPARK-7633 ** clustering *** GaussianMixture SPARK-6258 *** LDA SPARK-6259 *** Power Iteration Clustering SPARK-5962 *** StreamingKMeans SPARK-4118 ** evaluation *** MultilabelMetrics SPARK-6094 ** feature *** ElementwiseProduct SPARK-7605 *** PCA SPARK-7604 ** linalg *** Distributed linear algebra SPARK-6100 ** pmml.export SPARK-7638 ** regression *** StreamingLinearRegressionWithSGD SPARK-4127 ** stat *** KernelDensity SPARK-7639 ** util *** MLUtils SPARK-6263 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Issue Comment Deleted] (SPARK-7536) Audit MLlib Python API for 1.4
[ https://issues.apache.org/jira/browse/SPARK-7536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yanbo Liang updated SPARK-7536: --- Comment: was deleted (was: [~josephkb] I'm on a business trip from 1st June to 10th June, so there will be no updates during this period.) Audit MLlib Python API for 1.4 -- Key: SPARK-7536 URL: https://issues.apache.org/jira/browse/SPARK-7536 Project: Spark Issue Type: Sub-task Components: MLlib, PySpark Reporter: Joseph K. Bradley Assignee: Yanbo Liang For new public APIs added to MLlib, we need to check the generated HTML doc and compare the Scala and Python versions. We need to track: * Inconsistency: Do class/method/parameter names match? SPARK-7667 * Docs: Is the Python doc missing or just a stub? We want the Python doc to be as complete as the Scala doc. SPARK-7666 * API breaking changes: These should be very rare but are occasionally either necessary (intentional) or accidental. These must be recorded and added in the Migration Guide for this release. SPARK-7665 ** Note: If the API change is for an Alpha/Experimental/DeveloperApi component, please note that as well. * Missing classes/methods/parameters: We should create to-do JIRAs for functionality missing from Python. ** classification *** StreamingLogisticRegressionWithSGD SPARK-7633 ** clustering *** GaussianMixture SPARK-6258 *** LDA SPARK-6259 *** Power Iteration Clustering SPARK-5962 *** StreamingKMeans SPARK-4118 ** evaluation *** MultilabelMetrics SPARK-6094 ** feature *** ElementwiseProduct SPARK-7605 *** PCA SPARK-7604 ** linalg *** Distributed linear algebra SPARK-6100 ** pmml.export SPARK-7638 ** regression *** StreamingLinearRegressionWithSGD SPARK-4127 ** stat *** KernelDensity SPARK-7639 ** util *** MLUtils SPARK-6263 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Issue Comment Deleted] (SPARK-7536) Audit MLlib Python API for 1.4
[ https://issues.apache.org/jira/browse/SPARK-7536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yanbo Liang updated SPARK-7536: --- Comment: was deleted (was: [~josephkb] I'm on a business trip from 1st June to 10th June, so there will be no updates during this period.) Audit MLlib Python API for 1.4 -- Key: SPARK-7536 URL: https://issues.apache.org/jira/browse/SPARK-7536 Project: Spark Issue Type: Sub-task Components: MLlib, PySpark Reporter: Joseph K. Bradley Assignee: Yanbo Liang For new public APIs added to MLlib, we need to check the generated HTML doc and compare the Scala and Python versions. We need to track: * Inconsistency: Do class/method/parameter names match? SPARK-7667 * Docs: Is the Python doc missing or just a stub? We want the Python doc to be as complete as the Scala doc. SPARK-7666 * API breaking changes: These should be very rare but are occasionally either necessary (intentional) or accidental. These must be recorded and added in the Migration Guide for this release. SPARK-7665 ** Note: If the API change is for an Alpha/Experimental/DeveloperApi component, please note that as well. * Missing classes/methods/parameters: We should create to-do JIRAs for functionality missing from Python. ** classification *** StreamingLogisticRegressionWithSGD SPARK-7633 ** clustering *** GaussianMixture SPARK-6258 *** LDA SPARK-6259 *** Power Iteration Clustering SPARK-5962 *** StreamingKMeans SPARK-4118 ** evaluation *** MultilabelMetrics SPARK-6094 ** feature *** ElementwiseProduct SPARK-7605 *** PCA SPARK-7604 ** linalg *** Distributed linear algebra SPARK-6100 ** pmml.export SPARK-7638 ** regression *** StreamingLinearRegressionWithSGD SPARK-4127 ** stat *** KernelDensity SPARK-7639 ** util *** MLUtils SPARK-6263 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Issue Comment Deleted] (SPARK-7536) Audit MLlib Python API for 1.4
[ https://issues.apache.org/jira/browse/SPARK-7536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yanbo Liang updated SPARK-7536: --- Comment: was deleted (was: [~josephkb] I'm on a business trip from 1st June to 10th June, so there will be no updates during this period.) Audit MLlib Python API for 1.4 -- Key: SPARK-7536 URL: https://issues.apache.org/jira/browse/SPARK-7536 Project: Spark Issue Type: Sub-task Components: MLlib, PySpark Reporter: Joseph K. Bradley Assignee: Yanbo Liang For new public APIs added to MLlib, we need to check the generated HTML doc and compare the Scala and Python versions. We need to track: * Inconsistency: Do class/method/parameter names match? SPARK-7667 * Docs: Is the Python doc missing or just a stub? We want the Python doc to be as complete as the Scala doc. SPARK-7666 * API breaking changes: These should be very rare but are occasionally either necessary (intentional) or accidental. These must be recorded and added in the Migration Guide for this release. SPARK-7665 ** Note: If the API change is for an Alpha/Experimental/DeveloperApi component, please note that as well. * Missing classes/methods/parameters: We should create to-do JIRAs for functionality missing from Python. ** classification *** StreamingLogisticRegressionWithSGD SPARK-7633 ** clustering *** GaussianMixture SPARK-6258 *** LDA SPARK-6259 *** Power Iteration Clustering SPARK-5962 *** StreamingKMeans SPARK-4118 ** evaluation *** MultilabelMetrics SPARK-6094 ** feature *** ElementwiseProduct SPARK-7605 *** PCA SPARK-7604 ** linalg *** Distributed linear algebra SPARK-6100 ** pmml.export SPARK-7638 ** regression *** StreamingLinearRegressionWithSGD SPARK-4127 ** stat *** KernelDensity SPARK-7639 ** util *** MLUtils SPARK-6263 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-7417) Flaky test: o.a.s.deploy.SparkSubmitUtilsSuite neglect dependencies
[ https://issues.apache.org/jira/browse/SPARK-7417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-7417: - Assignee: Burak Yavuz Flaky test: o.a.s.deploy.SparkSubmitUtilsSuite neglect dependencies --- Key: SPARK-7417 URL: https://issues.apache.org/jira/browse/SPARK-7417 Project: Spark Issue Type: Bug Components: Tests Affects Versions: 1.4.0 Reporter: Andrew Or Assignee: Burak Yavuz Priority: Critical Labels: flaky-test Fix For: 1.3.2, 1.4.0 {code} Expected exception java.lang.RuntimeException to be thrown, but no exception was thrown. {code} https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-Maven-pre-YARN/hadoop.version=1.0.4,label=centos/2201/testReport/junit/org.apache.spark.deploy/SparkSubmitUtilsSuite/neglects_Spark_and_Spark_s_dependencies/ ... -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-7417) Flaky test: o.a.s.deploy.SparkSubmitUtilsSuite neglect dependencies
[ https://issues.apache.org/jira/browse/SPARK-7417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or closed SPARK-7417. Resolution: Fixed Flaky test: o.a.s.deploy.SparkSubmitUtilsSuite neglect dependencies --- Key: SPARK-7417 URL: https://issues.apache.org/jira/browse/SPARK-7417 Project: Spark Issue Type: Bug Components: Tests Affects Versions: 1.4.0 Reporter: Andrew Or Assignee: Burak Yavuz Priority: Critical Labels: flaky-test Fix For: 1.3.2, 1.4.0 {code} Expected exception java.lang.RuntimeException to be thrown, but no exception was thrown. {code} https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-Maven-pre-YARN/hadoop.version=1.0.4,label=centos/2201/testReport/junit/org.apache.spark.deploy/SparkSubmitUtilsSuite/neglects_Spark_and_Spark_s_dependencies/ ... -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-8095) Spark package dependencies not resolved when package is in local-ivy-cache
[ https://issues.apache.org/jira/browse/SPARK-8095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-8095: --- Assignee: Apache Spark Spark package dependencies not resolved when package is in local-ivy-cache -- Key: SPARK-8095 URL: https://issues.apache.org/jira/browse/SPARK-8095 Project: Spark Issue Type: Bug Components: Spark Submit Affects Versions: 1.4.0 Reporter: Eron Wright Assignee: Apache Spark Given a dependency expressed with '--packages', the transitive dependencies are supposed to be automatically included. This is true for most repository types including local-m2-cache, Spark Packages, and central. For ivy-local-cache, it is not. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-8117) Push codegen into Expression
[ https://issues.apache.org/jira/browse/SPARK-8117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-8117: --- Assignee: Davies Liu (was: Apache Spark) Push codegen into Expression Key: SPARK-8117 URL: https://issues.apache.org/jira/browse/SPARK-8117 Project: Spark Issue Type: Bug Components: SQL Reporter: Davies Liu Assignee: Davies Liu Push the codegen implementation of each expression into Expression itself, making it easier to manage and extend. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
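A toy sketch (not Spark's actual Expression API) of what pushing codegen into each node means: every expression emits its own source fragment, instead of a central generator pattern-matching over every expression type:
{code}
// Toy expression tree where each node generates its own code fragment.
sealed trait Expr { def genCode: String }

case class Literal(value: Int) extends Expr {
  def genCode: String = value.toString
}

case class Add(left: Expr, right: Expr) extends Expr {
  // Adding a new expression type only requires implementing genCode here,
  // rather than extending a centralized match in the code generator.
  def genCode: String = s"(${left.genCode} + ${right.genCode})"
}

object CodegenSketch extends App {
  // Prints "(1 + (2 + 3))" -- the generated source for this tree.
  println(Add(Literal(1), Add(Literal(2), Literal(3))).genCode)
}
{code}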
[jira] [Created] (SPARK-8109) TestSQLContext's static initialization is run during MiMa tests, causing SparkContexts to be created
Josh Rosen created SPARK-8109: - Summary: TestSQLContext's static initialization is run during MiMa tests, causing SparkContexts to be created Key: SPARK-8109 URL: https://issues.apache.org/jira/browse/SPARK-8109 Project: Spark Issue Type: Improvement Components: SQL, Tests Reporter: Josh Rosen Check out this stacktrace which occurred during MiMa tests in the pull request builder: {code} java.net.BindException: Address already in use at sun.nio.ch.Net.bind0(Native Method) at sun.nio.ch.Net.bind(Net.java:444) at sun.nio.ch.Net.bind(Net.java:436) at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214) at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74) at org.eclipse.jetty.server.nio.SelectChannelConnector.open(SelectChannelConnector.java:187) at org.eclipse.jetty.server.AbstractConnector.doStart(AbstractConnector.java:316) at org.eclipse.jetty.server.nio.SelectChannelConnector.doStart(SelectChannelConnector.java:265) at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:64) at org.eclipse.jetty.server.Server.doStart(Server.java:293) at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:64) at org.apache.spark.ui.JettyUtils$.org$apache$spark$ui$JettyUtils$$connect$1(JettyUtils.scala:228) at org.apache.spark.ui.JettyUtils$$anonfun$2.apply(JettyUtils.scala:238) at org.apache.spark.ui.JettyUtils$$anonfun$2.apply(JettyUtils.scala:238) at org.apache.spark.util.Utils$$anonfun$startServiceOnPort$1.apply$mcVI$sp(Utils.scala:1991) at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141) at org.apache.spark.util.Utils$.startServiceOnPort(Utils.scala:1982) at org.apache.spark.ui.JettyUtils$.startJettyServer(JettyUtils.scala:238) at org.apache.spark.ui.WebUI.bind(WebUI.scala:117) at org.apache.spark.SparkContext$$anonfun$13.apply(SparkContext.scala:448) at org.apache.spark.SparkContext$$anonfun$13.apply(SparkContext.scala:448) at scala.Option.foreach(Option.scala:236) at org.apache.spark.SparkContext.<init>(SparkContext.scala:448) at org.apache.spark.SparkContext.<init>(SparkContext.scala:135) at org.apache.spark.sql.test.LocalSQLContext.<init>(TestSQLContext.scala:29) at org.apache.spark.sql.test.TestSQLContext$.<init>(TestSQLContext.scala:55) at org.apache.spark.sql.test.TestSQLContext$.<clinit>(TestSQLContext.scala) at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:274) at scala.reflect.runtime.JavaMirrors$JavaMirror.javaClass(JavaMirrors.scala:500) at scala.reflect.runtime.JavaMirrors$JavaMirror.tryJavaClass(JavaMirrors.scala:505) at scala.reflect.runtime.SymbolLoaders$PackageScope.lookupEntry(SymbolLoaders.scala:109) at scala.reflect.internal.Types$Type.findMember(Types.scala:1185) at scala.reflect.internal.Types$Type.memberBasedOnName(Types.scala:722) at scala.reflect.internal.Types$Type.member(Types.scala:680) at scala.reflect.internal.Mirrors$RootsBase.getModuleOrClass(Mirrors.scala:43) at scala.reflect.internal.Mirrors$RootsBase.getModuleOrClass(Mirrors.scala:61) at scala.reflect.internal.Mirrors$RootsBase.staticModuleOrClass(Mirrors.scala:72) at scala.reflect.internal.Mirrors$RootsBase.staticModule(Mirrors.scala:161) at scala.reflect.internal.Mirrors$RootsBase.staticModule(Mirrors.scala:21) at org.apache.spark.tools.GenerateMIMAIgnore$$anonfun$privateWithin$1.apply(GenerateMIMAIgnore.scala:72) at org.apache.spark.tools.GenerateMIMAIgnore$$anonfun$privateWithin$1.apply(GenerateMIMAIgnore.scala:69) at scala.collection.immutable.HashSet$HashSet1.foreach(HashSet.scala:153) at scala.collection.immutable.HashSet$HashTrieSet.foreach(HashSet.scala:306) at scala.collection.immutable.HashSet$HashTrieSet.foreach(HashSet.scala:306) at scala.collection.immutable.HashSet$HashTrieSet.foreach(HashSet.scala:306) at org.apache.spark.tools.GenerateMIMAIgnore$.privateWithin(GenerateMIMAIgnore.scala:69) at org.apache.spark.tools.GenerateMIMAIgnore$.main(GenerateMIMAIgnore.scala:126) at org.apache.spark.tools.GenerateMIMAIgnore.main(GenerateMIMAIgnore.scala) {code} Here, TestSQLContext's static initialization code is being run during MiMa checks and that initialization creates a SparkContext. Because MiMa doesn't run with our test system properties, the UI tries to bind to a contended port. This may lead to flakiness. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
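The trace shows the mechanism: Class.forName forces static initialization. A minimal Scala illustration (not Spark code) of why merely loading an object's class through reflection runs its side-effecting body:
{code}
// The body of a Scala object runs the first time the object is initialized;
// Class.forName (which initializes by default) is enough to trigger it.
object EagerInit {
  println("side effect: e.g. creating a SparkContext and binding a UI port")
}

object ReflectionDemo extends App {
  // Loading the module class runs <clinit>, which constructs the singleton
  // and executes the side-effecting body above. Assumes EagerInit is a
  // top-level object in the default package, so its class is "EagerInit$".
  Class.forName("EagerInit$")
}
{code}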
[jira] [Commented] (SPARK-8109) TestSQLContext's static initialization is run during MiMa tests, causing SparkContexts to be created
[ https://issues.apache.org/jira/browse/SPARK-8109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14573554#comment-14573554 ] Josh Rosen commented on SPARK-8109: --- /cc [~rxin] TestSQLContext's static initialization is run during MiMa tests, causing SparkContexts to be created Key: SPARK-8109 URL: https://issues.apache.org/jira/browse/SPARK-8109 Project: Spark Issue Type: Improvement Components: SQL, Tests Reporter: Josh Rosen Check out this stacktrace which occurred during MiMa tests in the pull request builder: {code} java.net.BindException: Address already in use at sun.nio.ch.Net.bind0(Native Method) at sun.nio.ch.Net.bind(Net.java:444) at sun.nio.ch.Net.bind(Net.java:436) at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214) at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74) at org.eclipse.jetty.server.nio.SelectChannelConnector.open(SelectChannelConnector.java:187) at org.eclipse.jetty.server.AbstractConnector.doStart(AbstractConnector.java:316) at org.eclipse.jetty.server.nio.SelectChannelConnector.doStart(SelectChannelConnector.java:265) at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:64) at org.eclipse.jetty.server.Server.doStart(Server.java:293) at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:64) at org.apache.spark.ui.JettyUtils$.org$apache$spark$ui$JettyUtils$$connect$1(JettyUtils.scala:228) at org.apache.spark.ui.JettyUtils$$anonfun$2.apply(JettyUtils.scala:238) at org.apache.spark.ui.JettyUtils$$anonfun$2.apply(JettyUtils.scala:238) at org.apache.spark.util.Utils$$anonfun$startServiceOnPort$1.apply$mcVI$sp(Utils.scala:1991) at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141) at org.apache.spark.util.Utils$.startServiceOnPort(Utils.scala:1982) at org.apache.spark.ui.JettyUtils$.startJettyServer(JettyUtils.scala:238) at org.apache.spark.ui.WebUI.bind(WebUI.scala:117) at org.apache.spark.SparkContext$$anonfun$13.apply(SparkContext.scala:448) at org.apache.spark.SparkContext$$anonfun$13.apply(SparkContext.scala:448) at scala.Option.foreach(Option.scala:236) at org.apache.spark.SparkContext.<init>(SparkContext.scala:448) at org.apache.spark.SparkContext.<init>(SparkContext.scala:135) at org.apache.spark.sql.test.LocalSQLContext.<init>(TestSQLContext.scala:29) at org.apache.spark.sql.test.TestSQLContext$.<init>(TestSQLContext.scala:55) at org.apache.spark.sql.test.TestSQLContext$.<clinit>(TestSQLContext.scala) at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:274) at scala.reflect.runtime.JavaMirrors$JavaMirror.javaClass(JavaMirrors.scala:500) at scala.reflect.runtime.JavaMirrors$JavaMirror.tryJavaClass(JavaMirrors.scala:505) at scala.reflect.runtime.SymbolLoaders$PackageScope.lookupEntry(SymbolLoaders.scala:109) at scala.reflect.internal.Types$Type.findMember(Types.scala:1185) at scala.reflect.internal.Types$Type.memberBasedOnName(Types.scala:722) at scala.reflect.internal.Types$Type.member(Types.scala:680) at scala.reflect.internal.Mirrors$RootsBase.getModuleOrClass(Mirrors.scala:43) at scala.reflect.internal.Mirrors$RootsBase.getModuleOrClass(Mirrors.scala:61) at scala.reflect.internal.Mirrors$RootsBase.staticModuleOrClass(Mirrors.scala:72) at scala.reflect.internal.Mirrors$RootsBase.staticModule(Mirrors.scala:161) at scala.reflect.internal.Mirrors$RootsBase.staticModule(Mirrors.scala:21) at org.apache.spark.tools.GenerateMIMAIgnore$$anonfun$privateWithin$1.apply(GenerateMIMAIgnore.scala:72) at org.apache.spark.tools.GenerateMIMAIgnore$$anonfun$privateWithin$1.apply(GenerateMIMAIgnore.scala:69) at scala.collection.immutable.HashSet$HashSet1.foreach(HashSet.scala:153) at scala.collection.immutable.HashSet$HashTrieSet.foreach(HashSet.scala:306) at scala.collection.immutable.HashSet$HashTrieSet.foreach(HashSet.scala:306) at scala.collection.immutable.HashSet$HashTrieSet.foreach(HashSet.scala:306) at org.apache.spark.tools.GenerateMIMAIgnore$.privateWithin(GenerateMIMAIgnore.scala:69) at org.apache.spark.tools.GenerateMIMAIgnore$.main(GenerateMIMAIgnore.scala:126) at org.apache.spark.tools.GenerateMIMAIgnore.main(GenerateMIMAIgnore.scala) {code} Here, TestSQLContext's static initialization code is being run during MiMa checks and that initialization creates a SparkContext. Because MiMa doesn't run with our test system properties, the UI tries to bind to a contended port. This may lead to flakiness. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-8110) DAG visualizations sometimes look weird in Python
[ https://issues.apache.org/jira/browse/SPARK-8110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-8110: - Attachment: Screen Shot 2015-06-04 at 1.51.32 PM.png Screen Shot 2015-06-04 at 1.51.35 PM.png DAG visualizations sometimes look weird in Python - Key: SPARK-8110 URL: https://issues.apache.org/jira/browse/SPARK-8110 Project: Spark Issue Type: Bug Components: Web UI Affects Versions: 1.4.0 Reporter: Matei Zaharia Priority: Minor Attachments: Screen Shot 2015-06-04 at 1.51.32 PM.png, Screen Shot 2015-06-04 at 1.51.35 PM.png Got this by doing sc.textFile("README.md").count() -- there are some RDDs outside of any stages. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-8115) Remove TestData
Reynold Xin created SPARK-8115: -- Summary: Remove TestData Key: SPARK-8115 URL: https://issues.apache.org/jira/browse/SPARK-8115 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin Priority: Minor TestData was from the era when we didn't have easy ways to generate test datasets. Now that we have implicits on Seq + toDF, it makes more sense to put the test datasets closer to the test suites. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
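A brief sketch of the Seq-plus-toDF pattern the issue refers to, against the Spark 1.x SQLContext API (the dataset and names here are illustrative):
{code}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object ToDFSketch extends App {
  val sc = new SparkContext(new SparkConf().setAppName("toDF").setMaster("local[2]"))
  val sqlContext = new SQLContext(sc)
  import sqlContext.implicits._  // brings the Seq-to-DataFrame conversion into scope

  // A test dataset defined inline, right next to the suite that needs it.
  val df = Seq(("a", 1), ("b", 2)).toDF("name", "age")
  df.show()
  sc.stop()
}
{code}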
[jira] [Assigned] (SPARK-8112) Received block event count through the StreamingListener can be negative
[ https://issues.apache.org/jira/browse/SPARK-8112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-8112: --- Assignee: Apache Spark (was: Shixiong Zhu) Received block event count through the StreamingListener can be negative Key: SPARK-8112 URL: https://issues.apache.org/jira/browse/SPARK-8112 Project: Spark Issue Type: Bug Components: Streaming Affects Versions: 1.4.0 Reporter: Tathagata Das Assignee: Apache Spark Priority: Minor -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-8112) Received block event count through the StreamingListener can be negative
[ https://issues.apache.org/jira/browse/SPARK-8112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-8112: --- Assignee: Shixiong Zhu (was: Apache Spark) Received block event count through the StreamingListener can be negative Key: SPARK-8112 URL: https://issues.apache.org/jira/browse/SPARK-8112 Project: Spark Issue Type: Bug Components: Streaming Affects Versions: 1.4.0 Reporter: Tathagata Das Assignee: Shixiong Zhu Priority: Minor -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-8112) Received block event count through the StreamingListener can be negative
[ https://issues.apache.org/jira/browse/SPARK-8112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14573796#comment-14573796 ] Apache Spark commented on SPARK-8112: - User 'zsxwing' has created a pull request for this issue: https://github.com/apache/spark/pull/6659 Received block event count through the StreamingListener can be negative Key: SPARK-8112 URL: https://issues.apache.org/jira/browse/SPARK-8112 Project: Spark Issue Type: Bug Components: Streaming Affects Versions: 1.4.0 Reporter: Tathagata Das Assignee: Shixiong Zhu Priority: Minor -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-7417) Flaky test: o.a.s.deploy.SparkSubmitUtilsSuite neglect dependencies
[ https://issues.apache.org/jira/browse/SPARK-7417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-7417: - Target Version/s: 1.3.2, 1.4.0 (was: 1.4.0) Flaky test: o.a.s.deploy.SparkSubmitUtilsSuite neglect dependencies --- Key: SPARK-7417 URL: https://issues.apache.org/jira/browse/SPARK-7417 Project: Spark Issue Type: Bug Components: Tests Affects Versions: 1.4.0 Reporter: Andrew Or Priority: Critical Labels: flaky-test Fix For: 1.3.2, 1.4.0 {code} Expected exception java.lang.RuntimeException to be thrown, but no exception was thrown. {code} https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-Maven-pre-YARN/hadoop.version=1.0.4,label=centos/2201/testReport/junit/org.apache.spark.deploy/SparkSubmitUtilsSuite/neglects_Spark_and_Spark_s_dependencies/ ... -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-7418) Flaky test: o.a.s.deploy.SparkSubmitUtilsSuite search for artifacts
[ https://issues.apache.org/jira/browse/SPARK-7418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-7418: - Target Version/s: 1.3.2, 1.4.0 (was: 1.4.0) Flaky test: o.a.s.deploy.SparkSubmitUtilsSuite search for artifacts --- Key: SPARK-7418 URL: https://issues.apache.org/jira/browse/SPARK-7418 Project: Spark Issue Type: Bug Components: Tests Affects Versions: 1.4.0 Reporter: Andrew Or Assignee: Burak Yavuz Priority: Critical Fix For: 1.3.2, 1.4.0 {code} java.lang.RuntimeException: [unresolved dependency: com.agimatec#agimatec-validation;0.9.3: not found] at org.apache.spark.deploy.SparkSubmitUtils$.resolveMavenCoordinates(SparkSubmit.scala:931) at org.apache.spark.deploy.SparkSubmitUtilsSuite$$anonfun$5.apply$mcV$sp(SparkSubmitUtilsSuite.scala:108) at org.apache.spark.deploy.SparkSubmitUtilsSuite$$anonfun$5.apply(SparkSubmitUtilsSuite.scala:107) at {code} https://amplab.cs.berkeley.edu/jenkins/view/Spark/job/Spark-Master-Maven-with-YARN/2075/HADOOP_PROFILE=hadoop-2.4,label=centos/testReport/junit/org.apache.spark.deploy/SparkSubmitUtilsSuite/search_for_artifact_at_other_repositories/ ... -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-7418) Flaky test: o.a.s.deploy.SparkSubmitUtilsSuite search for artifacts
[ https://issues.apache.org/jira/browse/SPARK-7418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-7418: - Fix Version/s: 1.4.0 1.3.2 Flaky test: o.a.s.deploy.SparkSubmitUtilsSuite search for artifacts --- Key: SPARK-7418 URL: https://issues.apache.org/jira/browse/SPARK-7418 Project: Spark Issue Type: Bug Components: Tests Affects Versions: 1.4.0 Reporter: Andrew Or Assignee: Burak Yavuz Priority: Critical Fix For: 1.3.2, 1.4.0 {code} java.lang.RuntimeException: [unresolved dependency: com.agimatec#agimatec-validation;0.9.3: not found] at org.apache.spark.deploy.SparkSubmitUtils$.resolveMavenCoordinates(SparkSubmit.scala:931) at org.apache.spark.deploy.SparkSubmitUtilsSuite$$anonfun$5.apply$mcV$sp(SparkSubmitUtilsSuite.scala:108) at org.apache.spark.deploy.SparkSubmitUtilsSuite$$anonfun$5.apply(SparkSubmitUtilsSuite.scala:107) at {code} https://amplab.cs.berkeley.edu/jenkins/view/Spark/job/Spark-Master-Maven-with-YARN/2075/HADOOP_PROFILE=hadoop-2.4,label=centos/testReport/junit/org.apache.spark.deploy/SparkSubmitUtilsSuite/search_for_artifact_at_other_repositories/ ... -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-7417) Flaky test: o.a.s.deploy.SparkSubmitUtilsSuite neglect dependencies
[ https://issues.apache.org/jira/browse/SPARK-7417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-7417: - Fix Version/s: 1.4.0 1.3.2 Flaky test: o.a.s.deploy.SparkSubmitUtilsSuite neglect dependencies --- Key: SPARK-7417 URL: https://issues.apache.org/jira/browse/SPARK-7417 Project: Spark Issue Type: Bug Components: Tests Affects Versions: 1.4.0 Reporter: Andrew Or Priority: Critical Labels: flaky-test Fix For: 1.3.2, 1.4.0 {code} Expected exception java.lang.RuntimeException to be thrown, but no exception was thrown. {code} https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-Maven-pre-YARN/hadoop.version=1.0.4,label=centos/2201/testReport/junit/org.apache.spark.deploy/SparkSubmitUtilsSuite/neglects_Spark_and_Spark_s_dependencies/ ... -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-7417) Flaky test: o.a.s.deploy.SparkSubmitUtilsSuite neglect dependencies
[ https://issues.apache.org/jira/browse/SPARK-7417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14573842#comment-14573842 ] Andrew Or commented on SPARK-7417: -- This should be resolved by: branch-1.4+: 8014e1f6bb871d9fd4db74106eb4425d0c1e9dd6 (#5892) branch-1.3: 5b96b6933a1c0f05512823117c8c66f4b44e2937 (#6657) Flaky test: o.a.s.deploy.SparkSubmitUtilsSuite neglect dependencies --- Key: SPARK-7417 URL: https://issues.apache.org/jira/browse/SPARK-7417 Project: Spark Issue Type: Bug Components: Tests Affects Versions: 1.4.0 Reporter: Andrew Or Assignee: Burak Yavuz Priority: Critical Labels: flaky-test Fix For: 1.3.2, 1.4.0 {code} Expected exception java.lang.RuntimeException to be thrown, but no exception was thrown. {code} https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-Maven-pre-YARN/hadoop.version=1.0.4,label=centos/2201/testReport/junit/org.apache.spark.deploy/SparkSubmitUtilsSuite/neglects_Spark_and_Spark_s_dependencies/ ... -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-7418) Flaky test: o.a.s.deploy.SparkSubmitUtilsSuite search for artifacts
[ https://issues.apache.org/jira/browse/SPARK-7418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14573840#comment-14573840 ] Andrew Or commented on SPARK-7418: -- This should be resolved by: branch-1.4+: 8014e1f6bb871d9fd4db74106eb4425d0c1e9dd6 (#5892) branch-1.3: 5b96b6933a1c0f05512823117c8c66f4b44e2937 (#6657) Flaky test: o.a.s.deploy.SparkSubmitUtilsSuite search for artifacts --- Key: SPARK-7418 URL: https://issues.apache.org/jira/browse/SPARK-7418 Project: Spark Issue Type: Bug Components: Tests Affects Versions: 1.4.0 Reporter: Andrew Or Priority: Critical {code} java.lang.RuntimeException: [unresolved dependency: com.agimatec#agimatec-validation;0.9.3: not found] at org.apache.spark.deploy.SparkSubmitUtils$.resolveMavenCoordinates(SparkSubmit.scala:931) at org.apache.spark.deploy.SparkSubmitUtilsSuite$$anonfun$5.apply$mcV$sp(SparkSubmitUtilsSuite.scala:108) at org.apache.spark.deploy.SparkSubmitUtilsSuite$$anonfun$5.apply(SparkSubmitUtilsSuite.scala:107) at {code} https://amplab.cs.berkeley.edu/jenkins/view/Spark/job/Spark-Master-Maven-with-YARN/2075/HADOOP_PROFILE=hadoop-2.4,label=centos/testReport/junit/org.apache.spark.deploy/SparkSubmitUtilsSuite/search_for_artifact_at_other_repositories/ ... -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-7418) Flaky test: o.a.s.deploy.SparkSubmitUtilsSuite search for artifacts
[ https://issues.apache.org/jira/browse/SPARK-7418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or closed SPARK-7418. Resolution: Fixed Assignee: Burak Yavuz Flaky test: o.a.s.deploy.SparkSubmitUtilsSuite search for artifacts --- Key: SPARK-7418 URL: https://issues.apache.org/jira/browse/SPARK-7418 Project: Spark Issue Type: Bug Components: Tests Affects Versions: 1.4.0 Reporter: Andrew Or Assignee: Burak Yavuz Priority: Critical {code} java.lang.RuntimeException: [unresolved dependency: com.agimatec#agimatec-validation;0.9.3: not found] at org.apache.spark.deploy.SparkSubmitUtils$.resolveMavenCoordinates(SparkSubmit.scala:931) at org.apache.spark.deploy.SparkSubmitUtilsSuite$$anonfun$5.apply$mcV$sp(SparkSubmitUtilsSuite.scala:108) at org.apache.spark.deploy.SparkSubmitUtilsSuite$$anonfun$5.apply(SparkSubmitUtilsSuite.scala:107) at {code} https://amplab.cs.berkeley.edu/jenkins/view/Spark/job/Spark-Master-Maven-with-YARN/2075/HADOOP_PROFILE=hadoop-2.4,label=centos/testReport/junit/org.apache.spark.deploy/SparkSubmitUtilsSuite/search_for_artifact_at_other_repositories/ ... -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-7546) Example code for ML Pipelines feature transformations
[ https://issues.apache.org/jira/browse/SPARK-7546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-7546: --- Assignee: Apache Spark (was: Ram Sriharsha) Example code for ML Pipelines feature transformations - Key: SPARK-7546 URL: https://issues.apache.org/jira/browse/SPARK-7546 Project: Spark Issue Type: New Feature Components: ML Reporter: Joseph K. Bradley Assignee: Apache Spark This should be added for Scala, Java, and Python. It should cover ML Pipelines using a complex series of feature transformations. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
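For flavor, such an example might chain transformations like the following sketch against the spark.ml 1.x API (this is not the code the linked PR adds; the data and column names are illustrative):
{code}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.ml.{Pipeline, PipelineStage}
import org.apache.spark.ml.feature.{HashingTF, Tokenizer}

object FeaturePipelineSketch extends App {
  val sc = new SparkContext(new SparkConf().setAppName("pipeline").setMaster("local[2]"))
  val sqlContext = new SQLContext(sc)
  import sqlContext.implicits._

  val df = Seq((0L, "spark ml pipelines"), (1L, "feature transformations")).toDF("id", "text")

  // Chain two feature transformations: text -> tokens -> hashed term frequencies.
  val tokenizer = new Tokenizer().setInputCol("text").setOutputCol("words")
  val hashingTF = new HashingTF().setInputCol("words").setOutputCol("features")
  val pipeline  = new Pipeline().setStages(Array[PipelineStage](tokenizer, hashingTF))

  pipeline.fit(df).transform(df).show()
  sc.stop()
}
{code}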
[jira] [Commented] (SPARK-7546) Example code for ML Pipelines feature transformations
[ https://issues.apache.org/jira/browse/SPARK-7546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14573625#comment-14573625 ] Apache Spark commented on SPARK-7546: - User 'harsha2010' has created a pull request for this issue: https://github.com/apache/spark/pull/6654 Example code for ML Pipelines feature transformations - Key: SPARK-7546 URL: https://issues.apache.org/jira/browse/SPARK-7546 Project: Spark Issue Type: New Feature Components: ML Reporter: Joseph K. Bradley Assignee: Ram Sriharsha This should be added for Scala, Java, and Python. It should cover ML Pipelines using a complex series of feature transformations. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-8113) SQL module test cleanup
Reynold Xin created SPARK-8113: -- Summary: SQL module test cleanup Key: SPARK-8113 URL: https://issues.apache.org/jira/browse/SPARK-8113 Project: Spark Issue Type: Umbrella Components: SQL Reporter: Reynold Xin Assignee: Reynold Xin Priority: Minor Some cleanup tasks to track here. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-6419) GenerateOrdering does not support BinaryType and complex types.
[ https://issues.apache.org/jira/browse/SPARK-6419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davies Liu reassigned SPARK-6419: - Assignee: Davies Liu GenerateOrdering does not support BinaryType and complex types. --- Key: SPARK-6419 URL: https://issues.apache.org/jira/browse/SPARK-6419 Project: Spark Issue Type: Bug Components: SQL Reporter: Yin Huai Assignee: Davies Liu When a user wants to order by binary columns or columns with complex types and code gen is enabled, there will be a MatchError ([see here|https://github.com/apache/spark/blob/v1.3.0/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateOrdering.scala#L45]). We can either add support for these types or have a function to check whether we can safely call GenerateOrdering (like canBeCodeGened for the HashAggregation strategy). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
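A small sketch of the second option, using toy types rather than Catalyst's classes: a canBeCodeGened-style guard over the sort column types, so unsupported types fall back to the interpreted ordering instead of hitting a MatchError:
{code}
object OrderingGuardSketch {
  sealed trait DataType
  case object IntegerType extends DataType
  case object StringType  extends DataType
  case object BinaryType  extends DataType
  case class ArrayType(element: DataType) extends DataType

  // Only claim codegen support for types the generator actually handles.
  def canGenerateOrdering(sortColumnTypes: Seq[DataType]): Boolean =
    sortColumnTypes.forall {
      case IntegerType | StringType => true
      case BinaryType               => false // unsupported: fall back
      case _: ArrayType             => false // complex types: fall back
    }
}
{code}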
[jira] [Created] (SPARK-8117) Push codegen into Expression
Davies Liu created SPARK-8117: - Summary: Push codegen into Expression Key: SPARK-8117 URL: https://issues.apache.org/jira/browse/SPARK-8117 Project: Spark Issue Type: Bug Components: SQL Reporter: Davies Liu Assignee: Davies Liu Push the codegen implementation of each expression into Expression itself, making it easier to manage and extend. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-7184) Investigate turning codegen on by default
[ https://issues.apache.org/jira/browse/SPARK-7184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davies Liu reassigned SPARK-7184: - Assignee: Davies Liu Investigate turning codegen on by default - Key: SPARK-7184 URL: https://issues.apache.org/jira/browse/SPARK-7184 Project: Spark Issue Type: Improvement Components: SQL Reporter: Reynold Xin Assignee: Davies Liu If it is not the default, users get suboptimal performance out of the box, and the codegen path falls behind the interpreted path over time. The best option might be to have only the codegen path. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
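For context, codegen in this era was opt-in via a SQLConf flag; a one-line sketch of enabling it per context (flag name as I recall it from the Spark 1.x SQL guide, so treat it as an assumption; assumes an existing SQLContext named sqlContext):
{code}
// Opt in to runtime code generation for expression evaluation (Spark 1.x).
sqlContext.setConf("spark.sql.codegen", "true")
{code}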
[jira] [Commented] (SPARK-7536) Audit MLlib Python API for 1.4
[ https://issues.apache.org/jira/browse/SPARK-7536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14573535#comment-14573535 ] Yanbo Liang commented on SPARK-7536: [~josephkb] I'm on business travel from 1st June to 10th June, so there will be no updates during this period. Audit MLlib Python API for 1.4 -- Key: SPARK-7536 URL: https://issues.apache.org/jira/browse/SPARK-7536 Project: Spark Issue Type: Sub-task Components: MLlib, PySpark Reporter: Joseph K. Bradley Assignee: Yanbo Liang For new public APIs added to MLlib, we need to check the generated HTML doc and compare the Scala and Python versions. We need to track: * Inconsistency: Do class/method/parameter names match? SPARK-7667 * Docs: Is the Python doc missing or just a stub? We want the Python doc to be as complete as the Scala doc. SPARK-7666 * API breaking changes: These should be very rare but are occasionally either necessary (intentional) or accidental. These must be recorded and added in the Migration Guide for this release. SPARK-7665 ** Note: If the API change is for an Alpha/Experimental/DeveloperApi component, please note that as well. * Missing classes/methods/parameters: We should create to-do JIRAs for functionality missing from Python. ** classification *** StreamingLogisticRegressionWithSGD SPARK-7633 ** clustering *** GaussianMixture SPARK-6258 *** LDA SPARK-6259 *** Power Iteration Clustering SPARK-5962 *** StreamingKMeans SPARK-4118 ** evaluation *** MultilabelMetrics SPARK-6094 ** feature *** ElementwiseProduct SPARK-7605 *** PCA SPARK-7604 ** linalg *** Distributed linear algebra SPARK-6100 ** pmml.export SPARK-7638 ** regression *** StreamingLinearRegressionWithSGD SPARK-4127 ** stat *** KernelDensity SPARK-7639 ** util *** MLUtils SPARK-6263 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-8080) Custom Receiver.store with Iterator type do not give correct count at Spark UI
[ https://issues.apache.org/jira/browse/SPARK-8080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14573566#comment-14573566 ] Tathagata Das commented on SPARK-8080: -- [~zsxwing] Take a look at the screenshot attached to the JIRA. We should not be showing negative numbers in the input size. I am guessing that this is happening because the number of records reported by ReceivedBlockInfo is -1 (to signify lack of information), which gets added up to become -4. This should not happen. I am filing a separate JIRA for this; can you take a look at the issue? Custom Receiver.store with Iterator type do not give correct count at Spark UI -- Key: SPARK-8080 URL: https://issues.apache.org/jira/browse/SPARK-8080 Project: Spark Issue Type: Bug Components: Streaming Affects Versions: 1.2.0 Reporter: Dibyendu Bhattacharya Fix For: 1.4.0 Attachments: screenshot.png In a custom receiver, if I call store with an Iterator type (store(dataIterator: Iterator[T]): Unit), the Spark UI does not show the correct count of records in a block, which leads to wrong values for Input Rate, Scheduling Delay and Input Size. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
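A minimal illustration of the arithmetic described in the comment (the -1 sentinel comes from the comment; the fix shown is only a hypothetical sketch):
{code}
object NegativeCountSketch extends App {
  // Four received blocks whose record counts are unknown, each reported as -1.
  val reportedCounts = Seq(-1L, -1L, -1L, -1L)

  // Summing the sentinel as if it were a real count yields the -4 seen in the UI.
  println(reportedCounts.sum)  // -4

  // One possible fix: never let the "unknown" sentinel leak into aggregates.
  println(reportedCounts.map(math.max(_, 0L)).sum)  // 0
}
{code}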
[jira] [Commented] (SPARK-8068) Add confusionMatrix method at class MulticlassMetrics in pyspark/mllib
[ https://issues.apache.org/jira/browse/SPARK-8068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14573676#comment-14573676 ] Ai He commented on SPARK-8068: -- Hi, Joseph, am I supposed to solve this issue or just let the assignee of SPARK-7536 resolve all related issues? Add confusionMatrix method at class MulticlassMetrics in pyspark/mllib -- Key: SPARK-8068 URL: https://issues.apache.org/jira/browse/SPARK-8068 Project: Spark Issue Type: Improvement Components: PySpark Affects Versions: 1.3.1 Reporter: Ai He Priority: Minor There is no confusionMatrix method in class MulticlassMetrics in pyspark/mllib. This method is actually implemented in Scala MLlib. To achieve this, we just need to add a function call to the corresponding one in Scala MLlib. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
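For reference, a sketch of the existing Scala API that the proposed Python method would delegate to (MLlib 1.x; the input data here is illustrative):
{code}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.evaluation.MulticlassMetrics

object ConfusionMatrixSketch extends App {
  val sc = new SparkContext(new SparkConf().setAppName("cm").setMaster("local[2]"))

  // (prediction, label) pairs.
  val predictionAndLabels = sc.parallelize(Seq((0.0, 0.0), (1.0, 1.0), (1.0, 0.0)))

  val metrics = new MulticlassMetrics(predictionAndLabels)
  println(metrics.confusionMatrix)  // an org.apache.spark.mllib.linalg.Matrix

  sc.stop()
}
{code}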
[jira] [Resolved] (SPARK-8098) Show correct length of bytes on log page
[ https://issues.apache.org/jira/browse/SPARK-8098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tathagata Das resolved SPARK-8098. -- Resolution: Fixed Fix Version/s: 1.5.0 1.4.1 1.3.2 Show correct length of bytes on log page Key: SPARK-8098 URL: https://issues.apache.org/jira/browse/SPARK-8098 Project: Spark Issue Type: Bug Components: Web UI Affects Versions: 1.3.1 Reporter: Carson Wang Priority: Minor Fix For: 1.3.2, 1.4.1, 1.5.0 The log page should only show the desired length of bytes. Currently it shows bytes from the startIndex to the end of the file. The Next button on the page is always disabled. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
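A tiny sketch of the intended behavior with a toy helper (not the actual Spark utility): serve exactly the requested range instead of everything from startIndex to end-of-file:
{code}
object LogPageSliceSketch {
  // Return at most `length` bytes starting at `startIndex`,
  // clamped to the file's actual size.
  def slice(fileBytes: Array[Byte], startIndex: Int, length: Int): Array[Byte] = {
    val start = math.max(0, math.min(startIndex, fileBytes.length))
    val end   = math.min(start + length, fileBytes.length)
    fileBytes.slice(start, end)
  }
}
{code}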
[jira] [Created] (SPARK-8110) DAG visualizations sometimes look weird in Python
Matei Zaharia created SPARK-8110: Summary: DAG visualizations sometimes look weird in Python Key: SPARK-8110 URL: https://issues.apache.org/jira/browse/SPARK-8110 Project: Spark Issue Type: Bug Components: Web UI Affects Versions: 1.4.0 Reporter: Matei Zaharia Priority: Minor Got this by doing sc.textFile("README.md").count() -- there are some RDDs outside of any stages. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-8112) Received block event count through the StreamingListener can be negative
[ https://issues.apache.org/jira/browse/SPARK-8112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tathagata Das updated SPARK-8112: - Priority: Minor (was: Major) Received block event count through the StreamingListener can be negative Key: SPARK-8112 URL: https://issues.apache.org/jira/browse/SPARK-8112 Project: Spark Issue Type: Bug Components: Streaming Affects Versions: 1.4.0 Reporter: Tathagata Das Assignee: Shixiong Zhu Priority: Minor -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org