[jira] [Updated] (SPARK-5479) PySpark on yarn mode need to support non-local python files

2015-06-04 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-5479:
-
Component/s: YARN

 PySpark on yarn mode need to support non-local python files
 ---

 Key: SPARK-5479
 URL: https://issues.apache.org/jira/browse/SPARK-5479
 Project: Spark
  Issue Type: Bug
  Components: PySpark, YARN
Affects Versions: 1.4.0
Reporter: Lianhui Wang

  In SPARK-5162 [~vgrigor] reports this:
 Now the following command does not work:
 aws emr add-steps --cluster-id j-XYWIXMD234 \
 --steps 
 Name=SparkPi,Jar=s3://eu-west-1.elasticmapreduce/libs/script-runner/script-runner.jar,Args=[/home/hadoop/spark/bin/spark-submit,--deploy-mode,cluster,--master,yarn-cluster,--py-files,s3://mybucketat.amazonaws.com/tasks/main.py,main.py,param1],ActionOnFailure=CONTINUE
 So we need to support non-local Python files in YARN client and cluster mode:
 before submitting the application to YARN, we need to download non-local files to
 a local or HDFS path, or spark.yarn.dist.files needs to support non-local files.
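
 A minimal workaround sketch of the "download first, then submit" idea above, assuming the AWS CLI is installed and configured on the submitting host; the bucket, paths, and application argument are taken from the report and are only placeholders:
 {code}
 # Fetch the non-local Python file to the local filesystem, then hand the
 # local copy to spark-submit (which YARN cluster mode already handles).
 import subprocess

 S3_SCRIPT = "s3://mybucketat.amazonaws.com/tasks/main.py"
 LOCAL_SCRIPT = "/tmp/main.py"

 subprocess.check_call(["aws", "s3", "cp", S3_SCRIPT, LOCAL_SCRIPT])
 subprocess.check_call([
     "/home/hadoop/spark/bin/spark-submit",
     "--deploy-mode", "cluster",
     "--master", "yarn-cluster",
     LOCAL_SCRIPT,
     "param1",
 ])
 {code}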






[jira] [Assigned] (SPARK-8114) Remove wildcard import on TestSQLContext._

2015-06-04 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-8114:
---

Assignee: Apache Spark

 Remove wildcard import on TestSQLContext._
 --

 Key: SPARK-8114
 URL: https://issues.apache.org/jira/browse/SPARK-8114
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Reporter: Reynold Xin
Assignee: Apache Spark

 We import TestSQLContext._ in almost all test suites. This import introduces 
 a lot of methods and should be avoided.






[jira] [Commented] (SPARK-8114) Remove wildcard import on TestSQLContext._

2015-06-04 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14573829#comment-14573829
 ] 

Apache Spark commented on SPARK-8114:
-

User 'rxin' has created a pull request for this issue:
https://github.com/apache/spark/pull/6661

 Remove wildcard import on TestSQLContext._
 --

 Key: SPARK-8114
 URL: https://issues.apache.org/jira/browse/SPARK-8114
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Reporter: Reynold Xin

 We import TestSQLContext._ in almost all test suites. This import introduces 
 a lot of methods and should be avoided.






[jira] [Commented] (SPARK-8071) Run PySpark dataframe.rollup/cube test failed

2015-06-04 Thread Reynold Xin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14573828#comment-14573828
 ] 

Reynold Xin commented on SPARK-8071:


[~chenghao] can you take a look at this? Cube is not supposed to appear in the 
physical planner.

 Run PySpark dataframe.rollup/cube test failed
 -

 Key: SPARK-8071
 URL: https://issues.apache.org/jira/browse/SPARK-8071
 Project: Spark
  Issue Type: Bug
  Components: PySpark
 Environment: OS: SUSE 11 SP3; JDK: 1.8.0_40; Python: 2.6.8; Hadoop: 
 2.7.0; Spark: master branch
Reporter: Weizhong
Priority: Minor

 I ran the tests for Spark and they failed on PySpark; details are:
 {code}
 File "/xxx/Spark/python/pyspark/sql/dataframe.py", line 837, in pyspark.sql.dataframe.DataFrame.cube
 Failed example:
     df.cube('name', df.age).count().show()
 Exception raised:
     Traceback (most recent call last):
       File "/usr/lib64/python2.6/doctest.py", line 1253, in __run
         compileflags, 1) in test.globs
       File "<doctest pyspark.sql.dataframe.DataFrame.cube[0]>", line 1, in <module>
         df.cube('name', df.age).count().show()
       File "/xxx/Spark/python/pyspark/sql/dataframe.py", line 291, in show
         print(self._jdf.showString(n))
       File "/xxx/Spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in __call__
         self.target_id, self.name)
       File "/xxx/Spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
         format(target_id, '.', name), value)
     Py4JJavaError: An error occurred while calling o212.showString.
     : java.lang.AssertionError: assertion failed: No plan for Cube [name#1,age#0], [name#1,age#0,COUNT(1) AS count#27L], grouping__id#28
      LogicalRDD [age#0,name#1], MapPartitionsRDD[7] at applySchemaToPythonRDD at NativeMethodAccessorImpl.java:-2
         at scala.Predef$.assert(Predef.scala:179)
         at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:59)
         at org.apache.spark.sql.catalyst.planning.QueryPlanner.planLater(QueryPlanner.scala:54)
         at org.apache.spark.sql.execution.SparkStrategies$BasicOperators$.apply(SparkStrategies.scala:312)
         at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
         at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
         at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
         at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:59)
         at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:913)
         at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:911)
         at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:917)
         at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(SQLContext.scala:917)
         at org.apache.spark.sql.DataFrame.collect(DataFrame.scala:1255)
         at org.apache.spark.sql.DataFrame.head(DataFrame.scala:1189)
         at org.apache.spark.sql.DataFrame.take(DataFrame.scala:1248)
         at org.apache.spark.sql.DataFrame.showString(DataFrame.scala:176)
         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
         at java.lang.reflect.Method.invoke(Method.java:606)
         at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
         at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
         at py4j.Gateway.invoke(Gateway.java:259)
         at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
         at py4j.commands.CallCommand.execute(CallCommand.java:79)
         at py4j.GatewayConnection.run(GatewayConnection.java:207)
         at java.lang.Thread.run(Thread.java:745)
 1 of   1 in pyspark.sql.dataframe.DataFrame.cube
 1 of   1 in pyspark.sql.dataframe.DataFrame.rollup
 ***Test Failed*** 2 failures.
 {code}
 cc [~davies]






[jira] [Assigned] (SPARK-8114) Remove wildcard import on TestSQLContext._

2015-06-04 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-8114:
---

Assignee: (was: Apache Spark)

 Remove wildcard import on TestSQLContext._
 --

 Key: SPARK-8114
 URL: https://issues.apache.org/jira/browse/SPARK-8114
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Reporter: Reynold Xin

 We import TestSQLContext._ in almost all test suites. This import introduces 
 a lot of methods and should be avoided.






[jira] [Issue Comment Deleted] (SPARK-7119) ScriptTransform doesn't consider the output data type

2015-06-04 Thread zhichao-li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhichao-li updated SPARK-7119:
--
Comment: was deleted

(was: This workaround query can be executed correctly and there's a simple fix 
for this issue by the way :))

 ScriptTransform doesn't consider the output data type
 -

 Key: SPARK-7119
 URL: https://issues.apache.org/jira/browse/SPARK-7119
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.3.0, 1.3.1, 1.4.0
Reporter: Cheng Hao

 {code:sql}
 from (from src select transform(key, value) using 'cat' as (thing1 int, 
 thing2 string)) t select thing1 + 2;
 {code}
 {noformat}
 15/04/24 00:58:55 ERROR CliDriver: org.apache.spark.SparkException: Job 
 aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent 
 failure: Lost task 0.0 in stage 0.0 (TID 0, localhost): 
 java.lang.ClassCastException: org.apache.spark.sql.types.UTF8String cannot be 
 cast to java.lang.Integer
   at scala.runtime.BoxesRunTime.unboxToInt(BoxesRunTime.java:106)
   at scala.math.Numeric$IntIsIntegral$.plus(Numeric.scala:57)
   at 
 org.apache.spark.sql.catalyst.expressions.Add.eval(arithmetic.scala:127)
   at 
 org.apache.spark.sql.catalyst.expressions.Alias.eval(namedExpressions.scala:118)
   at 
 org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection.apply(Projection.scala:68)
   at 
 org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection.apply(Projection.scala:52)
   at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
   at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
   at scala.collection.Iterator$class.foreach(Iterator.scala:727)
   at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
   at 
 scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
   at 
 scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
   at 
 scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
   at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
   at scala.collection.AbstractIterator.to(Iterator.scala:1157)
   at 
 scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
   at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
   at 
 scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
   at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
   at org.apache.spark.rdd.RDD$$anonfun$17.apply(RDD.scala:819)
   at org.apache.spark.rdd.RDD$$anonfun$17.apply(RDD.scala:819)
   at 
 org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1618)
   at 
 org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1618)
   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
   at org.apache.spark.scheduler.Task.run(Task.scala:64)
   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:209)
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
   at java.lang.Thread.run(Thread.java:722)
 {noformat}






[jira] [Updated] (SPARK-8096) how to convert dataframe field to label and features

2015-06-04 Thread bofei.xiao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

bofei.xiao updated SPARK-8096:
--
Description: 
How do I convert the DataFrame to an RDD[LabeledPoint]?
The DataFrame has the fields target, age, sex, height.
I want to use target as the label and age, sex, height as the features vector.

I faced this problem in the following circumstance:
--
Given a CSV file data.csv:
target,age,sex,height
1,18,1,170
0,25,1,165
.

Now I want to build a decision tree model.
Step 1: load the CSV data as a DataFrame
val data = sqlContext.load("com.databricks.spark.csv", Map("path" -> "data.csv", "header" -> "true"))

Step 2: build a decision tree model
But DecisionTree needs an RDD[LabeledPoint] as input.

thanks!

  was:
given i have a csv file data.csv
target,age,sex,height
1,18,1,170
0,25,1,165
.

now,i want build a decisitin model
step 1:load csv data as dataframe
val data= sqlContext.load(com.databricks.spark.csv,:Map(path - data.csv, 
header - true)

step 2:build a decisiontree model
but decisiontree need a RDD[LabelPoint] input

Q:how to convert the dataframe to RDD[LabelPoint]

thanks!

Summary: how to convert dataframe field to label and features  (was: 
use csv data to build a classification model,how to convert dataframe field to 
label and features)

 how to convert dataframe field to label and features
 

 Key: SPARK-8096
 URL: https://issues.apache.org/jira/browse/SPARK-8096
 Project: Spark
  Issue Type: Bug
Reporter: bofei.xiao

  How do I convert the DataFrame to an RDD[LabeledPoint]?
  The DataFrame has the fields target, age, sex, height.
  I want to use target as the label and age, sex, height as the features vector.
  I faced this problem in the following circumstance:
  --
  Given a CSV file data.csv:
  target,age,sex,height
  1,18,1,170
  0,25,1,165
  .
  Now I want to build a decision tree model.
  Step 1: load the CSV data as a DataFrame
  val data = sqlContext.load("com.databricks.spark.csv", Map("path" -> "data.csv", "header" -> "true"))
  Step 2: build a decision tree model
  But DecisionTree needs an RDD[LabeledPoint] as input.
  thanks!
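
 A sketch of the conversion in PySpark, under the same assumptions as above (data.csv has a header row and the columns target, age, sex, height, loaded through the spark-csv package, so every column arrives as a string); note the MLlib class is spelled LabeledPoint:
 {code}
 from pyspark import SparkContext
 from pyspark.sql import SQLContext
 from pyspark.mllib.regression import LabeledPoint

 sc = SparkContext("local[2]", "csv-to-labeledpoint")
 sqlContext = SQLContext(sc)

 df = sqlContext.load("data.csv", "com.databricks.spark.csv", header="true")

 # target becomes the label; age, sex and height become the feature vector.
 labeled = df.map(lambda row: LabeledPoint(
     float(row.target),
     [float(row.age), float(row.sex), float(row.height)]))
 {code}
 The resulting RDD of LabeledPoint can then be fed to DecisionTree.trainClassifier.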






[jira] [Updated] (SPARK-8096) how to convert dataframe field to LabelPoint

2015-06-04 Thread bofei.xiao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

bofei.xiao updated SPARK-8096:
--
Summary: how to convert dataframe field to LabelPoint  (was: how to convert 
dataframe field to label and features)

 how to convert dataframe field to LabelPoint
 

 Key: SPARK-8096
 URL: https://issues.apache.org/jira/browse/SPARK-8096
 Project: Spark
  Issue Type: Bug
Reporter: bofei.xiao

  How do I convert the DataFrame to an RDD[LabeledPoint]?
  The DataFrame has the fields target, age, sex, height.
  I want to use target as the label and age, sex, height as the features vector.
  I faced this problem in the following circumstance:
  --
  Given a CSV file data.csv:
  target,age,sex,height
  1,18,1,170
  0,25,1,165
  .
  Now I want to build a decision tree model.
  Step 1: load the CSV data as a DataFrame
  val data = sqlContext.load("com.databricks.spark.csv", Map("path" -> "data.csv", "header" -> "true"))
  Step 2: build a decision tree model
  But DecisionTree needs an RDD[LabeledPoint] as input.
  thanks!






[jira] [Commented] (SPARK-8071) Run PySpark dataframe.rollup/cube test failed

2015-06-04 Thread Cheng Hao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14573952#comment-14573952
 ] 

Cheng Hao commented on SPARK-8071:
--

I couldn't reproduce this with the Scala API, and the failing code seems to come 
from the PySpark unit test (for cube). [~davies], can you reproduce that 
exception?
Could it be a JDK / Python version issue?
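
For reference, a minimal standalone sketch of the failing doctest (assuming a local master-branch PySpark build; the two-row DataFrame mirrors the doctest fixture):

{code}
from pyspark import SparkContext
from pyspark.sql import SQLContext, Row

sc = SparkContext("local[2]", "cube-repro")
sqlContext = SQLContext(sc)
df = sqlContext.createDataFrame([Row(name="Alice", age=2), Row(name="Bob", age=5)])

# The call that fails in the doctest:
df.cube("name", df.age).count().show()
# An equivalent call passing both columns by name:
df.cube("name", "age").count().show()
{code}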

 Run PySpark dataframe.rollup/cube test failed
 -

 Key: SPARK-8071
 URL: https://issues.apache.org/jira/browse/SPARK-8071
 Project: Spark
  Issue Type: Bug
  Components: PySpark
 Environment: OS: SUSE 11 SP3; JDK: 1.8.0_40; Python: 2.6.8; Hadoop: 
 2.7.0; Spark: master branch
Reporter: Weizhong
Priority: Minor

 I ran the tests for Spark and they failed on PySpark; details are:
 {code}
 File "/xxx/Spark/python/pyspark/sql/dataframe.py", line 837, in pyspark.sql.dataframe.DataFrame.cube
 Failed example:
     df.cube('name', df.age).count().show()
 Exception raised:
     Traceback (most recent call last):
       File "/usr/lib64/python2.6/doctest.py", line 1253, in __run
         compileflags, 1) in test.globs
       File "<doctest pyspark.sql.dataframe.DataFrame.cube[0]>", line 1, in <module>
         df.cube('name', df.age).count().show()
       File "/xxx/Spark/python/pyspark/sql/dataframe.py", line 291, in show
         print(self._jdf.showString(n))
       File "/xxx/Spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in __call__
         self.target_id, self.name)
       File "/xxx/Spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
         format(target_id, '.', name), value)
     Py4JJavaError: An error occurred while calling o212.showString.
     : java.lang.AssertionError: assertion failed: No plan for Cube [name#1,age#0], [name#1,age#0,COUNT(1) AS count#27L], grouping__id#28
      LogicalRDD [age#0,name#1], MapPartitionsRDD[7] at applySchemaToPythonRDD at NativeMethodAccessorImpl.java:-2
         at scala.Predef$.assert(Predef.scala:179)
         at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:59)
         at org.apache.spark.sql.catalyst.planning.QueryPlanner.planLater(QueryPlanner.scala:54)
         at org.apache.spark.sql.execution.SparkStrategies$BasicOperators$.apply(SparkStrategies.scala:312)
         at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
         at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
         at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
         at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:59)
         at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:913)
         at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:911)
         at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:917)
         at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(SQLContext.scala:917)
         at org.apache.spark.sql.DataFrame.collect(DataFrame.scala:1255)
         at org.apache.spark.sql.DataFrame.head(DataFrame.scala:1189)
         at org.apache.spark.sql.DataFrame.take(DataFrame.scala:1248)
         at org.apache.spark.sql.DataFrame.showString(DataFrame.scala:176)
         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
         at java.lang.reflect.Method.invoke(Method.java:606)
         at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
         at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
         at py4j.Gateway.invoke(Gateway.java:259)
         at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
         at py4j.commands.CallCommand.execute(CallCommand.java:79)
         at py4j.GatewayConnection.run(GatewayConnection.java:207)
         at java.lang.Thread.run(Thread.java:745)
 1 of   1 in pyspark.sql.dataframe.DataFrame.cube
 1 of   1 in pyspark.sql.dataframe.DataFrame.rollup
 ***Test Failed*** 2 failures.
 {code}
 cc [~davies]






[jira] [Updated] (SPARK-8118) Turn off noisy log output produced by Parquet 1.7.0

2015-06-04 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin updated SPARK-8118:
---
Issue Type: Sub-task  (was: Bug)
Parent: SPARK-5463

 Turn off noisy log output produced by Parquet 1.7.0
 ---

 Key: SPARK-8118
 URL: https://issues.apache.org/jira/browse/SPARK-8118
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 1.4.1, 1.5.0
Reporter: Cheng Lian
Assignee: Cheng Lian
Priority: Minor

 Parquet 1.7.0 renames its package to org.apache.parquet, so we need to adjust 
 {{ParquetRelation.enableLogForwarding}} accordingly to avoid noisy log output.






[jira] [Commented] (SPARK-7819) Isolated Hive Client Loader appears to cause Native Library libMapRClient.4.0.2-mapr.so already loaded in another classloader error

2015-06-04 Thread Nathan McCarthy (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14573929#comment-14573929
 ] 

Nathan McCarthy commented on SPARK-7819:


@Yin - looks like my build was just a little out of date! RC4 is running 
well! Thanks!

 Isolated Hive Client Loader appears to cause Native Library 
 libMapRClient.4.0.2-mapr.so already loaded in another classloader error
 ---

 Key: SPARK-7819
 URL: https://issues.apache.org/jira/browse/SPARK-7819
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.4.0
Reporter: Fi
Priority: Critical
 Attachments: invalidClassException.log, stacktrace.txt, test.py


 In reference to the pull request: https://github.com/apache/spark/pull/5876
 I have been running the Spark 1.3 branch for some time with no major hiccups, 
 and recently switched to the Spark 1.4 branch.
 I build my spark distribution with the following build command:
 {noformat}
 make-distribution.sh --tgz --skip-java-test --with-tachyon -Phive 
 -Phive-0.13.1 -Pmapr4 -Pspark-ganglia-lgpl -Pkinesis-asl -Phive-thriftserver
 {noformat}
 When running a python script containing a series of smoke tests I use to 
 validate the build, I encountered an error under the following conditions:
 * start a spark context
 * start a hive context
 * run any hive query
 * stop the spark context
 * start a second spark context
 * run any hive query
 ** ERROR
 From what I can tell, the isolated class loader is hitting a MapR class that 
 loads its native library (presumably as part of a static initializer).
 Unfortunately, the JVM prohibits loading the same native library a second time.
 I would have thought that shutting down the SparkContext would clear out any 
 vestiges of it from the JVM, so I'm surprised this is even a problem.
 Note: all the other smoke tests we run pass fine.
 I will attach the stacktrace and a Python script reproducing the issue (at 
 least for my environment and build).
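
 A minimal sketch of the failure sequence listed above (not the attached test.py; it assumes a Hive-enabled build and simply runs the same lifecycle twice):
 {code}
 from pyspark import SparkContext
 from pyspark.sql import HiveContext

 for attempt in range(2):
     # start a spark context, then a hive context
     sc = SparkContext("local[2]", "hive-smoke-%d" % attempt)
     hive = HiveContext(sc)
     # run any hive query
     hive.sql("SHOW TABLES").collect()
     # stop the spark context; on the affected MapR build the second pass
     # fails with the "already loaded in another classloader" error
     sc.stop()
 {code}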






[jira] [Commented] (SPARK-7819) Isolated Hive Client Loader appears to cause Native Library libMapRClient.4.0.2-mapr.so already loaded in another classloader error

2015-06-04 Thread Yin Huai (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14573944#comment-14573944
 ] 

Yin Huai commented on SPARK-7819:
-

[~nemccarthy] Thank you for the update! Glad to hear that :)

 Isolated Hive Client Loader appears to cause Native Library 
 libMapRClient.4.0.2-mapr.so already loaded in another classloader error
 ---

 Key: SPARK-7819
 URL: https://issues.apache.org/jira/browse/SPARK-7819
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.4.0
Reporter: Fi
Priority: Critical
 Attachments: invalidClassException.log, stacktrace.txt, test.py


 In reference to the pull request: https://github.com/apache/spark/pull/5876
 I have been running the Spark 1.3 branch for some time with no major hiccups, 
 and recently switched to the Spark 1.4 branch.
 I build my spark distribution with the following build command:
 {noformat}
 make-distribution.sh --tgz --skip-java-test --with-tachyon -Phive 
 -Phive-0.13.1 -Pmapr4 -Pspark-ganglia-lgpl -Pkinesis-asl -Phive-thriftserver
 {noformat}
 When running a python script containing a series of smoke tests I use to 
 validate the build, I encountered an error under the following conditions:
 * start a spark context
 * start a hive context
 * run any hive query
 * stop the spark context
 * start a second spark context
 * run any hive query
 ** ERROR
 From what I can tell, the isolated class loader is hitting a MapR class that 
 loads its native library (presumably as part of a static initializer).
 Unfortunately, the JVM prohibits loading the same native library a second time.
 I would have thought that shutting down the SparkContext would clear out any 
 vestiges of it from the JVM, so I'm surprised this is even a problem.
 Note: all the other smoke tests we run pass fine.
 I will attach the stacktrace and a Python script reproducing the issue (at 
 least for my environment and build).






[jira] [Commented] (SPARK-8119) Spark will set total executor when some executors fail.

2015-06-04 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14573946#comment-14573946
 ] 

Apache Spark commented on SPARK-8119:
-

User 'SaintBacchus' has created a pull request for this issue:
https://github.com/apache/spark/pull/6662

 Spark will set total executor when some executors fail.
 ---

 Key: SPARK-8119
 URL: https://issues.apache.org/jira/browse/SPARK-8119
 Project: Spark
  Issue Type: Bug
  Components: Scheduler
Affects Versions: 1.4.0
Reporter: SaintBacchus
 Fix For: 1.4.0


 DynamicAllocation sets the total number of executors to a small number when it 
 wants to kill some executors.
 But in the non-DynamicAllocation scenario, Spark also sets the total number of executors.
 So it causes the following problem: when an executor fails, no replacement 
 executor is brought up by Spark.






[jira] [Updated] (SPARK-8056) Design an easier way to construct schema for both Scala and Python

2015-06-04 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin updated SPARK-8056:
---
Assignee: (was: Reynold Xin)

 Design an easier way to construct schema for both Scala and Python
 --

 Key: SPARK-8056
 URL: https://issues.apache.org/jira/browse/SPARK-8056
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Reporter: Reynold Xin

 StructType is fairly hard to construct, especially in Python.






[jira] [Commented] (SPARK-8056) Design an easier way to construct schema for both Scala and Python

2015-06-04 Thread Reynold Xin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14573908#comment-14573908
 ] 

Reynold Xin commented on SPARK-8056:


I'm not actively working on this. Feel free to take over. If you have this 
early enough, we can even put it into 1.4.1.

I like that idea. I think we should have the 2nd argument of add() support 
both a string for simple types and a DataType object.
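
A rough sketch (not the eventual API) of what that could look like on the Python side, where a chained add accepting either a DataType or a simple type-name string would read much like building a SparkConf; the helper below is purely hypothetical:

{code}
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

# Today: verbose nested constructors.
schema = StructType([
    StructField("name", StringType(), True),
    StructField("age", IntegerType(), True),
])

# Hypothetical builder-style helper along the lines discussed here.
_SIMPLE_TYPES = {"string": StringType(), "int": IntegerType()}

def add_field(struct, name, data_type, nullable=True):
    # Accept either a DataType instance or a simple type-name string.
    dt = _SIMPLE_TYPES[data_type] if isinstance(data_type, str) else data_type
    return StructType(struct.fields + [StructField(name, dt, nullable)])

schema2 = add_field(add_field(StructType([]), "name", "string"), "age", IntegerType())
{code}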


 Design an easier way to construct schema for both Scala and Python
 --

 Key: SPARK-8056
 URL: https://issues.apache.org/jira/browse/SPARK-8056
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Reporter: Reynold Xin

 StructType is fairly hard to construct, especially in Python.






[jira] [Updated] (SPARK-8116) sc.range() doesn't match python range()

2015-06-04 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin updated SPARK-8116:
---
Target Version/s: 1.4.1  (was: 1.4.0, 1.4.1)

 sc.range() doesn't match python range()
 ---

 Key: SPARK-8116
 URL: https://issues.apache.org/jira/browse/SPARK-8116
 Project: Spark
  Issue Type: Bug
  Components: PySpark
Affects Versions: 1.4.0, 1.4.1
Reporter: Ted Blackman
Priority: Minor
  Labels: easyfix

 Python's built-in range() and xrange() functions can take 1, 2, or 3 
 arguments. Ranges with just 1 argument are probably used the most frequently, 
 e.g.:
 for i in range(len(myList)): ...
 However, in pyspark, the SparkContext range() method throws an error when 
 called with a single argument, due to the way its arguments get passed into 
 python's range function.
 There's no good reason that I can think of not to support the same syntax as 
 the built-in function. To fix this, we can set the default of the sc.range() 
 method's `stop` argument to None, and then inside the method, if it is None, 
 replace `stop` with `start` and set `start` to 0, which is what the C 
 implementation of range() does:
 https://github.com/python/cpython/blob/master/Objects/rangeobject.c#L87
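
 A sketch of that argument-shifting logic as a thin wrapper (the real change would live inside SparkContext.range itself; the wrapper name is made up):
 {code}
 def sc_range(sc, start, stop=None, step=1, numSlices=None):
     # Mirror Python's built-in range(): with a single argument, treat it as
     # the stop value and count from 0.
     if stop is None:
         start, stop = 0, start
     return sc.range(start, stop, step, numSlices)

 # sc_range(sc, 5) then yields the same elements as range(5): 0, 1, 2, 3, 4.
 {code}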






[jira] [Created] (SPARK-8122) A few problems in ParquetRelation.enableLogForwarding()

2015-06-04 Thread Konstantin Shaposhnikov (JIRA)
Konstantin Shaposhnikov created SPARK-8122:
--

 Summary: A few problems in ParquetRelation.enableLogForwarding()
 Key: SPARK-8122
 URL: https://issues.apache.org/jira/browse/SPARK-8122
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.5.0
Reporter: Konstantin Shaposhnikov


_enableLogForwarding()_ should be updated after the Parquet 1.7.0 upgrade, because 
the name of the logger has changed to `org.apache.parquet`. From the parquet-mr 
Log class:

{code}
// add a default handler in case there is none
Logger logger = Logger.getLogger(Log.class.getPackage().getName());
{code}

Another problem with _enableLogForwarding()_ is that it doesn't hold on to the 
created loggers, so they can be garbage collected and all configuration changes 
will be lost. From the 
https://docs.oracle.com/javase/6/docs/api/java/util/logging/Logger.html 
javadocs:  _It is important to note that the Logger returned by one of the 
getLogger factory methods may be garbage collected at any time if a strong 
reference to the Logger is not kept._






[jira] [Updated] (SPARK-8106) Set derby.system.durability=test in order to speed up Hive compatibility tests

2015-06-04 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin updated SPARK-8106:
---
Component/s: Tests
 Build

 Set derby.system.durability=test in order to speed up Hive compatibility tests
 --

 Key: SPARK-8106
 URL: https://issues.apache.org/jira/browse/SPARK-8106
 Project: Spark
  Issue Type: Improvement
  Components: Build, SQL, Tests
Reporter: Josh Rosen
Assignee: Josh Rosen
 Fix For: 1.5.0


 Derby has a configuration property named {{derby.system.durability}} that 
 disables I/O synchronization calls for many writes.  This sacrifices 
 durability but can result in large performance gains, which is appropriate 
 for tests.
 We should enable this in our test system properties in order to speed up the 
 Hive compatibility tests.  I saw 2-3x speedups locally with this change.






[jira] [Commented] (SPARK-8056) Design an easier way to construct schema for both Scala and Python

2015-06-04 Thread Ilya Ganelin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14573874#comment-14573874
 ] 

Ilya Ganelin commented on SPARK-8056:
-

[~rxin] Are you actively working on this? I think this could be readily solved 
by providing interface to construct StructType the way we construct SparkConf, 
e.g.
new StructType().add(f1,v1).add(f1,v2) etc

 Design an easier way to construct schema for both Scala and Python
 --

 Key: SPARK-8056
 URL: https://issues.apache.org/jira/browse/SPARK-8056
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Reporter: Reynold Xin
Assignee: Reynold Xin

 StructType is fairly hard to construct, especially in Python.






[jira] [Resolved] (SPARK-8106) Set derby.system.durability=test in order to speed up Hive compatibility tests

2015-06-04 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin resolved SPARK-8106.

   Resolution: Fixed
Fix Version/s: 1.5.0

 Set derby.system.durability=test in order to speed up Hive compatibility tests
 --

 Key: SPARK-8106
 URL: https://issues.apache.org/jira/browse/SPARK-8106
 Project: Spark
  Issue Type: Improvement
  Components: Build, SQL, Tests
Reporter: Josh Rosen
Assignee: Josh Rosen
 Fix For: 1.5.0


 Derby has a configuration property named {{derby.system.durability}} that 
 disables I/O synchronization calls for many writes.  This sacrifices 
 durability but can result in large performance gains, which is appropriate 
 for tests.
 We should enable this in our test system properties in order to speed up the 
 Hive compatibility tests.  I saw 2-3x speedups locally with this change.






[jira] [Commented] (SPARK-7119) ScriptTransform doesn't consider the output data type

2015-06-04 Thread zhichao-li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14573889#comment-14573889
 ] 

zhichao-li commented on SPARK-7119:
---

This workaround query can be executed correctly and there's a simple fix for 
this issue by the way :)

 ScriptTransform doesn't consider the output data type
 -

 Key: SPARK-7119
 URL: https://issues.apache.org/jira/browse/SPARK-7119
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.3.0, 1.3.1, 1.4.0
Reporter: Cheng Hao

 {code:sql}
 from (from src select transform(key, value) using 'cat' as (thing1 int, 
 thing2 string)) t select thing1 + 2;
 {code}
 {noformat}
 15/04/24 00:58:55 ERROR CliDriver: org.apache.spark.SparkException: Job 
 aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent 
 failure: Lost task 0.0 in stage 0.0 (TID 0, localhost): 
 java.lang.ClassCastException: org.apache.spark.sql.types.UTF8String cannot be 
 cast to java.lang.Integer
   at scala.runtime.BoxesRunTime.unboxToInt(BoxesRunTime.java:106)
   at scala.math.Numeric$IntIsIntegral$.plus(Numeric.scala:57)
   at 
 org.apache.spark.sql.catalyst.expressions.Add.eval(arithmetic.scala:127)
   at 
 org.apache.spark.sql.catalyst.expressions.Alias.eval(namedExpressions.scala:118)
   at 
 org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection.apply(Projection.scala:68)
   at 
 org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection.apply(Projection.scala:52)
   at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
   at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
   at scala.collection.Iterator$class.foreach(Iterator.scala:727)
   at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
   at 
 scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
   at 
 scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
   at 
 scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
   at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
   at scala.collection.AbstractIterator.to(Iterator.scala:1157)
   at 
 scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
   at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
   at 
 scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
   at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
   at org.apache.spark.rdd.RDD$$anonfun$17.apply(RDD.scala:819)
   at org.apache.spark.rdd.RDD$$anonfun$17.apply(RDD.scala:819)
   at 
 org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1618)
   at 
 org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1618)
   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
   at org.apache.spark.scheduler.Task.run(Task.scala:64)
   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:209)
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
   at java.lang.Thread.run(Thread.java:722)
 {noformat}






[jira] [Commented] (SPARK-7119) ScriptTransform doesn't consider the output data type

2015-06-04 Thread zhichao-li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14573888#comment-14573888
 ] 

zhichao-li commented on SPARK-7119:
---

This workaround query can be executed correctly and there's a simple fix for 
this issue by the way :)

 ScriptTransform doesn't consider the output data type
 -

 Key: SPARK-7119
 URL: https://issues.apache.org/jira/browse/SPARK-7119
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.3.0, 1.3.1, 1.4.0
Reporter: Cheng Hao

 {code:sql}
 from (from src select transform(key, value) using 'cat' as (thing1 int, 
 thing2 string)) t select thing1 + 2;
 {code}
 {noformat}
 15/04/24 00:58:55 ERROR CliDriver: org.apache.spark.SparkException: Job 
 aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent 
 failure: Lost task 0.0 in stage 0.0 (TID 0, localhost): 
 java.lang.ClassCastException: org.apache.spark.sql.types.UTF8String cannot be 
 cast to java.lang.Integer
   at scala.runtime.BoxesRunTime.unboxToInt(BoxesRunTime.java:106)
   at scala.math.Numeric$IntIsIntegral$.plus(Numeric.scala:57)
   at 
 org.apache.spark.sql.catalyst.expressions.Add.eval(arithmetic.scala:127)
   at 
 org.apache.spark.sql.catalyst.expressions.Alias.eval(namedExpressions.scala:118)
   at 
 org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection.apply(Projection.scala:68)
   at 
 org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection.apply(Projection.scala:52)
   at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
   at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
   at scala.collection.Iterator$class.foreach(Iterator.scala:727)
   at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
   at 
 scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
   at 
 scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
   at 
 scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
   at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
   at scala.collection.AbstractIterator.to(Iterator.scala:1157)
   at 
 scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
   at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
   at 
 scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
   at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
   at org.apache.spark.rdd.RDD$$anonfun$17.apply(RDD.scala:819)
   at org.apache.spark.rdd.RDD$$anonfun$17.apply(RDD.scala:819)
   at 
 org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1618)
   at 
 org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1618)
   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
   at org.apache.spark.scheduler.Task.run(Task.scala:64)
   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:209)
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
   at java.lang.Thread.run(Thread.java:722)
 {noformat}






[jira] [Updated] (SPARK-8119) Spark will set total executor when some executors fail.

2015-06-04 Thread SaintBacchus (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

SaintBacchus updated SPARK-8119:

Description: 
DynamicAllocation sets the total number of executors to a small number when it 
wants to kill some executors.
But in the non-DynamicAllocation scenario, Spark also sets the total number of executors.
So it causes the following problem: when an executor fails, no replacement 
executor is brought up by Spark.

  was:
DynamicAllocation will set the total executor to a little number when it wants 
to kill some executors.
But in no-DynamicAllocation scenario, Spark will also set the total executor. 
So it will cause thus problem: sometimes an executor fails down, there is no 
more executor which will be pull up by spark.


 Spark will set total executor when some executors fail.
 ---

 Key: SPARK-8119
 URL: https://issues.apache.org/jira/browse/SPARK-8119
 Project: Spark
  Issue Type: Bug
  Components: Scheduler
Affects Versions: 1.4.0
Reporter: SaintBacchus
 Fix For: 1.4.0


  DynamicAllocation sets the total number of executors to a small number when it 
  wants to kill some executors.
  But in the non-DynamicAllocation scenario, Spark also sets the total number of executors.
  So it causes the following problem: when an executor fails, no replacement 
  executor is brought up by Spark.






[jira] [Reopened] (SPARK-8096) how to convert dataframe field to LabelPoint

2015-06-04 Thread bofei.xiao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

bofei.xiao reopened SPARK-8096:
---

I'm sorry, I haven't expressed my question clearly!

 how to convert dataframe field to LabelPoint
 

 Key: SPARK-8096
 URL: https://issues.apache.org/jira/browse/SPARK-8096
 Project: Spark
  Issue Type: Bug
Reporter: bofei.xiao

 how to convert the dataframe to RDD[LabelPoint]
 dataframe with fields target,age,sex,height
 i want to cast target as label,age,sex,height as features vector
 I faced this problem in the following circumstance:
 --
 given i have a csv file data.csv
 target,age,sex,height
 1,18,1,170
 0,25,1,165
 .
 now,i want build a decisitin model
 step 1:load csv data as dataframe
 val data= sqlContext.load(com.databricks.spark.csv,:Map(path - 
 data.csv, header - true)
 step 2:build a decisiontree model
 but decisiontree need a RDD[LabelPoint] input
 thanks!






[jira] [Created] (SPARK-8120) Typos in warning message in sql/types.py

2015-06-04 Thread Joseph K. Bradley (JIRA)
Joseph K. Bradley created SPARK-8120:


 Summary: Typos in warning message in sql/types.py
 Key: SPARK-8120
 URL: https://issues.apache.org/jira/browse/SPARK-8120
 Project: Spark
  Issue Type: Bug
  Components: PySpark, SQL
Affects Versions: 1.4.0
Reporter: Joseph K. Bradley
Priority: Trivial


See 
[https://github.com/apache/spark/blob/3ba6fc515d6ea45c281bb81f648a38523be06383/python/pyspark/sql/types.py#L1093]

Need to fix string concat + use of %
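
Purely for illustration of that bug class (this is not the actual line in sql/types.py): a % format applied to the wrong operand because % binds tighter than +, versus the corrected form:

{code}
name, value = "field", 42

# Broken shape: % binds tighter than +, so it applies only to ": %d" and raises
# "TypeError: not all arguments converted during string formatting":
#   msg = "unexpected value for %s" + ": %d" % (name, value)

# Fixed shape: keep the whole format string together before applying %.
msg = "unexpected value for %s: %d" % (name, value)
print(msg)
{code}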






[jira] [Commented] (SPARK-8071) Run PySpark dataframe.rollup/cube test failed

2015-06-04 Thread Cheng Hao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14573935#comment-14573935
 ] 

Cheng Hao commented on SPARK-8071:
--

Can you try `df.cube('name', 'age').count().show()`?

 Run PySpark dataframe.rollup/cube test failed
 -

 Key: SPARK-8071
 URL: https://issues.apache.org/jira/browse/SPARK-8071
 Project: Spark
  Issue Type: Bug
  Components: PySpark
 Environment: OS: SUSE 11 SP3; JDK: 1.8.0_40; Python: 2.6.8; Hadoop: 
 2.7.0; Spark: master branch
Reporter: Weizhong
Priority: Minor

 I ran the tests for Spark and they failed on PySpark; details are:
 {code}
 File "/xxx/Spark/python/pyspark/sql/dataframe.py", line 837, in pyspark.sql.dataframe.DataFrame.cube
 Failed example:
     df.cube('name', df.age).count().show()
 Exception raised:
     Traceback (most recent call last):
       File "/usr/lib64/python2.6/doctest.py", line 1253, in __run
         compileflags, 1) in test.globs
       File "<doctest pyspark.sql.dataframe.DataFrame.cube[0]>", line 1, in <module>
         df.cube('name', df.age).count().show()
       File "/xxx/Spark/python/pyspark/sql/dataframe.py", line 291, in show
         print(self._jdf.showString(n))
       File "/xxx/Spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in __call__
         self.target_id, self.name)
       File "/xxx/Spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
         format(target_id, '.', name), value)
     Py4JJavaError: An error occurred while calling o212.showString.
     : java.lang.AssertionError: assertion failed: No plan for Cube [name#1,age#0], [name#1,age#0,COUNT(1) AS count#27L], grouping__id#28
      LogicalRDD [age#0,name#1], MapPartitionsRDD[7] at applySchemaToPythonRDD at NativeMethodAccessorImpl.java:-2
         at scala.Predef$.assert(Predef.scala:179)
         at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:59)
         at org.apache.spark.sql.catalyst.planning.QueryPlanner.planLater(QueryPlanner.scala:54)
         at org.apache.spark.sql.execution.SparkStrategies$BasicOperators$.apply(SparkStrategies.scala:312)
         at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
         at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
         at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
         at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:59)
         at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:913)
         at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:911)
         at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:917)
         at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(SQLContext.scala:917)
         at org.apache.spark.sql.DataFrame.collect(DataFrame.scala:1255)
         at org.apache.spark.sql.DataFrame.head(DataFrame.scala:1189)
         at org.apache.spark.sql.DataFrame.take(DataFrame.scala:1248)
         at org.apache.spark.sql.DataFrame.showString(DataFrame.scala:176)
         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
         at java.lang.reflect.Method.invoke(Method.java:606)
         at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
         at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
         at py4j.Gateway.invoke(Gateway.java:259)
         at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
         at py4j.commands.CallCommand.execute(CallCommand.java:79)
         at py4j.GatewayConnection.run(GatewayConnection.java:207)
         at java.lang.Thread.run(Thread.java:745)
 1 of   1 in pyspark.sql.dataframe.DataFrame.cube
 1 of   1 in pyspark.sql.dataframe.DataFrame.rollup
 ***Test Failed*** 2 failures.
 {code}
 cc [~davies]






[jira] [Created] (SPARK-8121) spark.sql.parquet.output.committer.class is overriden by spark.sql.sources.outputCommitterClass

2015-06-04 Thread Cheng Lian (JIRA)
Cheng Lian created SPARK-8121:
-

 Summary: spark.sql.parquet.output.committer.class is overriden 
by spark.sql.sources.outputCommitterClass
 Key: SPARK-8121
 URL: https://issues.apache.org/jira/browse/SPARK-8121
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.4.0
Reporter: Cheng Lian
Assignee: Cheng Lian


When spark.sql.sources.outputCommitterClass is configured, 
spark.sql.parquet.output.committer.class will be overridden. 

For example, if spark.sql.parquet.output.committer.class is set to 
FileOutputCommitter, while spark.sql.sources.outputCommitterClass is set to 
DirectParquetOutputCommitter, neither _metadata nor _common_metadata will be 
written because FileOutputCommitter overrides DirectParquetOutputCommitter.
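
For illustration, this is how the two conflicting settings would be applied from PySpark (the fully qualified class names are a guess, since the report gives only the simple names); per the first sentence above, the generic spark.sql.sources.outputCommitterClass value wins:

{code}
from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext("local[2]", "committer-conf")
sqlContext = SQLContext(sc)

# Parquet-specific committer (effectively ignored once the generic key is set).
sqlContext.setConf("spark.sql.parquet.output.committer.class",
                   "org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter")

# Generic data source committer; this one overrides the setting above.
sqlContext.setConf("spark.sql.sources.outputCommitterClass",
                   "org.apache.spark.sql.parquet.DirectParquetOutputCommitter")
{code}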






[jira] [Updated] (SPARK-8122) ParquetRelation.enableLogForwarding() may fail to configure loggers

2015-06-04 Thread Konstantin Shaposhnikov (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shaposhnikov updated SPARK-8122:
---
Description: 
_enableLogForwarding()_ doesn't hold on to the created loggers, so they can be garbage 
collected and all configuration changes will be lost. From the 
https://docs.oracle.com/javase/6/docs/api/java/util/logging/Logger.html 
javadocs:  _It is important to note that the Logger returned by one of the 
getLogger factory methods may be garbage collected at any time if a strong 
reference to the Logger is not kept._

All created logger references need to be kept, e.g. in static variables.


  was:

Another problem with _enableLogForwarding()_ is that it doesn't hold to the 
created loggers that can be garbage collected and all configuration changes 
will be gone. From 
https://docs.oracle.com/javase/6/docs/api/java/util/logging/Logger.html 
javadocs:  _It is important to note that the Logger returned by one of the 
getLogger factory methods may be garbage collected at any time if a strong 
reference to the Logger is not kept._


 ParquetRelation.enableLogForwarding() may fail to configure loggers
 ---

 Key: SPARK-8122
 URL: https://issues.apache.org/jira/browse/SPARK-8122
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 1.5.0
Reporter: Konstantin Shaposhnikov
Priority: Minor

  _enableLogForwarding()_ doesn't hold on to the created loggers, so they can be 
  garbage collected and all configuration changes will be lost. From the 
 https://docs.oracle.com/javase/6/docs/api/java/util/logging/Logger.html 
 javadocs:  _It is important to note that the Logger returned by one of the 
 getLogger factory methods may be garbage collected at any time if a strong 
 reference to the Logger is not kept._
 All created logger references need to be kept, e.g. in static variables.






[jira] [Commented] (SPARK-8118) Turn off noisy log output produced by Parquet 1.7.0

2015-06-04 Thread Konstantin Shaposhnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14573981#comment-14573981
 ] 

Konstantin Shaposhnikov commented on SPARK-8118:


The name of the logger has changed to _org.apache.parquet_. From the parquet-mr 
Log class:

{code}
// add a default handler in case there is none
Logger logger = Logger.getLogger(Log.class.getPackage().getName());
{code}


 Turn off noisy log output produced by Parquet 1.7.0
 ---

 Key: SPARK-8118
 URL: https://issues.apache.org/jira/browse/SPARK-8118
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 1.4.1, 1.5.0
Reporter: Cheng Lian
Assignee: Cheng Lian
Priority: Minor

  Parquet 1.7.0 renames its package to org.apache.parquet, so we need to adjust 
  {{ParquetRelation.enableLogForwarding}} accordingly to avoid noisy log output.






[jira] [Commented] (SPARK-8122) A few problems in ParquetRelation.enableLogForwarding()

2015-06-04 Thread Konstantin Shaposhnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14573973#comment-14573973
 ] 

Konstantin Shaposhnikov commented on SPARK-8122:


I believe that currently `ParquetRelation.enableLogForwarding` doesn't do 
anything as it configures the wrong logger (parquet instead of 
org.apache.parquet). I haven't tested it though.


 A few problems in ParquetRelation.enableLogForwarding()
 ---

 Key: SPARK-8122
 URL: https://issues.apache.org/jira/browse/SPARK-8122
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 1.5.0
Reporter: Konstantin Shaposhnikov

  _enableLogForwarding()_ should be updated after the Parquet 1.7.0 upgrade, 
  because the name of the logger has changed to `org.apache.parquet`. From the 
  parquet-mr Log class:
 {code}
 // add a default handler in case there is none
 Logger logger = Logger.getLogger(Log.class.getPackage().getName());
 {code}
  Another problem with _enableLogForwarding()_ is that it doesn't hold on to the 
  created loggers, so they can be garbage collected and all configuration changes 
  will be lost. From the 
 https://docs.oracle.com/javase/6/docs/api/java/util/logging/Logger.html 
 javadocs:  _It is important to note that the Logger returned by one of the 
 getLogger factory methods may be garbage collected at any time if a strong 
 reference to the Logger is not kept._






[jira] [Comment Edited] (SPARK-8056) Design an easier way to construct schema for both Scala and Python

2015-06-04 Thread Ilya Ganelin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14573874#comment-14573874
 ] 

Ilya Ganelin edited comment on SPARK-8056 at 6/5/15 12:35 AM:
--

[~rxin] Are you actively working on this? I think this could be readily solved 
by providing an interface to construct StructType the way we construct 
SparkConf, e.g.
new StructType().add(f1, v1).add(f2, v2), etc.


was (Author: ilganeli):
[~rxin] Are you actively working on this? I think this could be readily solved 
by providing interface to construct StructType the way we construct SparkConf, 
e.g.
new StructType().add(f1,v1).add(f1,v2) etc
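
For illustration, a minimal Python sketch of the kind of fluent construction being
proposed, next to the verbose construction available today. It uses only the
existing pyspark.sql.types classes; the make_schema helper is hypothetical and not
part of any Spark API.

{code}
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

# Today: every field has to be spelled out as a StructField inside a list.
schema = StructType([
    StructField("name", StringType(), True),
    StructField("age", IntegerType(), True),
])

# Hypothetical helper illustrating the fluent, SparkConf-like style proposed
# above: chain (name, dataType) pairs instead of hand-building the list.
def make_schema(*fields):
    return StructType([StructField(name, dtype, True) for name, dtype in fields])

schema2 = make_schema(("name", StringType()), ("age", IntegerType()))
print(schema2)
{code}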

 Design an easier way to construct schema for both Scala and Python
 --

 Key: SPARK-8056
 URL: https://issues.apache.org/jira/browse/SPARK-8056
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Reporter: Reynold Xin
Assignee: Reynold Xin

 StructType is fairly hard to construct, especially in Python.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-8118) Turn off noisy log output produced by Parquet 1.7.0

2015-06-04 Thread Cheng Lian (JIRA)
Cheng Lian created SPARK-8118:
-

 Summary: Turn off noisy log output produced by Parquet 1.7.0
 Key: SPARK-8118
 URL: https://issues.apache.org/jira/browse/SPARK-8118
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.4.1, 1.5.0
Reporter: Cheng Lian
Assignee: Cheng Lian
Priority: Minor


Parquet 1.7.0 renames its package to org.apache.parquet, so 
{{ParquetRelation.enableLogForwarding}} needs to be adjusted accordingly to avoid 
noisy log output.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-8119) Spark will set total executor when some executors fail.

2015-06-04 Thread SaintBacchus (JIRA)
SaintBacchus created SPARK-8119:
---

 Summary: Spark will set total executor when some executors fail.
 Key: SPARK-8119
 URL: https://issues.apache.org/jira/browse/SPARK-8119
 Project: Spark
  Issue Type: Bug
  Components: Scheduler
Affects Versions: 1.4.0
Reporter: SaintBacchus
 Fix For: 1.4.0


DynamicAllocation sets the total number of executors to a small number when it 
wants to kill some executors.
But in the non-DynamicAllocation scenario, Spark also sets the total number of 
executors. This causes the following problem: when an executor fails, no 
replacement executor is brought up by Spark.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-8119) Spark will set total executor when some executors fail.

2015-06-04 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-8119:
---

Assignee: (was: Apache Spark)

 Spark will set total executor when some executors fail.
 ---

 Key: SPARK-8119
 URL: https://issues.apache.org/jira/browse/SPARK-8119
 Project: Spark
  Issue Type: Bug
  Components: Scheduler
Affects Versions: 1.4.0
Reporter: SaintBacchus
 Fix For: 1.4.0


 DynamicAllocation sets the total number of executors to a small number when it 
 wants to kill some executors.
 But in the non-DynamicAllocation scenario, Spark also sets the total number of 
 executors. This causes the following problem: when an executor fails, no 
 replacement executor is brought up by Spark.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-8119) Spark will set total executor when some executors fail.

2015-06-04 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-8119:
---

Assignee: Apache Spark

 Spark will set total executor when some executors fail.
 ---

 Key: SPARK-8119
 URL: https://issues.apache.org/jira/browse/SPARK-8119
 Project: Spark
  Issue Type: Bug
  Components: Scheduler
Affects Versions: 1.4.0
Reporter: SaintBacchus
Assignee: Apache Spark
 Fix For: 1.4.0


 DynamicAllocation sets the total number of executors to a small number when it 
 wants to kill some executors.
 But in the non-DynamicAllocation scenario, Spark also sets the total number of 
 executors. This causes the following problem: when an executor fails, no 
 replacement executor is brought up by Spark.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-8122) A few problems in ParquetRelation.enableLogForwarding()

2015-06-04 Thread Reynold Xin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14573966#comment-14573966
 ] 

Reynold Xin commented on SPARK-8122:


Thanks for filing. What's the relationship between this one and SPARK-8118?

 A few problems in ParquetRelation.enableLogForwarding()
 ---

 Key: SPARK-8122
 URL: https://issues.apache.org/jira/browse/SPARK-8122
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 1.5.0
Reporter: Konstantin Shaposhnikov

 _enableLogForwarding()_ should be updated after the Parquet 1.7.0 upgrade, 
 because the name of the logger has been changed to `org.apache.parquet`. From 
 the parquet-mr Log class:
 {code}
 // add a default handler in case there is none
 Logger logger = Logger.getLogger(Log.class.getPackage().getName());
 {code}
 Another problem with _enableLogForwarding()_ is that it doesn't hold a strong 
 reference to the loggers it creates, so they can be garbage collected and all 
 configuration changes will be lost. From the 
 https://docs.oracle.com/javase/6/docs/api/java/util/logging/Logger.html 
 javadocs: _It is important to note that the Logger returned by one of the 
 getLogger factory methods may be garbage collected at any time if a strong 
 reference to the Logger is not kept._



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-8122) A few problems in ParquetRelation.enableLogForwarding()

2015-06-04 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin updated SPARK-8122:
---
Issue Type: Sub-task  (was: Bug)
Parent: SPARK-5463

 A few problems in ParquetRelation.enableLogForwarding()
 ---

 Key: SPARK-8122
 URL: https://issues.apache.org/jira/browse/SPARK-8122
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 1.5.0
Reporter: Konstantin Shaposhnikov

 _enableLogForwarding()_ should be updated after the Parquet 1.7.0 upgrade, 
 because the name of the logger has been changed to `org.apache.parquet`. From 
 the parquet-mr Log class:
 {code}
 // add a default handler in case there is none
 Logger logger = Logger.getLogger(Log.class.getPackage().getName());
 {code}
 Another problem with _enableLogForwarding()_ is that it doesn't hold a strong 
 reference to the loggers it creates, so they can be garbage collected and all 
 configuration changes will be lost. From the 
 https://docs.oracle.com/javase/6/docs/api/java/util/logging/Logger.html 
 javadocs: _It is important to note that the Logger returned by one of the 
 getLogger factory methods may be garbage collected at any time if a strong 
 reference to the Logger is not kept._



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-8122) ParquetRelation.enableLogForwarding() may fail to configure loggers

2015-06-04 Thread Konstantin Shaposhnikov (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shaposhnikov updated SPARK-8122:
---
Priority: Minor  (was: Major)

 ParquetRelation.enableLogForwarding() may fail to configure loggers
 ---

 Key: SPARK-8122
 URL: https://issues.apache.org/jira/browse/SPARK-8122
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 1.5.0
Reporter: Konstantin Shaposhnikov
Priority: Minor

 _enableLogForwarding()_ should be updated after the Parquet 1.7.0 upgrade, 
 because the name of the logger has been changed to `org.apache.parquet`. From 
 the parquet-mr Log class:
 {code}
 // add a default handler in case there is none
 Logger logger = Logger.getLogger(Log.class.getPackage().getName());
 {code}
 Another problem with _enableLogForwarding()_ is that it doesn't hold a strong 
 reference to the loggers it creates, so they can be garbage collected and all 
 configuration changes will be lost. From the 
 https://docs.oracle.com/javase/6/docs/api/java/util/logging/Logger.html 
 javadocs: _It is important to note that the Logger returned by one of the 
 getLogger factory methods may be garbage collected at any time if a strong 
 reference to the Logger is not kept._



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-8122) ParquetRelation.enableLogForwarding() may fail to configure loggers

2015-06-04 Thread Konstantin Shaposhnikov (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shaposhnikov updated SPARK-8122:
---
Summary: ParquetRelation.enableLogForwarding() may fail to configure 
loggers  (was: A few problems in ParquetRelation.enableLogForwarding())

 ParquetRelation.enableLogForwarding() may fail to configure loggers
 ---

 Key: SPARK-8122
 URL: https://issues.apache.org/jira/browse/SPARK-8122
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 1.5.0
Reporter: Konstantin Shaposhnikov

 _enableLogForwarding()_ should be updated after the Parquet 1.7.0 upgrade, 
 because the name of the logger has been changed to `org.apache.parquet`. From 
 the parquet-mr Log class:
 {code}
 // add a default handler in case there is none
 Logger logger = Logger.getLogger(Log.class.getPackage().getName());
 {code}
 Another problem with _enableLogForwarding()_ is that it doesn't hold a strong 
 reference to the loggers it creates, so they can be garbage collected and all 
 configuration changes will be lost. From the 
 https://docs.oracle.com/javase/6/docs/api/java/util/logging/Logger.html 
 javadocs: _It is important to note that the Logger returned by one of the 
 getLogger factory methods may be garbage collected at any time if a strong 
 reference to the Logger is not kept._



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-8122) ParquetRelation.enableLogForwarding() may fail to configure loggers

2015-06-04 Thread Konstantin Shaposhnikov (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shaposhnikov updated SPARK-8122:
---
Description: 

Another problem with _enableLogForwarding()_ is that it doesn't hold a strong 
reference to the loggers it creates, so they can be garbage collected and all 
configuration changes will be lost. From the 
https://docs.oracle.com/javase/6/docs/api/java/util/logging/Logger.html 
javadocs: _It is important to note that the Logger returned by one of the 
getLogger factory methods may be garbage collected at any time if a strong 
reference to the Logger is not kept._

  was:
_enableLogForwarding()_ should be updated after the Parquet 1.7.0 upgrade, because 
the name of the logger has been changed to `org.apache.parquet`. From the 
parquet-mr Log class:

{code}
// add a default handler in case there is none
Logger logger = Logger.getLogger(Log.class.getPackage().getName());
{code}

Another problem with _enableLogForwarding()_ is that it doesn't hold a strong 
reference to the loggers it creates, so they can be garbage collected and all 
configuration changes will be lost. From the 
https://docs.oracle.com/javase/6/docs/api/java/util/logging/Logger.html 
javadocs: _It is important to note that the Logger returned by one of the 
getLogger factory methods may be garbage collected at any time if a strong 
reference to the Logger is not kept._


 ParquetRelation.enableLogForwarding() may fail to configure loggers
 ---

 Key: SPARK-8122
 URL: https://issues.apache.org/jira/browse/SPARK-8122
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 1.5.0
Reporter: Konstantin Shaposhnikov
Priority: Minor

 Another problem with _enableLogForwarding()_ is that it doesn't hold a strong 
 reference to the loggers it creates, so they can be garbage collected and all 
 configuration changes will be lost. From the 
 https://docs.oracle.com/javase/6/docs/api/java/util/logging/Logger.html 
 javadocs: _It is important to note that the Logger returned by one of the 
 getLogger factory methods may be garbage collected at any time if a strong 
 reference to the Logger is not kept._



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-8122) A few problems in ParquetRelation.enableLogForwarding()

2015-06-04 Thread Konstantin Shaposhnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14573978#comment-14573978
 ] 

Konstantin Shaposhnikov commented on SPARK-8122:


SPARK-8118 is for the first problem described in this issue.

The second problem (the loggers can be garbage collected) is another issue and 
should be fixed separately.

I will update the JIRA.

 A few problems in ParquetRelation.enableLogForwarding()
 ---

 Key: SPARK-8122
 URL: https://issues.apache.org/jira/browse/SPARK-8122
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 1.5.0
Reporter: Konstantin Shaposhnikov

 _enableLogForwarding()_ should be updated after the Parquet 1.7.0 upgrade, 
 because the name of the logger has been changed to `org.apache.parquet`. From 
 the parquet-mr Log class:
 {code}
 // add a default handler in case there is none
 Logger logger = Logger.getLogger(Log.class.getPackage().getName());
 {code}
 Another problem with _enableLogForwarding()_ is that it doesn't hold a strong 
 reference to the loggers it creates, so they can be garbage collected and all 
 configuration changes will be lost. From the 
 https://docs.oracle.com/javase/6/docs/api/java/util/logging/Logger.html 
 javadocs: _It is important to note that the Logger returned by one of the 
 getLogger factory methods may be garbage collected at any time if a strong 
 reference to the Logger is not kept._



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-7536) Audit MLlib Python API for 1.4

2015-06-04 Thread Yanbo Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14573534#comment-14573534
 ] 

Yanbo Liang edited comment on SPARK-7536 at 6/4/15 8:42 PM:


[~josephkb] Sorry, I'm on a business trip from 1st June to 10th June, so there 
will be no updates during this period.


was (Author: yanboliang):
[~josephkb] I'm on a business trip from 1st June to 10th June, so there will be 
no updates during this period.

 Audit MLlib Python API for 1.4
 --

 Key: SPARK-7536
 URL: https://issues.apache.org/jira/browse/SPARK-7536
 Project: Spark
  Issue Type: Sub-task
  Components: MLlib, PySpark
Reporter: Joseph K. Bradley
Assignee: Yanbo Liang

 For new public APIs added to MLlib, we need to check the generated HTML doc 
 and compare the Scala and Python versions.  We need to track:
 * Inconsistency: Do class/method/parameter names match? SPARK-7667
 * Docs: Is the Python doc missing or just a stub?  We want the Python doc to 
 be as complete as the Scala doc. SPARK-7666
 * API breaking changes: These should be very rare but are occasionally either 
 necessary (intentional) or accidental.  These must be recorded and added in 
 the Migration Guide for this release. SPARK-7665
 ** Note: If the API change is for an Alpha/Experimental/DeveloperApi 
 component, please note that as well.
 * Missing classes/methods/parameters: We should create to-do JIRAs for 
 functionality missing from Python.
 ** classification
 *** StreamingLogisticRegressionWithSGD SPARK-7633
 ** clustering
 *** GaussianMixture SPARK-6258
 *** LDA SPARK-6259
 *** Power Iteration Clustering SPARK-5962
 *** StreamingKMeans SPARK-4118 
 ** evaluation
 *** MultilabelMetrics SPARK-6094 
 ** feature
 *** ElementwiseProduct SPARK-7605
 *** PCA SPARK-7604
 ** linalg
 *** Distributed linear algebra SPARK-6100
 ** pmml.export SPARK-7638
 ** regression
 *** StreamingLinearRegressionWithSGD SPARK-4127
 ** stat
 *** KernelDensity SPARK-7639
 ** util
 *** MLUtils SPARK-6263 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-8114) Remove wildcard import on TestSQLContext._

2015-06-04 Thread Reynold Xin (JIRA)
Reynold Xin created SPARK-8114:
--

 Summary: Remove wildcard import on TestSQLContext._
 Key: SPARK-8114
 URL: https://issues.apache.org/jira/browse/SPARK-8114
 Project: Spark
  Issue Type: Sub-task
Reporter: Reynold Xin


We import TestSQLContext._ in almost all test suites. This import introduces a 
lot of methods and should be avoided.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-8116) sc.range() doesn't match python range()

2015-06-04 Thread Ted Blackman (JIRA)
Ted Blackman created SPARK-8116:
---

 Summary: sc.range() doesn't match python range()
 Key: SPARK-8116
 URL: https://issues.apache.org/jira/browse/SPARK-8116
 Project: Spark
  Issue Type: Bug
  Components: PySpark
Affects Versions: 1.4.0, 1.4.1
Reporter: Ted Blackman
Priority: Minor


Python's built-in range() and xrange() functions can take 1, 2, or 3 arguments. 
Ranges with just 1 argument are probably used the most frequently, e.g.:
for i in range(len(myList)): ...

However, in pyspark, the SparkContext range() method throws an error when 
called with a single argument, due to the way its arguments get passed into 
python's range function.

There's no good reason that I can think of not to support the same syntax as 
the built-in function. To fix this, we can set the default of the sc.range() 
method's `stop` argument to None, and then inside the method, if it is None, 
replace `stop` with `start` and set `start` to 0, which is what the c 
implementation of range() does:
https://github.com/python/cpython/blob/master/Objects/rangeobject.c#L87
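
A minimal Python sketch of the argument normalization described above, written as 
a standalone wrapper for illustration rather than as the actual patch to 
SparkContext.range. It assumes a live SparkContext sc whose range() takes 
(start, end, step, numSlices); the spark_range name is hypothetical.

{code}
def spark_range(sc, start, stop=None, step=1, numSlices=None):
    """Call sc.range() with the same convention as Python's built-in range():
    a single argument is treated as the exclusive upper bound."""
    if stop is None:
        # Mirror CPython's range(): range(n) is range(0, n)
        start, stop = 0, start
    return sc.range(start, stop, step, numSlices)

# spark_range(sc, 5).collect()         # -> [0, 1, 2, 3, 4]
# spark_range(sc, 2, 10, 2).collect()  # -> [2, 4, 6, 8]
{code}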



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-8116) sc.range() doesn't match python range()

2015-06-04 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-8116:
---

Assignee: Apache Spark

 sc.range() doesn't match python range()
 ---

 Key: SPARK-8116
 URL: https://issues.apache.org/jira/browse/SPARK-8116
 Project: Spark
  Issue Type: Bug
  Components: PySpark
Affects Versions: 1.4.0, 1.4.1
Reporter: Ted Blackman
Assignee: Apache Spark
Priority: Minor
  Labels: easyfix

 Python's built-in range() and xrange() functions can take 1, 2, or 3 
 arguments. Ranges with just 1 argument are probably used the most frequently, 
 e.g.:
 for i in range(len(myList)): ...
 However, in pyspark, the SparkContext range() method throws an error when 
 called with a single argument, due to the way its arguments get passed into 
 python's range function.
 There's no good reason that I can think of not to support the same syntax as 
 the built-in function. To fix this, we can set the default of the sc.range() 
 method's `stop` argument to None, and then inside the method, if it is None, 
 replace `stop` with `start` and set `start` to 0, which is what the c 
 implementation of range() does:
 https://github.com/python/cpython/blob/master/Objects/rangeobject.c#L87



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-8116) sc.range() doesn't match python range()

2015-06-04 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14573752#comment-14573752
 ] 

Apache Spark commented on SPARK-8116:
-

User 'belisarius222' has created a pull request for this issue:
https://github.com/apache/spark/pull/6656

 sc.range() doesn't match python range()
 ---

 Key: SPARK-8116
 URL: https://issues.apache.org/jira/browse/SPARK-8116
 Project: Spark
  Issue Type: Bug
  Components: PySpark
Affects Versions: 1.4.0, 1.4.1
Reporter: Ted Blackman
Priority: Minor
  Labels: easyfix

 Python's built-in range() and xrange() functions can take 1, 2, or 3 
 arguments. Ranges with just 1 argument are probably used the most frequently, 
 e.g.:
 for i in range(len(myList)): ...
 However, in pyspark, the SparkContext range() method throws an error when 
 called with a single argument, due to the way its arguments get passed into 
 python's range function.
 There's no good reason that I can think of not to support the same syntax as 
 the built-in function. To fix this, we can set the default of the sc.range() 
 method's `stop` argument to None, and then inside the method, if it is None, 
 replace `stop` with `start` and set `start` to 0, which is what the c 
 implementation of range() does:
 https://github.com/python/cpython/blob/master/Objects/rangeobject.c#L87



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-8116) sc.range() doesn't match python range()

2015-06-04 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-8116:
---

Assignee: (was: Apache Spark)

 sc.range() doesn't match python range()
 ---

 Key: SPARK-8116
 URL: https://issues.apache.org/jira/browse/SPARK-8116
 Project: Spark
  Issue Type: Bug
  Components: PySpark
Affects Versions: 1.4.0, 1.4.1
Reporter: Ted Blackman
Priority: Minor
  Labels: easyfix

 Python's built-in range() and xrange() functions can take 1, 2, or 3 
 arguments. Ranges with just 1 argument are probably used the most frequently, 
 e.g.:
 for i in range(len(myList)): ...
 However, in pyspark, the SparkContext range() method throws an error when 
 called with a single argument, due to the way its arguments get passed into 
 python's range function.
 There's no good reason that I can think of not to support the same syntax as 
 the built-in function. To fix this, we can set the default of the sc.range() 
 method's `stop` argument to None, and then inside the method, if it is None, 
 replace `stop` with `start` and set `start` to 0, which is what the c 
 implementation of range() does:
 https://github.com/python/cpython/blob/master/Objects/rangeobject.c#L87



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-8095) Spark package dependencies not resolved when package is in local-ivy-cache

2015-06-04 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14573784#comment-14573784
 ] 

Apache Spark commented on SPARK-8095:
-

User 'andrewor14' has created a pull request for this issue:
https://github.com/apache/spark/pull/6658

 Spark package dependencies not resolved when package is in local-ivy-cache
 --

 Key: SPARK-8095
 URL: https://issues.apache.org/jira/browse/SPARK-8095
 Project: Spark
  Issue Type: Bug
  Components: Spark Submit
Affects Versions: 1.4.0
Reporter: Eron Wright 

 Given a dependency expressed with '--packages', the transitive dependencies 
 are supposed to be automatically included. This is true for most repository 
 types including local-m2-cache, Spark Packages, and central.   For 
 ivy-local-cache, it is not.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-8095) Spark package dependencies not resolved when package is in local-ivy-cache

2015-06-04 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-8095:
---

Assignee: (was: Apache Spark)

 Spark package dependencies not resolved when package is in local-ivy-cache
 --

 Key: SPARK-8095
 URL: https://issues.apache.org/jira/browse/SPARK-8095
 Project: Spark
  Issue Type: Bug
  Components: Spark Submit
Affects Versions: 1.4.0
Reporter: Eron Wright 

 Given a dependency expressed with '--packages', the transitive dependencies 
 are supposed to be automatically included. This is true for most repository 
 types including local-m2-cache, Spark Packages, and central.   For 
 ivy-local-cache, it is not.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-8071) Run PySpark dataframe.rollup/cube test failed

2015-06-04 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin updated SPARK-8071:
---
Description: 
I run test for Spark, and failed on PySpark, details are:
{code}
File /xxx/Spark/python/pyspark/sql/dataframe.py, line 837, in 
pyspark.sql.dataframe.DataFrame.cube

Failed example:
* df.cube('name', df.age).count().show()

Exception raised:
* Traceback (most recent call last):
** File /usr/lib64/python2.6/doctest.py, line 1253, in __run
*** compileflags, 1) in test.globs
** File doctest pyspark.sql.dataframe.DataFrame.cube\[0], line 1, in 
module
*** df.cube('name', df.age).count().show()
** File /xxx/Spark/python/pyspark/sql/dataframe.py, line 291, in show
*** print(self._jdf.showString\(n))
** File /xxx/Spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py, line 
538, in \_\_call\_\_
*** self.target_id, self.name)
** File /xxx/Spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py, line 
300, in get_return_value
*** format(target_id, '.', name), value)
* Py4JJavaError: An error occurred while calling o212.showString.
* : java.lang.AssertionError: assertion failed: No plan for Cube 
[name#1,age#0], [name#1,age#0,COUNT(1) AS count#27L], grouping__id#28
** LogicalRDD [age#0,name#1], MapPartitionsRDD\[7] at applySchemaToPythonRDD at 
NativeMethodAccessorImpl.java:-2

*** at scala.Predef$.assert(Predef.scala:179)
*** at 
org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:59)
*** at 
org.apache.spark.sql.catalyst.planning.QueryPlanner.planLater(QueryPlanner.scala:54)
*** at 
org.apache.spark.sql.execution.SparkStrategies$BasicOperators$.apply(SparkStrategies.scala:312)
*** at 
org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
*** at 
org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
*** at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
*** at 
org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:59)
*** at 
org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:913)
*** at 
org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:911)
*** at 
org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:917)
*** at 
org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(SQLContext.scala:917)
*** at org.apache.spark.sql.DataFrame.collect(DataFrame.scala:1255)
*** at org.apache.spark.sql.DataFrame.head(DataFrame.scala:1189)
*** at org.apache.spark.sql.DataFrame.take(DataFrame.scala:1248)
*** at org.apache.spark.sql.DataFrame.showString(DataFrame.scala:176)
*** at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
*** at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
*** at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
*** at java.lang.reflect.Method.invoke(Method.java:606)
*** at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
*** at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
*** at py4j.Gateway.invoke(Gateway.java:259)
*** at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
*** at py4j.commands.CallCommand.execute(CallCommand.java:79)
*** at py4j.GatewayConnection.run(GatewayConnection.java:207)
*** at java.lang.Thread.run(Thread.java:745)

**
   1 of   1 in pyspark.sql.dataframe.DataFrame.cube
   1 of   1 in pyspark.sql.dataframe.DataFrame.rollup
***Test Failed*** 2 failures.
{code}

cc [~davies]

  was:
I run test for Spark, and failed on PySpark, details are:

File /xxx/Spark/python/pyspark/sql/dataframe.py, line 837, in 
pyspark.sql.dataframe.DataFrame.cube

Failed example:
* df.cube('name', df.age).count().show()

Exception raised:
* Traceback (most recent call last):
** File /usr/lib64/python2.6/doctest.py, line 1253, in __run
*** compileflags, 1) in test.globs
** File doctest pyspark.sql.dataframe.DataFrame.cube\[0], line 1, in 
module
*** df.cube('name', df.age).count().show()
** File /xxx/Spark/python/pyspark/sql/dataframe.py, line 291, in show
*** print(self._jdf.showString\(n))
** File /xxx/Spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py, line 
538, in \_\_call\_\_
*** self.target_id, self.name)
** File /xxx/Spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py, line 
300, in get_return_value
*** format(target_id, '.', name), value)
* Py4JJavaError: An error occurred while calling o212.showString.
* : java.lang.AssertionError: assertion failed: No plan for Cube 
[name#1,age#0], [name#1,age#0,COUNT(1) AS count#27L], grouping__id#28
** LogicalRDD [age#0,name#1], MapPartitionsRDD\[7] at applySchemaToPythonRDD at 
NativeMethodAccessorImpl.java:-2

*** at scala.Predef$.assert(Predef.scala:179)
*** at 

[jira] [Commented] (SPARK-7008) An implementation of Factorization Machine (LibFM)

2015-06-04 Thread DB Tsai (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14573820#comment-14573820
 ] 

DB Tsai commented on SPARK-7008:


Do you see a better convergence rate when LBFGS is used?

 An implementation of Factorization Machine (LibFM)
 --

 Key: SPARK-7008
 URL: https://issues.apache.org/jira/browse/SPARK-7008
 Project: Spark
  Issue Type: New Feature
  Components: MLlib
Affects Versions: 1.3.0, 1.3.1, 1.3.2
Reporter: zhengruifeng
  Labels: features, patch
 Attachments: FM_CR.xlsx, FM_convergence_rate.xlsx, QQ20150421-1.png, 
 QQ20150421-2.png


 An implementation of Factorization Machines based on Scala and Spark MLlib.
 FM is a machine learning algorithm for multi-linear regression and is widely 
 used for recommendation.
 FM has performed well in recent years' recommendation competitions.
 Ref:
 http://libfm.org/
 http://doi.acm.org/10.1145/2168752.2168771
 http://www.inf.uni-konstanz.de/~rendle/pdf/Rendle2010FM.pdf
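
For context, the second-order FM model from the Rendle 2010 paper linked above 
predicts (shown here in LaTeX, as a reminder rather than anything specific to 
this implementation):

{code}
\hat{y}(x) = w_0 + \sum_{i=1}^{n} w_i x_i
           + \sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle v_i, v_j \rangle \, x_i x_j
{code}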



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-7440) Remove physical Distinct operator in favor of Aggregate

2015-06-04 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin reassigned SPARK-7440:
--

Assignee: Reynold Xin

 Remove physical Distinct operator in favor of Aggregate
 ---

 Key: SPARK-7440
 URL: https://issues.apache.org/jira/browse/SPARK-7440
 Project: Spark
  Issue Type: New Feature
  Components: SQL
Reporter: Reynold Xin
Assignee: Reynold Xin
 Fix For: 1.5.0


 We can just rewrite distinct using groupby (i.e. aggregate operator).
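
The equivalence behind this rewrite can be illustrated from the PySpark API (a 
sketch only, assuming an existing DataFrame df; this is not how the planner 
implements it):

{code}
# Distinct is equivalent to grouping by all columns and keeping only the keys.
distinct_df = df.distinct()
grouped_df = df.groupBy(*df.columns).count().select(*df.columns)

# Both produce the same set of rows (ordering aside):
# sorted(distinct_df.collect()) == sorted(grouped_df.collect())
{code}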



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-7440) Remove physical Distinct operator in favor of Aggregate

2015-06-04 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin resolved SPARK-7440.

   Resolution: Fixed
Fix Version/s: 1.5.0

 Remove physical Distinct operator in favor of Aggregate
 ---

 Key: SPARK-7440
 URL: https://issues.apache.org/jira/browse/SPARK-7440
 Project: Spark
  Issue Type: New Feature
  Components: SQL
Reporter: Reynold Xin
 Fix For: 1.5.0


 We can just rewrite distinct using groupby (i.e. aggregate operator).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-8111) SparkR shell should display Spark logo and version banner on startup

2015-06-04 Thread Shivaram Venkataraman (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shivaram Venkataraman updated SPARK-8111:
-
Labels: Starter  (was: )

 SparkR shell should display Spark logo and version banner on startup
 

 Key: SPARK-8111
 URL: https://issues.apache.org/jira/browse/SPARK-8111
 Project: Spark
  Issue Type: Improvement
  Components: SparkR
Affects Versions: 1.4.0
Reporter: Matei Zaharia
Priority: Trivial
  Labels: Starter





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-8112) Received block event count through the StreamingListener can be negative

2015-06-04 Thread Tathagata Das (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14573573#comment-14573573
 ] 

Tathagata Das commented on SPARK-8112:
--

Take a look at SPARK-8080

 Received block event count through the StreamingListener can be negative
 

 Key: SPARK-8112
 URL: https://issues.apache.org/jira/browse/SPARK-8112
 Project: Spark
  Issue Type: Bug
  Components: Streaming
Affects Versions: 1.4.0
Reporter: Tathagata Das
Assignee: Shixiong Zhu
Priority: Minor





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-8111) SparkR shell should display Spark logo and version banner on startup

2015-06-04 Thread Shivaram Venkataraman (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14573574#comment-14573574
 ] 

Shivaram Venkataraman commented on SPARK-8111:
--

The code will need to go in 
https://github.com/apache/spark/blob/2bcdf8c239d2ba79f64fb8878da83d4c2ec28b30/R/pkg/inst/profile/shell.R#L31

 SparkR shell should display Spark logo and version banner on startup
 

 Key: SPARK-8111
 URL: https://issues.apache.org/jira/browse/SPARK-8111
 Project: Spark
  Issue Type: Improvement
  Components: SparkR
Affects Versions: 1.4.0
Reporter: Matei Zaharia
Priority: Trivial
  Labels: Starter





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-7991) Python DataFrame: support passing a list into describe

2015-06-04 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-7991:
---

Assignee: Apache Spark

 Python DataFrame: support passing a list into describe
 --

 Key: SPARK-7991
 URL: https://issues.apache.org/jira/browse/SPARK-7991
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Reporter: Reynold Xin
Assignee: Apache Spark
  Labels: starter

 DataFrame.describe in Python takes a vararg, i.e. it can be invoked this way:
 {code}
 df.describe('col1', 'col2', 'col3')
 {code}
 Most of our DataFrame functions accept a list in addition to varargs. 
 describe should do the same, i.e. it should also accept a Python list:
 {code}
 df.describe(['col1', 'col2', 'col3'])
 {code}
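
A small Python sketch of the proposed normalization, written as a standalone 
helper for illustration only, assuming an existing DataFrame df; the name 
describe_cols is hypothetical, and the real change would live inside 
DataFrame.describe itself.

{code}
def describe_cols(df, *cols):
    """Accept either describe_cols(df, 'c1', 'c2') or describe_cols(df, ['c1', 'c2']),
    mirroring the list/vararg normalization most DataFrame methods already do."""
    if len(cols) == 1 and isinstance(cols[0], (list, tuple)):
        cols = cols[0]
    return df.describe(*cols)

# describe_cols(df, 'col1', 'col2')    # vararg form
# describe_cols(df, ['col1', 'col2'])  # list form, forwarded as varargs
{code}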



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-7991) Python DataFrame: support passing a list into describe

2015-06-04 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-7991:
---

Assignee: (was: Apache Spark)

 Python DataFrame: support passing a list into describe
 --

 Key: SPARK-7991
 URL: https://issues.apache.org/jira/browse/SPARK-7991
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Reporter: Reynold Xin
  Labels: starter

 DataFrame.describe in Python takes a vararg, i.e. it can be invoked this way:
 {code}
 df.describe('col1', 'col2', 'col3')
 {code}
 Most of our DataFrame functions accept a list in addition to varargs. 
 describe should do the same, i.e. it should also accept a Python list:
 {code}
 df.describe(['col1', 'col2', 'col3'])
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-7991) Python DataFrame: support passing a list into describe

2015-06-04 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14573678#comment-14573678
 ] 

Apache Spark commented on SPARK-7991:
-

User 'ameyc' has created a pull request for this issue:
https://github.com/apache/spark/pull/6655

 Python DataFrame: support passing a list into describe
 --

 Key: SPARK-7991
 URL: https://issues.apache.org/jira/browse/SPARK-7991
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Reporter: Reynold Xin
  Labels: starter

 DataFrame.describe in Python takes a vararg, i.e. it can be invoked this way:
 {code}
 df.describe('col1', 'col2', 'col3')
 {code}
 Most of our DataFrame functions accept a list in addition to varargs. 
 describe should do the same, i.e. it should also accept a Python list:
 {code}
 df.describe(['col1', 'col2', 'col3'])
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-8109) TestSQLContext's static initialization is run during MiMa tests, causing SparkContexts to be created

2015-06-04 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin updated SPARK-8109:
---
Issue Type: Sub-task  (was: Improvement)
Parent: SPARK-8113

 TestSQLContext's static initialization is run during MiMa tests, causing 
 SparkContexts to be created
 

 Key: SPARK-8109
 URL: https://issues.apache.org/jira/browse/SPARK-8109
 Project: Spark
  Issue Type: Sub-task
  Components: SQL, Tests
Reporter: Josh Rosen

 Check out this stacktrace which occurred during MiMa tests in the pull 
 request builder:
 {code}
 java.net.BindException: Address already in use
   at sun.nio.ch.Net.bind0(Native Method)
   at sun.nio.ch.Net.bind(Net.java:444)
   at sun.nio.ch.Net.bind(Net.java:436)
   at 
 sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214)
   at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
   at 
 org.eclipse.jetty.server.nio.SelectChannelConnector.open(SelectChannelConnector.java:187)
   at 
 org.eclipse.jetty.server.AbstractConnector.doStart(AbstractConnector.java:316)
   at 
 org.eclipse.jetty.server.nio.SelectChannelConnector.doStart(SelectChannelConnector.java:265)
   at 
 org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:64)
   at org.eclipse.jetty.server.Server.doStart(Server.java:293)
   at 
 org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:64)
   at 
 org.apache.spark.ui.JettyUtils$.org$apache$spark$ui$JettyUtils$$connect$1(JettyUtils.scala:228)
   at org.apache.spark.ui.JettyUtils$$anonfun$2.apply(JettyUtils.scala:238)
   at org.apache.spark.ui.JettyUtils$$anonfun$2.apply(JettyUtils.scala:238)
   at 
 org.apache.spark.util.Utils$$anonfun$startServiceOnPort$1.apply$mcVI$sp(Utils.scala:1991)
   at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
   at org.apache.spark.util.Utils$.startServiceOnPort(Utils.scala:1982)
   at 
 org.apache.spark.ui.JettyUtils$.startJettyServer(JettyUtils.scala:238)
   at org.apache.spark.ui.WebUI.bind(WebUI.scala:117)
   at 
 org.apache.spark.SparkContext$$anonfun$13.apply(SparkContext.scala:448)
   at 
 org.apache.spark.SparkContext$$anonfun$13.apply(SparkContext.scala:448)
   at scala.Option.foreach(Option.scala:236)
   at org.apache.spark.SparkContext.<init>(SparkContext.scala:448)
   at org.apache.spark.SparkContext.<init>(SparkContext.scala:135)
   at 
 org.apache.spark.sql.test.LocalSQLContext.<init>(TestSQLContext.scala:29)
   at 
 org.apache.spark.sql.test.TestSQLContext$.<init>(TestSQLContext.scala:55)
   at 
 org.apache.spark.sql.test.TestSQLContext$.<clinit>(TestSQLContext.scala)
   at java.lang.Class.forName0(Native Method)
   at java.lang.Class.forName(Class.java:274)
   at 
 scala.reflect.runtime.JavaMirrors$JavaMirror.javaClass(JavaMirrors.scala:500)
   at 
 scala.reflect.runtime.JavaMirrors$JavaMirror.tryJavaClass(JavaMirrors.scala:505)
   at 
 scala.reflect.runtime.SymbolLoaders$PackageScope.lookupEntry(SymbolLoaders.scala:109)
   at scala.reflect.internal.Types$Type.findMember(Types.scala:1185)
   at scala.reflect.internal.Types$Type.memberBasedOnName(Types.scala:722)
   at scala.reflect.internal.Types$Type.member(Types.scala:680)
   at 
 scala.reflect.internal.Mirrors$RootsBase.getModuleOrClass(Mirrors.scala:43)
   at 
 scala.reflect.internal.Mirrors$RootsBase.getModuleOrClass(Mirrors.scala:61)
   at 
 scala.reflect.internal.Mirrors$RootsBase.staticModuleOrClass(Mirrors.scala:72)
   at 
 scala.reflect.internal.Mirrors$RootsBase.staticModule(Mirrors.scala:161)
   at 
 scala.reflect.internal.Mirrors$RootsBase.staticModule(Mirrors.scala:21)
   at 
 org.apache.spark.tools.GenerateMIMAIgnore$$anonfun$privateWithin$1.apply(GenerateMIMAIgnore.scala:72)
   at 
 org.apache.spark.tools.GenerateMIMAIgnore$$anonfun$privateWithin$1.apply(GenerateMIMAIgnore.scala:69)
   at 
 scala.collection.immutable.HashSet$HashSet1.foreach(HashSet.scala:153)
   at 
 scala.collection.immutable.HashSet$HashTrieSet.foreach(HashSet.scala:306)
   at 
 scala.collection.immutable.HashSet$HashTrieSet.foreach(HashSet.scala:306)
   at 
 scala.collection.immutable.HashSet$HashTrieSet.foreach(HashSet.scala:306)
   at 
 org.apache.spark.tools.GenerateMIMAIgnore$.privateWithin(GenerateMIMAIgnore.scala:69)
   at 
 org.apache.spark.tools.GenerateMIMAIgnore$.main(GenerateMIMAIgnore.scala:126)
   at 
 org.apache.spark.tools.GenerateMIMAIgnore.main(GenerateMIMAIgnore.scala)
 {code}
 Here, TestSQLContext's static initialization code is being run during MiMa 
 checks and that initialization creates a SparkContext.  Because 

[jira] [Commented] (SPARK-8071) Run PySpark dataframe.rollup/cube test failed

2015-06-04 Thread Davies Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14573771#comment-14573771
 ] 

Davies Liu commented on SPARK-8071:
---

It failed in Scala side, cc [~rxin]

 Run PySpark dataframe.rollup/cube test failed
 -

 Key: SPARK-8071
 URL: https://issues.apache.org/jira/browse/SPARK-8071
 Project: Spark
  Issue Type: Bug
  Components: PySpark
 Environment: OS: SUSE 11 SP3; JDK: 1.8.0_40; Python: 2.6.8; Hadoop: 
 2.7.0; Spark: master branch
Reporter: Weizhong
Priority: Minor

 I run test for Spark, and failed on PySpark, details are:
 File /xxx/Spark/python/pyspark/sql/dataframe.py, line 837, in 
 pyspark.sql.dataframe.DataFrame.cube
 Failed example:
 * df.cube('name', df.age).count().show()
 Exception raised:
 * Traceback (most recent call last):
 ** File /usr/lib64/python2.6/doctest.py, line 1253, in __run
 *** compileflags, 1) in test.globs
 ** File doctest pyspark.sql.dataframe.DataFrame.cube\[0], line 1, in 
 module
 *** df.cube('name', df.age).count().show()
 ** File /xxx/Spark/python/pyspark/sql/dataframe.py, line 291, in show
 *** print(self._jdf.showString\(n))
 ** File /xxx/Spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py, 
 line 538, in \_\_call\_\_
 *** self.target_id, self.name)
 ** File /xxx/Spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py, line 
 300, in get_return_value
 *** format(target_id, '.', name), value)
 * Py4JJavaError: An error occurred while calling o212.showString.
 * : java.lang.AssertionError: assertion failed: No plan for Cube 
 [name#1,age#0], [name#1,age#0,COUNT(1) AS count#27L], grouping__id#28
 ** LogicalRDD [age#0,name#1], MapPartitionsRDD\[7] at applySchemaToPythonRDD 
 at NativeMethodAccessorImpl.java:-2
 *** at scala.Predef$.assert(Predef.scala:179)
 *** at 
 org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:59)
 *** at 
 org.apache.spark.sql.catalyst.planning.QueryPlanner.planLater(QueryPlanner.scala:54)
 *** at 
 org.apache.spark.sql.execution.SparkStrategies$BasicOperators$.apply(SparkStrategies.scala:312)
 *** at 
 org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
 *** at 
 org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
 *** at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
 *** at 
 org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:59)
 *** at 
 org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:913)
 *** at 
 org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:911)
 *** at 
 org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:917)
 *** at 
 org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(SQLContext.scala:917)
 *** at org.apache.spark.sql.DataFrame.collect(DataFrame.scala:1255)
 *** at org.apache.spark.sql.DataFrame.head(DataFrame.scala:1189)
 *** at org.apache.spark.sql.DataFrame.take(DataFrame.scala:1248)
 *** at org.apache.spark.sql.DataFrame.showString(DataFrame.scala:176)
 *** at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 *** at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 *** at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 *** at java.lang.reflect.Method.invoke(Method.java:606)
 *** at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
 *** at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
 *** at py4j.Gateway.invoke(Gateway.java:259)
 *** at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
 *** at py4j.commands.CallCommand.execute(CallCommand.java:79)
 *** at py4j.GatewayConnection.run(GatewayConnection.java:207)
 *** at java.lang.Thread.run(Thread.java:745)
 **
1 of   1 in pyspark.sql.dataframe.DataFrame.cube
1 of   1 in pyspark.sql.dataframe.DataFrame.rollup
 ***Test Failed*** 2 failures.
 cc [~davies]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-8113) SQL module test cleanup

2015-06-04 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14573822#comment-14573822
 ] 

Apache Spark commented on SPARK-8113:
-

User 'rxin' has created a pull request for this issue:
https://github.com/apache/spark/pull/6661

 SQL module test cleanup
 ---

 Key: SPARK-8113
 URL: https://issues.apache.org/jira/browse/SPARK-8113
 Project: Spark
  Issue Type: Umbrella
  Components: SQL
Reporter: Reynold Xin
Assignee: Reynold Xin
Priority: Minor

 Some cleanup tasks to track here.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-8113) SQL module test cleanup

2015-06-04 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-8113:
---

Assignee: Reynold Xin  (was: Apache Spark)

 SQL module test cleanup
 ---

 Key: SPARK-8113
 URL: https://issues.apache.org/jira/browse/SPARK-8113
 Project: Spark
  Issue Type: Umbrella
  Components: SQL
Reporter: Reynold Xin
Assignee: Reynold Xin
Priority: Minor

 Some cleanup tasks to track here.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-8113) SQL module test cleanup

2015-06-04 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-8113:
---

Assignee: Apache Spark  (was: Reynold Xin)

 SQL module test cleanup
 ---

 Key: SPARK-8113
 URL: https://issues.apache.org/jira/browse/SPARK-8113
 Project: Spark
  Issue Type: Umbrella
  Components: SQL
Reporter: Reynold Xin
Assignee: Apache Spark
Priority: Minor

 Some cleanup tasks to track here.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-7536) Audit MLlib Python API for 1.4

2015-06-04 Thread Yanbo Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14573536#comment-14573536
 ] 

Yanbo Liang commented on SPARK-7536:


[~josephkb] I'm on a business trip from 1st June to 10th June, so there will be 
no updates during this period.

 Audit MLlib Python API for 1.4
 --

 Key: SPARK-7536
 URL: https://issues.apache.org/jira/browse/SPARK-7536
 Project: Spark
  Issue Type: Sub-task
  Components: MLlib, PySpark
Reporter: Joseph K. Bradley
Assignee: Yanbo Liang

 For new public APIs added to MLlib, we need to check the generated HTML doc 
 and compare the Scala and Python versions.  We need to track:
 * Inconsistency: Do class/method/parameter names match? SPARK-7667
 * Docs: Is the Python doc missing or just a stub?  We want the Python doc to 
 be as complete as the Scala doc. SPARK-7666
 * API breaking changes: These should be very rare but are occasionally either 
 necessary (intentional) or accidental.  These must be recorded and added in 
 the Migration Guide for this release. SPARK-7665
 ** Note: If the API change is for an Alpha/Experimental/DeveloperApi 
 component, please note that as well.
 * Missing classes/methods/parameters: We should create to-do JIRAs for 
 functionality missing from Python.
 ** classification
 *** StreamingLogisticRegressionWithSGD SPARK-7633
 ** clustering
 *** GaussianMixture SPARK-6258
 *** LDA SPARK-6259
 *** Power Iteration Clustering SPARK-5962
 *** StreamingKMeans SPARK-4118 
 ** evaluation
 *** MultilabelMetrics SPARK-6094 
 ** feature
 *** ElementwiseProduct SPARK-7605
 *** PCA SPARK-7604
 ** linalg
 *** Distributed linear algebra SPARK-6100
 ** pmml.export SPARK-7638
 ** regression
 *** StreamingLinearRegressionWithSGD SPARK-4127
 ** stat
 *** KernelDensity SPARK-7639
 ** util
 *** MLUtils SPARK-6263 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-7536) Audit MLlib Python API for 1.4

2015-06-04 Thread Yanbo Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14573534#comment-14573534
 ] 

Yanbo Liang commented on SPARK-7536:


[~josephkb] I'm on a business trip from 1st June to 10th June, so there will be 
no updates during this period.

 Audit MLlib Python API for 1.4
 --

 Key: SPARK-7536
 URL: https://issues.apache.org/jira/browse/SPARK-7536
 Project: Spark
  Issue Type: Sub-task
  Components: MLlib, PySpark
Reporter: Joseph K. Bradley
Assignee: Yanbo Liang

 For new public APIs added to MLlib, we need to check the generated HTML doc 
 and compare the Scala and Python versions.  We need to track:
 * Inconsistency: Do class/method/parameter names match? SPARK-7667
 * Docs: Is the Python doc missing or just a stub?  We want the Python doc to 
 be as complete as the Scala doc. SPARK-7666
 * API breaking changes: These should be very rare but are occasionally either 
 necessary (intentional) or accidental.  These must be recorded and added in 
 the Migration Guide for this release. SPARK-7665
 ** Note: If the API change is for an Alpha/Experimental/DeveloperApi 
 component, please note that as well.
 * Missing classes/methods/parameters: We should create to-do JIRAs for 
 functionality missing from Python.
 ** classification
 *** StreamingLogisticRegressionWithSGD SPARK-7633
 ** clustering
 *** GaussianMixture SPARK-6258
 *** LDA SPARK-6259
 *** Power Iteration Clustering SPARK-5962
 *** StreamingKMeans SPARK-4118 
 ** evaluation
 *** MultilabelMetrics SPARK-6094 
 ** feature
 *** ElementwiseProduct SPARK-7605
 *** PCA SPARK-7604
 ** linalg
 *** Distributed linear algebra SPARK-6100
 ** pmml.export SPARK-7638
 ** regression
 *** StreamingLinearRegressionWithSGD SPARK-4127
 ** stat
 *** KernelDensity SPARK-7639
 ** util
 *** MLUtils SPARK-6263 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Issue Comment Deleted] (SPARK-7536) Audit MLlib Python API for 1.4

2015-06-04 Thread Yanbo Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yanbo Liang updated SPARK-7536:
---
Comment: was deleted

(was: [~josephkb] I'm on a business trip from 1st June to 10th June, so there 
will be no updates during this period.)

 Audit MLlib Python API for 1.4
 --

 Key: SPARK-7536
 URL: https://issues.apache.org/jira/browse/SPARK-7536
 Project: Spark
  Issue Type: Sub-task
  Components: MLlib, PySpark
Reporter: Joseph K. Bradley
Assignee: Yanbo Liang

 For new public APIs added to MLlib, we need to check the generated HTML doc 
 and compare the Scala & Python versions.  We need to track:
 * Inconsistency: Do class/method/parameter names match? SPARK-7667
 * Docs: Is the Python doc missing or just a stub?  We want the Python doc to 
 be as complete as the Scala doc. SPARK-7666
 * API breaking changes: These should be very rare but are occasionally either 
 necessary (intentional) or accidental.  These must be recorded and added in 
 the Migration Guide for this release. SPARK-7665
 ** Note: If the API change is for an Alpha/Experimental/DeveloperApi 
 component, please note that as well.
 * Missing classes/methods/parameters: We should create to-do JIRAs for 
 functionality missing from Python.
 ** classification
 *** StreamingLogisticRegressionWithSGD SPARK-7633
 ** clustering
 *** GaussianMixture SPARK-6258
 *** LDA SPARK-6259
 *** Power Iteration Clustering SPARK-5962
 *** StreamingKMeans SPARK-4118 
 ** evaluation
 *** MultilabelMetrics SPARK-6094 
 ** feature
 *** ElementwiseProduct SPARK-7605
 *** PCA SPARK-7604
 ** linalg
 *** Distributed linear algebra SPARK-6100
 ** pmml.export SPARK-7638
 ** regression
 *** StreamingLinearRegressionWithSGD SPARK-4127
 ** stat
 *** KernelDensity SPARK-7639
 ** util
 *** MLUtils SPARK-6263 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-7417) Flaky test: o.a.s.deploy.SparkSubmitUtilsSuite neglect dependencies

2015-06-04 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-7417:
-
Assignee: Burak Yavuz

 Flaky test: o.a.s.deploy.SparkSubmitUtilsSuite neglect dependencies
 ---

 Key: SPARK-7417
 URL: https://issues.apache.org/jira/browse/SPARK-7417
 Project: Spark
  Issue Type: Bug
  Components: Tests
Affects Versions: 1.4.0
Reporter: Andrew Or
Assignee: Burak Yavuz
Priority: Critical
  Labels: flaky-test
 Fix For: 1.3.2, 1.4.0


 {code}
 Expected exception java.lang.RuntimeException to be thrown, but no exception 
 was thrown.
 {code}
 https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-Maven-pre-YARN/hadoop.version=1.0.4,label=centos/2201/testReport/junit/org.apache.spark.deploy/SparkSubmitUtilsSuite/neglects_Spark_and_Spark_s_dependencies/
 ...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-7417) Flaky test: o.a.s.deploy.SparkSubmitUtilsSuite neglect dependencies

2015-06-04 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or closed SPARK-7417.

Resolution: Fixed

 Flaky test: o.a.s.deploy.SparkSubmitUtilsSuite neglect dependencies
 ---

 Key: SPARK-7417
 URL: https://issues.apache.org/jira/browse/SPARK-7417
 Project: Spark
  Issue Type: Bug
  Components: Tests
Affects Versions: 1.4.0
Reporter: Andrew Or
Assignee: Burak Yavuz
Priority: Critical
  Labels: flaky-test
 Fix For: 1.3.2, 1.4.0


 {code}
 Expected exception java.lang.RuntimeException to be thrown, but no exception 
 was thrown.
 {code}
 https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-Maven-pre-YARN/hadoop.version=1.0.4,label=centos/2201/testReport/junit/org.apache.spark.deploy/SparkSubmitUtilsSuite/neglects_Spark_and_Spark_s_dependencies/
 ...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-8095) Spark package dependencies not resolved when package is in local-ivy-cache

2015-06-04 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-8095:
---

Assignee: Apache Spark

 Spark package dependencies not resolved when package is in local-ivy-cache
 --

 Key: SPARK-8095
 URL: https://issues.apache.org/jira/browse/SPARK-8095
 Project: Spark
  Issue Type: Bug
  Components: Spark Submit
Affects Versions: 1.4.0
Reporter: Eron Wright 
Assignee: Apache Spark

 Given a dependency expressed with '--packages', the transitive dependencies 
 are supposed to be automatically included. This is true for most repository 
 types, including local-m2-cache, Spark Packages, and central. For 
 ivy-local-cache, it is not.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-8117) Push codegen into Expression

2015-06-04 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-8117:
---

Assignee: Davies Liu  (was: Apache Spark)

 Push codegen into Expression
 

 Key: SPARK-8117
 URL: https://issues.apache.org/jira/browse/SPARK-8117
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: Davies Liu
Assignee: Davies Liu

 Push the codegen implementation of each expression into Expression itself, 
 making it easier to manage and extend.
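 A purely illustrative sketch of the direction described above (the concrete API 
 lands in the linked pull request and differs in detail; all names below are 
 hypothetical):
 {code}
 // Hypothetical shape only: each expression carries its own codegen logic,
 // instead of a central generator pattern-matching over every expression type.
 abstract class Expression {
   def eval(row: Any): Any                    // interpreted path
   def genCode(ctx: CodegenContext): String   // emits a Java source fragment
 }

 class CodegenContext {                        // hypothetical helper
   private var counter = 0
   def freshName(prefix: String): String = { counter += 1; s"$prefix$counter" }
 }
 {code}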



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-8109) TestSQLContext's static initialization is run during MiMa tests, causing SparkContexts to be created

2015-06-04 Thread Josh Rosen (JIRA)
Josh Rosen created SPARK-8109:
-

 Summary: TestSQLContext's static initialization is run during MiMa 
tests, causing SparkContexts to be created
 Key: SPARK-8109
 URL: https://issues.apache.org/jira/browse/SPARK-8109
 Project: Spark
  Issue Type: Improvement
  Components: SQL, Tests
Reporter: Josh Rosen


Check out this stacktrace which occurred during MiMa tests in the pull request 
builder:

{code}
java.net.BindException: Address already in use
at sun.nio.ch.Net.bind0(Native Method)
at sun.nio.ch.Net.bind(Net.java:444)
at sun.nio.ch.Net.bind(Net.java:436)
at 
sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
at 
org.eclipse.jetty.server.nio.SelectChannelConnector.open(SelectChannelConnector.java:187)
at 
org.eclipse.jetty.server.AbstractConnector.doStart(AbstractConnector.java:316)
at 
org.eclipse.jetty.server.nio.SelectChannelConnector.doStart(SelectChannelConnector.java:265)
at 
org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:64)
at org.eclipse.jetty.server.Server.doStart(Server.java:293)
at 
org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:64)
at 
org.apache.spark.ui.JettyUtils$.org$apache$spark$ui$JettyUtils$$connect$1(JettyUtils.scala:228)
at org.apache.spark.ui.JettyUtils$$anonfun$2.apply(JettyUtils.scala:238)
at org.apache.spark.ui.JettyUtils$$anonfun$2.apply(JettyUtils.scala:238)
at 
org.apache.spark.util.Utils$$anonfun$startServiceOnPort$1.apply$mcVI$sp(Utils.scala:1991)
at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
at org.apache.spark.util.Utils$.startServiceOnPort(Utils.scala:1982)
at 
org.apache.spark.ui.JettyUtils$.startJettyServer(JettyUtils.scala:238)
at org.apache.spark.ui.WebUI.bind(WebUI.scala:117)
at 
org.apache.spark.SparkContext$$anonfun$13.apply(SparkContext.scala:448)
at 
org.apache.spark.SparkContext$$anonfun$13.apply(SparkContext.scala:448)
at scala.Option.foreach(Option.scala:236)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:448)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:135)
at 
org.apache.spark.sql.test.LocalSQLContext.<init>(TestSQLContext.scala:29)
at 
org.apache.spark.sql.test.TestSQLContext$.<init>(TestSQLContext.scala:55)
at 
org.apache.spark.sql.test.TestSQLContext$.<clinit>(TestSQLContext.scala)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:274)
at 
scala.reflect.runtime.JavaMirrors$JavaMirror.javaClass(JavaMirrors.scala:500)
at 
scala.reflect.runtime.JavaMirrors$JavaMirror.tryJavaClass(JavaMirrors.scala:505)
at 
scala.reflect.runtime.SymbolLoaders$PackageScope.lookupEntry(SymbolLoaders.scala:109)
at scala.reflect.internal.Types$Type.findMember(Types.scala:1185)
at scala.reflect.internal.Types$Type.memberBasedOnName(Types.scala:722)
at scala.reflect.internal.Types$Type.member(Types.scala:680)
at 
scala.reflect.internal.Mirrors$RootsBase.getModuleOrClass(Mirrors.scala:43)
at 
scala.reflect.internal.Mirrors$RootsBase.getModuleOrClass(Mirrors.scala:61)
at 
scala.reflect.internal.Mirrors$RootsBase.staticModuleOrClass(Mirrors.scala:72)
at 
scala.reflect.internal.Mirrors$RootsBase.staticModule(Mirrors.scala:161)
at 
scala.reflect.internal.Mirrors$RootsBase.staticModule(Mirrors.scala:21)
at 
org.apache.spark.tools.GenerateMIMAIgnore$$anonfun$privateWithin$1.apply(GenerateMIMAIgnore.scala:72)
at 
org.apache.spark.tools.GenerateMIMAIgnore$$anonfun$privateWithin$1.apply(GenerateMIMAIgnore.scala:69)
at 
scala.collection.immutable.HashSet$HashSet1.foreach(HashSet.scala:153)
at 
scala.collection.immutable.HashSet$HashTrieSet.foreach(HashSet.scala:306)
at 
scala.collection.immutable.HashSet$HashTrieSet.foreach(HashSet.scala:306)
at 
scala.collection.immutable.HashSet$HashTrieSet.foreach(HashSet.scala:306)
at 
org.apache.spark.tools.GenerateMIMAIgnore$.privateWithin(GenerateMIMAIgnore.scala:69)
at 
org.apache.spark.tools.GenerateMIMAIgnore$.main(GenerateMIMAIgnore.scala:126)
at 
org.apache.spark.tools.GenerateMIMAIgnore.main(GenerateMIMAIgnore.scala)
{code}

Here, TestSQLContext's static initialization code is being run during MiMa 
checks and that initialization creates a SparkContext.  Because MiMa doesn't 
run with our test system properties, the UI tries to bind to a contended port.  
This may lead to flakiness.
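
One possible mitigation, as an illustrative sketch only (not necessarily the fix 
adopted for this issue): build test SQLContexts against a SparkContext with the 
web UI disabled, so that an accidental static initialization (e.g. during MiMa's 
reflective class scanning) never tries to bind a port. The object name below is 
hypothetical:

{code}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object QuietTestSQLContext {
  private val conf = new SparkConf()
    .setMaster("local[2]")
    .setAppName("test")
    .set("spark.ui.enabled", "false")   // never bind a UI port

  // lazy: nothing is created until a test actually touches it
  lazy val sqlContext = new SQLContext(new SparkContext(conf))
}
{code}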



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Commented] (SPARK-8109) TestSQLContext's static initialization is run during MiMa tests, causing SparkContexts to be created

2015-06-04 Thread Josh Rosen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14573554#comment-14573554
 ] 

Josh Rosen commented on SPARK-8109:
---

/cc [~rxin]

 TestSQLContext's static initialization is run during MiMa tests, causing 
 SparkContexts to be created
 

 Key: SPARK-8109
 URL: https://issues.apache.org/jira/browse/SPARK-8109
 Project: Spark
  Issue Type: Improvement
  Components: SQL, Tests
Reporter: Josh Rosen

 Check out this stacktrace which occurred during MiMa tests in the pull 
 request builder:
 {code}
 java.net.BindException: Address already in use
   at sun.nio.ch.Net.bind0(Native Method)
   at sun.nio.ch.Net.bind(Net.java:444)
   at sun.nio.ch.Net.bind(Net.java:436)
   at 
 sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214)
   at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
   at 
 org.eclipse.jetty.server.nio.SelectChannelConnector.open(SelectChannelConnector.java:187)
   at 
 org.eclipse.jetty.server.AbstractConnector.doStart(AbstractConnector.java:316)
   at 
 org.eclipse.jetty.server.nio.SelectChannelConnector.doStart(SelectChannelConnector.java:265)
   at 
 org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:64)
   at org.eclipse.jetty.server.Server.doStart(Server.java:293)
   at 
 org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:64)
   at 
 org.apache.spark.ui.JettyUtils$.org$apache$spark$ui$JettyUtils$$connect$1(JettyUtils.scala:228)
   at org.apache.spark.ui.JettyUtils$$anonfun$2.apply(JettyUtils.scala:238)
   at org.apache.spark.ui.JettyUtils$$anonfun$2.apply(JettyUtils.scala:238)
   at 
 org.apache.spark.util.Utils$$anonfun$startServiceOnPort$1.apply$mcVI$sp(Utils.scala:1991)
   at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
   at org.apache.spark.util.Utils$.startServiceOnPort(Utils.scala:1982)
   at 
 org.apache.spark.ui.JettyUtils$.startJettyServer(JettyUtils.scala:238)
   at org.apache.spark.ui.WebUI.bind(WebUI.scala:117)
   at 
 org.apache.spark.SparkContext$$anonfun$13.apply(SparkContext.scala:448)
   at 
 org.apache.spark.SparkContext$$anonfun$13.apply(SparkContext.scala:448)
   at scala.Option.foreach(Option.scala:236)
   at org.apache.spark.SparkContext.<init>(SparkContext.scala:448)
   at org.apache.spark.SparkContext.<init>(SparkContext.scala:135)
   at 
 org.apache.spark.sql.test.LocalSQLContext.<init>(TestSQLContext.scala:29)
   at 
 org.apache.spark.sql.test.TestSQLContext$.<init>(TestSQLContext.scala:55)
   at 
 org.apache.spark.sql.test.TestSQLContext$.<clinit>(TestSQLContext.scala)
   at java.lang.Class.forName0(Native Method)
   at java.lang.Class.forName(Class.java:274)
   at 
 scala.reflect.runtime.JavaMirrors$JavaMirror.javaClass(JavaMirrors.scala:500)
   at 
 scala.reflect.runtime.JavaMirrors$JavaMirror.tryJavaClass(JavaMirrors.scala:505)
   at 
 scala.reflect.runtime.SymbolLoaders$PackageScope.lookupEntry(SymbolLoaders.scala:109)
   at scala.reflect.internal.Types$Type.findMember(Types.scala:1185)
   at scala.reflect.internal.Types$Type.memberBasedOnName(Types.scala:722)
   at scala.reflect.internal.Types$Type.member(Types.scala:680)
   at 
 scala.reflect.internal.Mirrors$RootsBase.getModuleOrClass(Mirrors.scala:43)
   at 
 scala.reflect.internal.Mirrors$RootsBase.getModuleOrClass(Mirrors.scala:61)
   at 
 scala.reflect.internal.Mirrors$RootsBase.staticModuleOrClass(Mirrors.scala:72)
   at 
 scala.reflect.internal.Mirrors$RootsBase.staticModule(Mirrors.scala:161)
   at 
 scala.reflect.internal.Mirrors$RootsBase.staticModule(Mirrors.scala:21)
   at 
 org.apache.spark.tools.GenerateMIMAIgnore$$anonfun$privateWithin$1.apply(GenerateMIMAIgnore.scala:72)
   at 
 org.apache.spark.tools.GenerateMIMAIgnore$$anonfun$privateWithin$1.apply(GenerateMIMAIgnore.scala:69)
   at 
 scala.collection.immutable.HashSet$HashSet1.foreach(HashSet.scala:153)
   at 
 scala.collection.immutable.HashSet$HashTrieSet.foreach(HashSet.scala:306)
   at 
 scala.collection.immutable.HashSet$HashTrieSet.foreach(HashSet.scala:306)
   at 
 scala.collection.immutable.HashSet$HashTrieSet.foreach(HashSet.scala:306)
   at 
 org.apache.spark.tools.GenerateMIMAIgnore$.privateWithin(GenerateMIMAIgnore.scala:69)
   at 
 org.apache.spark.tools.GenerateMIMAIgnore$.main(GenerateMIMAIgnore.scala:126)
   at 
 org.apache.spark.tools.GenerateMIMAIgnore.main(GenerateMIMAIgnore.scala)
 {code}
 Here, TestSQLContext's static initialization code is being run during MiMa 
 checks and that initialization creates a SparkContext.  Because MiMa doesn't 
 run with our test system properties, the UI tries to bind to a contended port. 
 This may lead to flakiness.

[jira] [Updated] (SPARK-8110) DAG visualizations sometimes look weird in Python

2015-06-04 Thread Matei Zaharia (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matei Zaharia updated SPARK-8110:
-
Attachment: Screen Shot 2015-06-04 at 1.51.32 PM.png
Screen Shot 2015-06-04 at 1.51.35 PM.png

 DAG visualizations sometimes look weird in Python
 -

 Key: SPARK-8110
 URL: https://issues.apache.org/jira/browse/SPARK-8110
 Project: Spark
  Issue Type: Bug
  Components: Web UI
Affects Versions: 1.4.0
Reporter: Matei Zaharia
Priority: Minor
 Attachments: Screen Shot 2015-06-04 at 1.51.32 PM.png, Screen Shot 
 2015-06-04 at 1.51.35 PM.png


 Got this by doing sc.textFile("README.md").count() -- there are some RDDs 
 outside of any stages.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-8115) Remove TestData

2015-06-04 Thread Reynold Xin (JIRA)
Reynold Xin created SPARK-8115:
--

 Summary: Remove TestData
 Key: SPARK-8115
 URL: https://issues.apache.org/jira/browse/SPARK-8115
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Reporter: Reynold Xin
Priority: Minor


TestData was from the era when we didn't have easy ways to generate test 
datasets. Now that we have implicits on Seq + toDF, it makes more sense to put 
the test datasets closer to the test suites.
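
For illustration, a minimal sketch of what an inline test dataset could look like 
(assuming the suite has a SQLContext named sqlContext in scope; names are 
hypothetical, not the actual refactoring):

{code}
// Build the dataset right next to the tests that use it, instead of in TestData.
import sqlContext.implicits._

val testData = Seq((1, "a"), (2, "b"), (3, "c")).toDF("key", "value")
testData.registerTempTable("testData")
{code}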




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-8112) Received block event count through the StreamingListener can be negative

2015-06-04 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-8112:
---

Assignee: Apache Spark  (was: Shixiong Zhu)

 Received block event count through the StreamingListener can be negative
 

 Key: SPARK-8112
 URL: https://issues.apache.org/jira/browse/SPARK-8112
 Project: Spark
  Issue Type: Bug
  Components: Streaming
Affects Versions: 1.4.0
Reporter: Tathagata Das
Assignee: Apache Spark
Priority: Minor





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-8112) Received block event count through the StreamingListener can be negative

2015-06-04 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-8112:
---

Assignee: Shixiong Zhu  (was: Apache Spark)

 Received block event count through the StreamingListener can be negative
 

 Key: SPARK-8112
 URL: https://issues.apache.org/jira/browse/SPARK-8112
 Project: Spark
  Issue Type: Bug
  Components: Streaming
Affects Versions: 1.4.0
Reporter: Tathagata Das
Assignee: Shixiong Zhu
Priority: Minor





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-8112) Received block event count through the StreamingListener can be negative

2015-06-04 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14573796#comment-14573796
 ] 

Apache Spark commented on SPARK-8112:
-

User 'zsxwing' has created a pull request for this issue:
https://github.com/apache/spark/pull/6659

 Received block event count through the StreamingListener can be negative
 

 Key: SPARK-8112
 URL: https://issues.apache.org/jira/browse/SPARK-8112
 Project: Spark
  Issue Type: Bug
  Components: Streaming
Affects Versions: 1.4.0
Reporter: Tathagata Das
Assignee: Shixiong Zhu
Priority: Minor





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-7417) Flaky test: o.a.s.deploy.SparkSubmitUtilsSuite neglect dependencies

2015-06-04 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-7417:
-
Target Version/s: 1.3.2, 1.4.0  (was: 1.4.0)

 Flaky test: o.a.s.deploy.SparkSubmitUtilsSuite neglect dependencies
 ---

 Key: SPARK-7417
 URL: https://issues.apache.org/jira/browse/SPARK-7417
 Project: Spark
  Issue Type: Bug
  Components: Tests
Affects Versions: 1.4.0
Reporter: Andrew Or
Priority: Critical
  Labels: flaky-test
 Fix For: 1.3.2, 1.4.0


 {code}
 Expected exception java.lang.RuntimeException to be thrown, but no exception 
 was thrown.
 {code}
 https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-Maven-pre-YARN/hadoop.version=1.0.4,label=centos/2201/testReport/junit/org.apache.spark.deploy/SparkSubmitUtilsSuite/neglects_Spark_and_Spark_s_dependencies/
 ...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-7418) Flaky test: o.a.s.deploy.SparkSubmitUtilsSuite search for artifacts

2015-06-04 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-7418:
-
Target Version/s: 1.3.2, 1.4.0  (was: 1.4.0)

 Flaky test: o.a.s.deploy.SparkSubmitUtilsSuite search for artifacts
 ---

 Key: SPARK-7418
 URL: https://issues.apache.org/jira/browse/SPARK-7418
 Project: Spark
  Issue Type: Bug
  Components: Tests
Affects Versions: 1.4.0
Reporter: Andrew Or
Assignee: Burak Yavuz
Priority: Critical
 Fix For: 1.3.2, 1.4.0


 {code}
java.lang.RuntimeException: [unresolved dependency: 
 com.agimatec#agimatec-validation;0.9.3: not found]
   at 
 org.apache.spark.deploy.SparkSubmitUtils$.resolveMavenCoordinates(SparkSubmit.scala:931)
   at 
 org.apache.spark.deploy.SparkSubmitUtilsSuite$$anonfun$5.apply$mcV$sp(SparkSubmitUtilsSuite.scala:108)
   at 
 org.apache.spark.deploy.SparkSubmitUtilsSuite$$anonfun$5.apply(SparkSubmitUtilsSuite.scala:107)
   at 
 {code}
 https://amplab.cs.berkeley.edu/jenkins/view/Spark/job/Spark-Master-Maven-with-YARN/2075/HADOOP_PROFILE=hadoop-2.4,label=centos/testReport/junit/org.apache.spark.deploy/SparkSubmitUtilsSuite/search_for_artifact_at_other_repositories/
 ...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-7418) Flaky test: o.a.s.deploy.SparkSubmitUtilsSuite search for artifacts

2015-06-04 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-7418:
-
Fix Version/s: 1.4.0
   1.3.2

 Flaky test: o.a.s.deploy.SparkSubmitUtilsSuite search for artifacts
 ---

 Key: SPARK-7418
 URL: https://issues.apache.org/jira/browse/SPARK-7418
 Project: Spark
  Issue Type: Bug
  Components: Tests
Affects Versions: 1.4.0
Reporter: Andrew Or
Assignee: Burak Yavuz
Priority: Critical
 Fix For: 1.3.2, 1.4.0


 {code}
java.lang.RuntimeException: [unresolved dependency: 
 com.agimatec#agimatec-validation;0.9.3: not found]
   at 
 org.apache.spark.deploy.SparkSubmitUtils$.resolveMavenCoordinates(SparkSubmit.scala:931)
   at 
 org.apache.spark.deploy.SparkSubmitUtilsSuite$$anonfun$5.apply$mcV$sp(SparkSubmitUtilsSuite.scala:108)
   at 
 org.apache.spark.deploy.SparkSubmitUtilsSuite$$anonfun$5.apply(SparkSubmitUtilsSuite.scala:107)
   at 
 {code}
 https://amplab.cs.berkeley.edu/jenkins/view/Spark/job/Spark-Master-Maven-with-YARN/2075/HADOOP_PROFILE=hadoop-2.4,label=centos/testReport/junit/org.apache.spark.deploy/SparkSubmitUtilsSuite/search_for_artifact_at_other_repositories/
 ...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-7417) Flaky test: o.a.s.deploy.SparkSubmitUtilsSuite neglect dependencies

2015-06-04 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-7417:
-
Fix Version/s: 1.4.0
   1.3.2

 Flaky test: o.a.s.deploy.SparkSubmitUtilsSuite neglect dependencies
 ---

 Key: SPARK-7417
 URL: https://issues.apache.org/jira/browse/SPARK-7417
 Project: Spark
  Issue Type: Bug
  Components: Tests
Affects Versions: 1.4.0
Reporter: Andrew Or
Priority: Critical
  Labels: flaky-test
 Fix For: 1.3.2, 1.4.0


 {code}
 Expected exception java.lang.RuntimeException to be thrown, but no exception 
 was thrown.
 {code}
 https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-Maven-pre-YARN/hadoop.version=1.0.4,label=centos/2201/testReport/junit/org.apache.spark.deploy/SparkSubmitUtilsSuite/neglects_Spark_and_Spark_s_dependencies/
 ...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-7417) Flaky test: o.a.s.deploy.SparkSubmitUtilsSuite neglect dependencies

2015-06-04 Thread Andrew Or (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14573842#comment-14573842
 ] 

Andrew Or commented on SPARK-7417:
--

This should be resolved by:
branch-1.4+: 8014e1f6bb871d9fd4db74106eb4425d0c1e9dd6 (#5892)
branch-1.3: 5b96b6933a1c0f05512823117c8c66f4b44e2937 (#6657)

 Flaky test: o.a.s.deploy.SparkSubmitUtilsSuite neglect dependencies
 ---

 Key: SPARK-7417
 URL: https://issues.apache.org/jira/browse/SPARK-7417
 Project: Spark
  Issue Type: Bug
  Components: Tests
Affects Versions: 1.4.0
Reporter: Andrew Or
Assignee: Burak Yavuz
Priority: Critical
  Labels: flaky-test
 Fix For: 1.3.2, 1.4.0


 {code}
 Expected exception java.lang.RuntimeException to be thrown, but no exception 
 was thrown.
 {code}
 https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-Maven-pre-YARN/hadoop.version=1.0.4,label=centos/2201/testReport/junit/org.apache.spark.deploy/SparkSubmitUtilsSuite/neglects_Spark_and_Spark_s_dependencies/
 ...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-7418) Flaky test: o.a.s.deploy.SparkSubmitUtilsSuite search for artifacts

2015-06-04 Thread Andrew Or (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14573840#comment-14573840
 ] 

Andrew Or commented on SPARK-7418:
--

This should be resolved by:

branch-1.4+: 8014e1f6bb871d9fd4db74106eb4425d0c1e9dd6 (#5892)
branch-1.3: 5b96b6933a1c0f05512823117c8c66f4b44e2937 (#6657)

 Flaky test: o.a.s.deploy.SparkSubmitUtilsSuite search for artifacts
 ---

 Key: SPARK-7418
 URL: https://issues.apache.org/jira/browse/SPARK-7418
 Project: Spark
  Issue Type: Bug
  Components: Tests
Affects Versions: 1.4.0
Reporter: Andrew Or
Priority: Critical

 {code}
java.lang.RuntimeException: [unresolved dependency: 
 com.agimatec#agimatec-validation;0.9.3: not found]
   at 
 org.apache.spark.deploy.SparkSubmitUtils$.resolveMavenCoordinates(SparkSubmit.scala:931)
   at 
 org.apache.spark.deploy.SparkSubmitUtilsSuite$$anonfun$5.apply$mcV$sp(SparkSubmitUtilsSuite.scala:108)
   at 
 org.apache.spark.deploy.SparkSubmitUtilsSuite$$anonfun$5.apply(SparkSubmitUtilsSuite.scala:107)
   at 
 {code}
 https://amplab.cs.berkeley.edu/jenkins/view/Spark/job/Spark-Master-Maven-with-YARN/2075/HADOOP_PROFILE=hadoop-2.4,label=centos/testReport/junit/org.apache.spark.deploy/SparkSubmitUtilsSuite/search_for_artifact_at_other_repositories/
 ...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-7418) Flaky test: o.a.s.deploy.SparkSubmitUtilsSuite search for artifacts

2015-06-04 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or closed SPARK-7418.

Resolution: Fixed
  Assignee: Burak Yavuz

 Flaky test: o.a.s.deploy.SparkSubmitUtilsSuite search for artifacts
 ---

 Key: SPARK-7418
 URL: https://issues.apache.org/jira/browse/SPARK-7418
 Project: Spark
  Issue Type: Bug
  Components: Tests
Affects Versions: 1.4.0
Reporter: Andrew Or
Assignee: Burak Yavuz
Priority: Critical

 {code}
java.lang.RuntimeException: [unresolved dependency: 
 com.agimatec#agimatec-validation;0.9.3: not found]
   at 
 org.apache.spark.deploy.SparkSubmitUtils$.resolveMavenCoordinates(SparkSubmit.scala:931)
   at 
 org.apache.spark.deploy.SparkSubmitUtilsSuite$$anonfun$5.apply$mcV$sp(SparkSubmitUtilsSuite.scala:108)
   at 
 org.apache.spark.deploy.SparkSubmitUtilsSuite$$anonfun$5.apply(SparkSubmitUtilsSuite.scala:107)
   at 
 {code}
 https://amplab.cs.berkeley.edu/jenkins/view/Spark/job/Spark-Master-Maven-with-YARN/2075/HADOOP_PROFILE=hadoop-2.4,label=centos/testReport/junit/org.apache.spark.deploy/SparkSubmitUtilsSuite/search_for_artifact_at_other_repositories/
 ...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-7546) Example code for ML Pipelines feature transformations

2015-06-04 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-7546:
---

Assignee: Apache Spark  (was: Ram Sriharsha)

 Example code for ML Pipelines feature transformations
 -

 Key: SPARK-7546
 URL: https://issues.apache.org/jira/browse/SPARK-7546
 Project: Spark
  Issue Type: New Feature
  Components: ML
Reporter: Joseph K. Bradley
Assignee: Apache Spark

 This should be added for Scala, Java, and Python.
 It should cover ML Pipelines using a complex series of feature 
 transformations.
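 A hedged sketch of the kind of example being requested (Scala flavour; the real 
 example code is being added via the pull request linked to this issue, and a 
 sqlContext is assumed to be in scope):
 {code}
 import org.apache.spark.ml.Pipeline
 import org.apache.spark.ml.classification.LogisticRegression
 import org.apache.spark.ml.feature.{HashingTF, Tokenizer}

 // Toy training data: (id, text, label)
 val training = sqlContext.createDataFrame(Seq(
   (0L, "a b c d e spark", 1.0),
   (1L, "b d", 0.0)
 )).toDF("id", "text", "label")

 // Chain two feature transformations in front of an estimator.
 val tokenizer = new Tokenizer().setInputCol("text").setOutputCol("words")
 val hashingTF = new HashingTF().setInputCol("words").setOutputCol("features")
 val lr = new LogisticRegression().setMaxIter(10)

 val model = new Pipeline().setStages(Array(tokenizer, hashingTF, lr)).fit(training)
 {code}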



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-7546) Example code for ML Pipelines feature transformations

2015-06-04 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14573625#comment-14573625
 ] 

Apache Spark commented on SPARK-7546:
-

User 'harsha2010' has created a pull request for this issue:
https://github.com/apache/spark/pull/6654

 Example code for ML Pipelines feature transformations
 -

 Key: SPARK-7546
 URL: https://issues.apache.org/jira/browse/SPARK-7546
 Project: Spark
  Issue Type: New Feature
  Components: ML
Reporter: Joseph K. Bradley
Assignee: Ram Sriharsha

 This should be added for Scala, Java, and Python.
 It should cover ML Pipelines using a complex series of feature 
 transformations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-8113) SQL module test cleanup

2015-06-04 Thread Reynold Xin (JIRA)
Reynold Xin created SPARK-8113:
--

 Summary: SQL module test cleanup
 Key: SPARK-8113
 URL: https://issues.apache.org/jira/browse/SPARK-8113
 Project: Spark
  Issue Type: Umbrella
  Components: SQL
Reporter: Reynold Xin
Assignee: Reynold Xin
Priority: Minor


Some cleanup tasks to track here.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-6419) GenerateOrdering does not support BinaryType and complex types.

2015-06-04 Thread Davies Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davies Liu reassigned SPARK-6419:
-

Assignee: Davies Liu

 GenerateOrdering does not support BinaryType and complex types.
 ---

 Key: SPARK-6419
 URL: https://issues.apache.org/jira/browse/SPARK-6419
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: Yin Huai
Assignee: Davies Liu

 When users want to order by binary columns or columns with complex types and 
 codegen is enabled, there will be a MatchError ([see 
 here|https://github.com/apache/spark/blob/v1.3.0/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateOrdering.scala#L45]).
 We can either add support for these types or have a function to check whether 
 we can safely call GenerateOrdering (like canBeCodeGened in the HashAggregation 
 strategy).
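 For illustration only, a sketch of the 'check before generating' option; 
 canGenerateOrdering is a hypothetical helper, not existing Spark code:
 {code}
 import org.apache.spark.sql.types._

 def canGenerateOrdering(types: Seq[DataType]): Boolean = types.forall {
   case BinaryType                                => false // not handled by GenerateOrdering yet
   case _: ArrayType | _: MapType | _: StructType => false // complex types
   case _                                         => true  // numeric, string, boolean, ...
 }
 {code}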



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-8117) Push codegen into Expression

2015-06-04 Thread Davies Liu (JIRA)
Davies Liu created SPARK-8117:
-

 Summary: Push codegen into Expression
 Key: SPARK-8117
 URL: https://issues.apache.org/jira/browse/SPARK-8117
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: Davies Liu
Assignee: Davies Liu


Push the codegen implementation of expression into Expression itself, make it 
easy to manage and extend.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-7184) Investigate turning codegen on by default

2015-06-04 Thread Davies Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davies Liu reassigned SPARK-7184:
-

Assignee: Davies Liu

 Investigate turning codegen on by default
 -

 Key: SPARK-7184
 URL: https://issues.apache.org/jira/browse/SPARK-7184
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Reporter: Reynold Xin
Assignee: Davies Liu

 If it is not the default, users get suboptimal performance out of the box, 
 and the codegen path falls behind the interpreted path over time.
 The best option might be to have only the codegen path.
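 For context, a minimal illustration of how codegen is enabled today (assuming a 
 sqlContext is in scope; spark.sql.codegen is the existing SQLConf flag in 1.x):
 {code}
 // Opt in to the codegen path for SQL/DataFrame execution.
 sqlContext.setConf("spark.sql.codegen", "true")
 // Equivalent SQL form:
 sqlContext.sql("SET spark.sql.codegen=true")
 {code}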



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-7536) Audit MLlib Python API for 1.4

2015-06-04 Thread Yanbo Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14573535#comment-14573535
 ] 

Yanbo Liang commented on SPARK-7536:


[~josephkb] I'm on a business trip from 1st June to 10th June, so there will 
be no updates during this period.

 Audit MLlib Python API for 1.4
 --

 Key: SPARK-7536
 URL: https://issues.apache.org/jira/browse/SPARK-7536
 Project: Spark
  Issue Type: Sub-task
  Components: MLlib, PySpark
Reporter: Joseph K. Bradley
Assignee: Yanbo Liang

 For new public APIs added to MLlib, we need to check the generated HTML doc 
 and compare the Scala & Python versions.  We need to track:
 * Inconsistency: Do class/method/parameter names match? SPARK-7667
 * Docs: Is the Python doc missing or just a stub?  We want the Python doc to 
 be as complete as the Scala doc. SPARK-7666
 * API breaking changes: These should be very rare but are occasionally either 
 necessary (intentional) or accidental.  These must be recorded and added in 
 the Migration Guide for this release. SPARK-7665
 ** Note: If the API change is for an Alpha/Experimental/DeveloperApi 
 component, please note that as well.
 * Missing classes/methods/parameters: We should create to-do JIRAs for 
 functionality missing from Python.
 ** classification
 *** StreamingLogisticRegressionWithSGD SPARK-7633
 ** clustering
 *** GaussianMixture SPARK-6258
 *** LDA SPARK-6259
 *** Power Iteration Clustering SPARK-5962
 *** StreamingKMeans SPARK-4118 
 ** evaluation
 *** MultilabelMetrics SPARK-6094 
 ** feature
 *** ElementwiseProduct SPARK-7605
 *** PCA SPARK-7604
 ** linalg
 *** Distributed linear algebra SPARK-6100
 ** pmml.export SPARK-7638
 ** regression
 *** StreamingLinearRegressionWithSGD SPARK-4127
 ** stat
 *** KernelDensity SPARK-7639
 ** util
 *** MLUtils SPARK-6263 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-7536) Audit MLlib Python API for 1.4

2015-06-04 Thread Yanbo Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14573537#comment-14573537
 ] 

Yanbo Liang commented on SPARK-7536:


[~josephkb] I'm on a business trip from 1st June to 10th June, so there will 
be no updates during this period.

 Audit MLlib Python API for 1.4
 --

 Key: SPARK-7536
 URL: https://issues.apache.org/jira/browse/SPARK-7536
 Project: Spark
  Issue Type: Sub-task
  Components: MLlib, PySpark
Reporter: Joseph K. Bradley
Assignee: Yanbo Liang

 For new public APIs added to MLlib, we need to check the generated HTML doc 
 and compare the Scala & Python versions.  We need to track:
 * Inconsistency: Do class/method/parameter names match? SPARK-7667
 * Docs: Is the Python doc missing or just a stub?  We want the Python doc to 
 be as complete as the Scala doc. SPARK-7666
 * API breaking changes: These should be very rare but are occasionally either 
 necessary (intentional) or accidental.  These must be recorded and added in 
 the Migration Guide for this release. SPARK-7665
 ** Note: If the API change is for an Alpha/Experimental/DeveloperApi 
 component, please note that as well.
 * Missing classes/methods/parameters: We should create to-do JIRAs for 
 functionality missing from Python.
 ** classification
 *** StreamingLogisticRegressionWithSGD SPARK-7633
 ** clustering
 *** GaussianMixture SPARK-6258
 *** LDA SPARK-6259
 *** Power Iteration Clustering SPARK-5962
 *** StreamingKMeans SPARK-4118 
 ** evaluation
 *** MultilabelMetrics SPARK-6094 
 ** feature
 *** ElementwiseProduct SPARK-7605
 *** PCA SPARK-7604
 ** linalg
 *** Distributed linear algebra SPARK-6100
 ** pmml.export SPARK-7638
 ** regression
 *** StreamingLinearRegressionWithSGD SPARK-4127
 ** stat
 *** KernelDensity SPARK-7639
 ** util
 *** MLUtils SPARK-6263 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-8080) Custom Receiver.store with Iterator type do not give correct count at Spark UI

2015-06-04 Thread Tathagata Das (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14573566#comment-14573566
 ] 

Tathagata Das commented on SPARK-8080:
--

[~zsxwing] Take a look at the screenshot attached to the JIRA. We should not be 
showing negative numbers in the input size. I am guessing that this is 
happening because the number of records reported by ReceivedBlockInfo is -1 (to 
signify lack of information), which gets added up to become -4. This should 
not happen. I am filing a separate JIRA for this; can you take a look at the 
issue?

 Custom Receiver.store with Iterator type do not give correct count at Spark UI
 --

 Key: SPARK-8080
 URL: https://issues.apache.org/jira/browse/SPARK-8080
 Project: Spark
  Issue Type: Bug
  Components: Streaming
Affects Versions: 1.2.0
Reporter: Dibyendu Bhattacharya
 Fix For: 1.4.0

 Attachments: screenshot.png


 In a custom receiver, if I call store with the Iterator variant 
 (store(dataIterator: Iterator[T]): Unit), the Spark UI does not show the 
 correct count of records in the block, which leads to wrong values for Input 
 Rate, Scheduling Delay and Input Size.
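 A minimal sketch of the reported pattern (a custom receiver that pushes data 
 with the Iterator form of store); class name and data are illustrative only:
 {code}
 import org.apache.spark.storage.StorageLevel
 import org.apache.spark.streaming.receiver.Receiver

 class IteratorReceiver extends Receiver[String](StorageLevel.MEMORY_ONLY) {
   override def onStart(): Unit = {
     new Thread("iterator-receiver") {
       override def run(): Unit = {
         // store(Iterator[T]) is the call whose record count shows up wrong in the UI
         store(Seq("a", "b", "c").iterator)
       }
     }.start()
   }
   override def onStop(): Unit = { }
 }
 {code}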



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-8068) Add confusionMatrix method at class MulticlassMetrics in pyspark/mllib

2015-06-04 Thread Ai He (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14573676#comment-14573676
 ] 

Ai He commented on SPARK-8068:
--

Hi Joseph, am I supposed to solve this issue, or should I just let the assignee 
of SPARK-7536 resolve all related issues?

 Add confusionMatrix method at class MulticlassMetrics in pyspark/mllib
 --

 Key: SPARK-8068
 URL: https://issues.apache.org/jira/browse/SPARK-8068
 Project: Spark
  Issue Type: Improvement
  Components: PySpark
Affects Versions: 1.3.1
Reporter: Ai He
Priority: Minor

 There is no confusionMatrix method on class MulticlassMetrics in 
 pyspark/mllib. This method is already implemented in the Scala MLlib. To 
 achieve this, we just need to add a call that delegates to the corresponding 
 Scala method.
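 For reference, the Scala-side API that the Python wrapper would delegate to 
 (a sketch; a SparkContext sc is assumed to be in scope):
 {code}
 import org.apache.spark.mllib.evaluation.MulticlassMetrics

 // RDD of (prediction, label) pairs
 val predictionAndLabels = sc.parallelize(Seq((0.0, 0.0), (1.0, 1.0), (1.0, 0.0)))
 val metrics = new MulticlassMetrics(predictionAndLabels)
 println(metrics.confusionMatrix)   // counts of predictions per true label
 {code}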



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-8098) Show correct length of bytes on log page

2015-06-04 Thread Tathagata Das (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tathagata Das resolved SPARK-8098.
--
   Resolution: Fixed
Fix Version/s: 1.5.0
   1.4.1
   1.3.2

 Show correct length of bytes on log page
 

 Key: SPARK-8098
 URL: https://issues.apache.org/jira/browse/SPARK-8098
 Project: Spark
  Issue Type: Bug
  Components: Web UI
Affects Versions: 1.3.1
Reporter: Carson Wang
Priority: Minor
 Fix For: 1.3.2, 1.4.1, 1.5.0


 The log page should only show the requested number of bytes. Currently it shows 
 bytes from the startIndex to the end of the file, and the Next button on the 
 page is always disabled.
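 An illustrative sketch of reading only the requested byte range from a log file 
 (standard library only; not the actual patch for this issue):
 {code}
 import java.io.RandomAccessFile

 def readRange(path: String, startIndex: Long, length: Int): String = {
   val raf = new RandomAccessFile(path, "r")
   try {
     val start = math.min(math.max(startIndex, 0L), raf.length())
     val end = math.min(start + length, raf.length())
     val buf = new Array[Byte]((end - start).toInt)
     raf.seek(start)
     raf.readFully(buf)   // read exactly the requested window
     new String(buf, "UTF-8")
   } finally raf.close()
 }
 {code}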



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-8110) DAG visualizations sometimes look weird in Python

2015-06-04 Thread Matei Zaharia (JIRA)
Matei Zaharia created SPARK-8110:


 Summary: DAG visualizations sometimes look weird in Python
 Key: SPARK-8110
 URL: https://issues.apache.org/jira/browse/SPARK-8110
 Project: Spark
  Issue Type: Bug
  Components: Web UI
Affects Versions: 1.4.0
Reporter: Matei Zaharia
Priority: Minor


Got this by doing sc.textFile("README.md").count() -- there are some RDDs 
outside of any stages.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-8112) Received block event count through the StreamingListener can be negative

2015-06-04 Thread Tathagata Das (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tathagata Das updated SPARK-8112:
-
Priority: Minor  (was: Major)

 Received block event count through the StreamingListener can be negative
 

 Key: SPARK-8112
 URL: https://issues.apache.org/jira/browse/SPARK-8112
 Project: Spark
  Issue Type: Bug
  Components: Streaming
Affects Versions: 1.4.0
Reporter: Tathagata Das
Assignee: Shixiong Zhu
Priority: Minor





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


