[jira] [Updated] (SPARK-9480) Create a map abstract class MapData and a default implementation backed by 2 ArrayData

2015-08-01 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin updated SPARK-9480:
---
Parent Issue: SPARK-9413  (was: SPARK-9389)

> Create a map abstract class MapData and a default implementation backed by 2 
> ArrayData
> ---
>
> Key: SPARK-9480
> URL: https://issues.apache.org/jira/browse/SPARK-9480
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
> Fix For: 1.5.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-9480) Create a map abstract class MapData and a default implementation backed by 2 ArrayData

2015-08-01 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin resolved SPARK-9480.

   Resolution: Fixed
 Assignee: Wenchen Fan
Fix Version/s: 1.5.0

> Create a map abstract class MapData and a default implementation backed by 2 
> ArrayData
> ---
>
> Key: SPARK-9480
> URL: https://issues.apache.org/jira/browse/SPARK-9480
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
> Fix For: 1.5.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-8887) Explicitly define which data types can be used as dynamic partition columns

2015-08-01 Thread Reynold Xin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14650196#comment-14650196
 ] 

Reynold Xin commented on SPARK-8887:


[~lian cheng] can we put this in 1.5?

> Explicitly define which data types can be used as dynamic partition columns
> ---
>
> Key: SPARK-8887
> URL: https://issues.apache.org/jira/browse/SPARK-8887
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 1.4.0
>Reporter: Cheng Lian
>
> {{InsertIntoHadoopFsRelation}} implements Hive-compatible dynamic partition 
> insertion, which uses {{String.valueOf}} to encode partition column values 
> into dynamic partition directories. This actually limits the data types that 
> can be used as partition columns. For example, the string representation of 
> {{StructType}} values is not well defined. However, this limitation is not 
> explicitly enforced.
> There are several things we can improve:
> # Enforce dynamic partition column data type requirements by adding analysis 
> rules and throwing {{AnalysisException}} when a violation occurs.
> # Abstract away the string representation of various data types, so that we 
> don't need to convert internal representation types (e.g. {{UTF8String}}) to 
> external types (e.g. {{String}}). A set of Hive-compatible implementations 
> should be provided to ensure compatibility with Hive.
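
A minimal sketch of the first point, assuming the check lives inside the 
{{org.apache.spark.sql}} package (so it can construct {{AnalysisException}}); 
the whitelist below is illustrative, not the final list:

{code}
package org.apache.spark.sql

import org.apache.spark.sql.types._

object PartitionColumnCheck {
  // Illustrative whitelist of types whose string representation is well defined.
  private val allowedTypes: Set[DataType] = Set(
    StringType, IntegerType, LongType, ShortType, ByteType,
    BooleanType, FloatType, DoubleType, DateType, TimestampType)

  // An analysis rule could call this for every dynamic partition column and
  // fail analysis when the column's type is not supported.
  def checkPartitionColumn(name: String, dataType: DataType): Unit = {
    if (!allowedTypes.contains(dataType)) {
      throw new AnalysisException(
        s"Data type ${dataType.simpleString} of partition column $name is " +
          "not supported for dynamic partition insertion")
    }
  }
}
{code}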



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-9520) UnsafeFixedWidthAggregationMap should support in-place sorting of its own records

2015-08-01 Thread Reynold Xin (JIRA)
Reynold Xin created SPARK-9520:
--

 Summary: UnsafeFixedWidthAggregationMap should support in-place 
sorting of its own records
 Key: SPARK-9520
 URL: https://issues.apache.org/jira/browse/SPARK-9520
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Reporter: Reynold Xin
Assignee: Reynold Xin


In order to support sort-based external aggregation fallback, 
UnsafeFixedWidthAggregationMap needs to support sorting all of its records 
in-place.





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-9520) UnsafeFixedWidthAggregationMap should support in-place sorting of its own records

2015-08-01 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14650201#comment-14650201
 ] 

Apache Spark commented on SPARK-9520:
-

User 'rxin' has created a pull request for this issue:
https://github.com/apache/spark/pull/7849

> UnsafeFixedWidthAggregationMap should support in-place sorting of its own 
> records
> -
>
> Key: SPARK-9520
> URL: https://issues.apache.org/jira/browse/SPARK-9520
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: Reynold Xin
>
> In order to support sort-based external aggregation fallback, 
> UnsafeFixedWidthAggregationMap needs to support sorting all of its records 
> in-place.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-9520) UnsafeFixedWidthAggregationMap should support in-place sorting of its own records

2015-08-01 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-9520:
---

Assignee: Reynold Xin  (was: Apache Spark)

> UnsafeFixedWidthAggregationMap should support in-place sorting of its own 
> records
> -
>
> Key: SPARK-9520
> URL: https://issues.apache.org/jira/browse/SPARK-9520
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: Reynold Xin
>
> In order to support sort-based external aggregation fallback, 
> UnsafeFixedWidthAggregationMap needs to support sorting all of its records 
> in-place.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-8269) string function: initcap

2015-08-01 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14650200#comment-14650200
 ] 

Apache Spark commented on SPARK-8269:
-

User 'davies' has created a pull request for this issue:
https://github.com/apache/spark/pull/7850

> string function: initcap
> 
>
> Key: SPARK-8269
> URL: https://issues.apache.org/jira/browse/SPARK-8269
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: Cheng Hao
>
> initcap(string A): string
> Returns string, with the first letter of each word in uppercase, all other 
> letters in lowercase. Words are delimited by whitespace.
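
A minimal Scala sketch of the described semantics (illustrative only, not the 
actual Catalyst expression):

{code}
// Upper-case the first letter of each whitespace-delimited word and
// lower-case the remaining letters.
def initcap(s: String): String =
  s.split("(?<=\\s)|(?=\\s)")            // split but keep the whitespace
    .map { w =>
      if (w.isEmpty || w.head.isWhitespace) w
      else w.head.toUpper.toString + w.tail.toLowerCase
    }
    .mkString

println(initcap("sPark SQL"))            // prints "Spark Sql"
{code}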



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-9520) UnsafeFixedWidthAggregationMap should support in-place sorting of its own records

2015-08-01 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-9520:
---

Assignee: Apache Spark  (was: Reynold Xin)

> UnsafeFixedWidthAggregationMap should support in-place sorting of its own 
> records
> -
>
> Key: SPARK-9520
> URL: https://issues.apache.org/jira/browse/SPARK-9520
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: Apache Spark
>
> In order to support sort-based external aggregation fallback, 
> UnsafeFixedWidthAggregationMap needs to support sorting all of its records 
> in-place.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-8232) complex function: sort_array

2015-08-01 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14650206#comment-14650206
 ] 

Apache Spark commented on SPARK-8232:
-

User 'davies' has created a pull request for this issue:
https://github.com/apache/spark/pull/7851

> complex function: sort_array
> 
>
> Key: SPARK-8232
> URL: https://issues.apache.org/jira/browse/SPARK-8232
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: Cheng Hao
> Fix For: 1.5.0
>
>
> sort_array(Array)
> Sorts the input array in ascending order according to the natural ordering of 
> the array elements and returns it
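
A minimal Scala sketch of the described behavior (the real expression also has 
to handle Catalyst array types and null elements):

{code}
// sort_array returns the input array sorted ascending by the elements'
// natural ordering.
val sortedInts = Array(3, 1, 2).sorted            // Array(1, 2, 3)
val sortedStrs = Array("b", "a", "c").sorted      // Array(a, b, c)

println(sortedInts.mkString(","))                 // 1,2,3
println(sortedStrs.mkString(","))                 // a,b,c
{code}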



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-7446) Inverse transform for StringIndexer

2015-08-01 Thread Joseph K. Bradley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph K. Bradley resolved SPARK-7446.
--
   Resolution: Fixed
Fix Version/s: 1.5.0

Issue resolved by pull request 6339
[https://github.com/apache/spark/pull/6339]

> Inverse transform for StringIndexer
> ---
>
> Key: SPARK-7446
> URL: https://issues.apache.org/jira/browse/SPARK-7446
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Affects Versions: 1.4.0
>Reporter: Xiangrui Meng
>Assignee: holdenk
>Priority: Minor
> Fix For: 1.5.0
>
>
> It is useful to convert the encoded indices back to their string 
> representation for result inspection. We can add a parameter to 
> StringIndexer/StringIndexerModel for this.
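
A minimal sketch of the idea, assuming access to the label array a fitted model 
already stores (the names here are illustrative, not the final API):

{code}
// StringIndexer assigns index i to the i-th most frequent label, so the
// inverse transform is just a lookup into the model's label array.
val labels: Array[String] = Array("a", "b", "c")   // kept by the fitted model
def indexToLabel(index: Double): String = labels(index.toInt)

println(indexToLabel(2.0))                         // c
{code}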



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-8873) Support cleaning up shuffle files when using shuffle service in Mesos

2015-08-01 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-8873:
-
Priority: Blocker  (was: Critical)

> Support cleaning up shuffle files when using shuffle service in Mesos
> -
>
> Key: SPARK-8873
> URL: https://issues.apache.org/jira/browse/SPARK-8873
> Project: Spark
>  Issue Type: Improvement
>  Components: Mesos
>Affects Versions: 1.2.0
>Reporter: Timothy Chen
>Assignee: Timothy Chen
>Priority: Blocker
>  Labels: mesos
>
> With dynamic allocation enabled on Mesos, drivers can launch with shuffle 
> data cached in the external shuffle service.
> However, there is no reliable way to let the shuffle service clean up that 
> shuffle data when the driver exits, since the driver may crash before it 
> notifies the shuffle service, and the shuffle data would then be cached 
> forever.
> We need to implement a reliable way to detect driver termination and clean up 
> the shuffle data accordingly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-9521) Require Maven 3.3.3+ in the build

2015-08-01 Thread Sean Owen (JIRA)
Sean Owen created SPARK-9521:


 Summary: Require Maven 3.3.3+ in the build
 Key: SPARK-9521
 URL: https://issues.apache.org/jira/browse/SPARK-9521
 Project: Spark
  Issue Type: Improvement
  Components: Build
Affects Versions: 1.4.1
Reporter: Sean Owen
Assignee: Sean Owen
Priority: Trivial


Patrick recently discovered a build problem that manifested because he was 
using the Maven 3.2.x installed on his system, and which was resolved by using 
Maven 3.3.x. Since we have a script that can install Maven 3.3.3 for anyone, it 
probably makes sense to just enforce use of Maven 3.3.3+ in the build. 
(Currently it's just 3.0.4+).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-9521) Require Maven 3.3.3+ in the build

2015-08-01 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14650210#comment-14650210
 ] 

Apache Spark commented on SPARK-9521:
-

User 'srowen' has created a pull request for this issue:
https://github.com/apache/spark/pull/7852

> Require Maven 3.3.3+ in the build
> -
>
> Key: SPARK-9521
> URL: https://issues.apache.org/jira/browse/SPARK-9521
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 1.4.1
>Reporter: Sean Owen
>Assignee: Sean Owen
>Priority: Trivial
>
> Patrick recently discovered a build problem that manifested because he was 
> using the Maven 3.2.x installed on his system, and which was resolved by 
> using Maven 3.3.x. Since we have a script that can install Maven 3.3.3 for 
> anyone, it probably makes sense to just enforce use of Maven 3.3.3+ in the 
> build. (Currently it's just 3.0.4+).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-9521) Require Maven 3.3.3+ in the build

2015-08-01 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-9521:
---

Assignee: Apache Spark  (was: Sean Owen)

> Require Maven 3.3.3+ in the build
> -
>
> Key: SPARK-9521
> URL: https://issues.apache.org/jira/browse/SPARK-9521
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 1.4.1
>Reporter: Sean Owen
>Assignee: Apache Spark
>Priority: Trivial
>
> Patrick recently discovered a build problem that manifested because he was 
> using the Maven 3.2.x installed on his system, and which was resolved by 
> using Maven 3.3.x. Since we have a script that can install Maven 3.3.3 for 
> anyone, it probably makes sense to just enforce use of Maven 3.3.3+ in the 
> build. (Currently it's just 3.0.4+).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-9522) SparkSubmit process cannot exit if the application is killed while HiveThriftServer is starting

2015-08-01 Thread Weizhong (JIRA)
Weizhong created SPARK-9522:
---

 Summary: SparkSubmit process cannot exit if the application is killed 
while HiveThriftServer is starting
 Key: SPARK-9522
 URL: https://issues.apache.org/jira/browse/SPARK-9522
 Project: Spark
  Issue Type: Improvement
Reporter: Weizhong
Priority: Minor


When we start HiveThriftServer, we start SparkContext first and then 
HiveServer2. If we kill the application while HiveServer2 is starting, 
SparkContext stops successfully but the SparkSubmit process cannot exit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-9521) Require Maven 3.3.3+ in the build

2015-08-01 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-9521:
---

Assignee: Sean Owen  (was: Apache Spark)

> Require Maven 3.3.3+ in the build
> -
>
> Key: SPARK-9521
> URL: https://issues.apache.org/jira/browse/SPARK-9521
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 1.4.1
>Reporter: Sean Owen
>Assignee: Sean Owen
>Priority: Trivial
>
> Patrick recently discovered a build problem that manifested because he was 
> using the Maven 3.2.x installed on his system, and which was resolved by 
> using Maven 3.3.x. Since we have a script that can install Maven 3.3.3 for 
> anyone, it probably makes sense to just enforce use of Maven 3.3.3+ in the 
> build. (Currently it's just 3.0.4+).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-9522) SparkSubmit process cannot exit if the application is killed while HiveThriftServer is starting

2015-08-01 Thread Weizhong (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weizhong updated SPARK-9522:

Component/s: SQL

> SparkSubmit process cannot exit if the application is killed while 
> HiveThriftServer is starting
> ---
>
> Key: SPARK-9522
> URL: https://issues.apache.org/jira/browse/SPARK-9522
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Weizhong
>Priority: Minor
>
> When we start HiveThriftServer, we start SparkContext first and then 
> HiveServer2. If we kill the application while HiveServer2 is starting, 
> SparkContext stops successfully but the SparkSubmit process cannot exit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-9522) SparkSubmit process cannot exit if the application is killed while HiveThriftServer is starting

2015-08-01 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-9522:
---

Assignee: Apache Spark

> SparkSubmit process cannot exit if the application is killed while 
> HiveThriftServer is starting
> ---
>
> Key: SPARK-9522
> URL: https://issues.apache.org/jira/browse/SPARK-9522
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Weizhong
>Assignee: Apache Spark
>Priority: Minor
>
> When we start HiveThriftServer, we start SparkContext first and then 
> HiveServer2. If we kill the application while HiveServer2 is starting, 
> SparkContext stops successfully but the SparkSubmit process cannot exit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-9522) SparkSubmit process cannot exit if the application is killed while HiveThriftServer is starting

2015-08-01 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14650212#comment-14650212
 ] 

Apache Spark commented on SPARK-9522:
-

User 'Sephiroth-Lin' has created a pull request for this issue:
https://github.com/apache/spark/pull/7853

> SparkSubmit process cannot exit if the application is killed while 
> HiveThriftServer is starting
> ---
>
> Key: SPARK-9522
> URL: https://issues.apache.org/jira/browse/SPARK-9522
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Weizhong
>Priority: Minor
>
> When we start HiveThriftServer, we start SparkContext first and then 
> HiveServer2. If we kill the application while HiveServer2 is starting, 
> SparkContext stops successfully but the SparkSubmit process cannot exit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-9522) SparkSubmit process cannot exit if the application is killed while HiveThriftServer is starting

2015-08-01 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-9522:
---

Assignee: (was: Apache Spark)

> SparkSubmit process cannot exit if the application is killed while 
> HiveThriftServer is starting
> ---
>
> Key: SPARK-9522
> URL: https://issues.apache.org/jira/browse/SPARK-9522
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Weizhong
>Priority: Minor
>
> When we start HiveThriftServer, we start SparkContext first and then 
> HiveServer2. If we kill the application while HiveServer2 is starting, 
> SparkContext stops successfully but the SparkSubmit process cannot exit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-8999) Support non-temporal sequence in PrefixSpan

2015-08-01 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng updated SPARK-8999:
-
Assignee: Zhang JiaJin

> Support non-temporal sequence in PrefixSpan
> ---
>
> Key: SPARK-8999
> URL: https://issues.apache.org/jira/browse/SPARK-8999
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib
>Affects Versions: 1.5.0
>Reporter: Xiangrui Meng
>Assignee: Zhang JiaJin
>Priority: Critical
> Fix For: 1.5.0
>
>
> In SPARK-6487, we assume that all items are ordered. However, we should 
> support non-temporal sequences in PrefixSpan. This should be done before 1.5 
> because it changes PrefixSpan APIs.
> We can use `Array[Array[Int]]` or follow SPMF to use `Array[Int]` and use -1 
> to mark itemset boundaries. The latter is more efficient for storage. If we 
> support generic item type, we can use null.
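
A minimal Scala sketch of the two candidate encodings, assuming -1 never occurs 
as a real item (illustrative only):

{code}
import scala.collection.mutable.ArrayBuffer

// A sequence of itemsets, e.g. <(1 2) (3)>, in the nested encoding.
val nested: Array[Array[Int]] = Array(Array(1, 2), Array(3))

// SPMF-style flat encoding: -1 marks the end of each itemset.
def flatten(seq: Array[Array[Int]]): Array[Int] =
  seq.flatMap(itemset => itemset :+ -1)

// Recover the itemsets by splitting on the -1 delimiters.
def unflatten(flat: Array[Int]): Array[Array[Int]] = {
  val out = ArrayBuffer.empty[Array[Int]]
  var cur = ArrayBuffer.empty[Int]
  flat.foreach { x =>
    if (x == -1) { out += cur.toArray; cur = ArrayBuffer.empty[Int] }
    else cur += x
  }
  out.toArray
}

println(flatten(nested).mkString(" "))                                   // 1 2 -1 3 -1
println(unflatten(flatten(nested)).map(_.mkString(",")).mkString(" "))   // 1,2 3
{code}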



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-8999) Support non-temporal sequence in PrefixSpan

2015-08-01 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng resolved SPARK-8999.
--
   Resolution: Fixed
Fix Version/s: 1.5.0

Issue resolved by pull request 7818
[https://github.com/apache/spark/pull/7818]

> Support non-temporal sequence in PrefixSpan
> ---
>
> Key: SPARK-8999
> URL: https://issues.apache.org/jira/browse/SPARK-8999
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib
>Affects Versions: 1.5.0
>Reporter: Xiangrui Meng
>Priority: Critical
> Fix For: 1.5.0
>
>
> In SPARK-6487, we assume that all items are ordered. However, we should 
> support non-temporal sequences in PrefixSpan. This should be done before 1.5 
> because it changes PrefixSpan APIs.
> We can use `Array[Array[Int]]` or follow SPMF to use `Array[Int]` and use -1 
> to mark itemset boundaries. The latter is more efficient for storage. If we 
> support generic item type, we can use null.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-8505) Add settings to kick `lint-r` from `./dev/run-test.py`

2015-08-01 Thread Yu Ishikawa (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14650233#comment-14650233
 ] 

Yu Ishikawa commented on SPARK-8505:


[~srowen] Yes, I understand how it gets assigned, but I thought it would be 
better to show my activity to the other developers. I will be careful next 
time. Thanks!

> Add settings to kick `lint-r` from `./dev/run-test.py`
> --
>
> Key: SPARK-8505
> URL: https://issues.apache.org/jira/browse/SPARK-8505
> Project: Spark
>  Issue Type: Sub-task
>  Components: SparkR
>Reporter: Yu Ishikawa
>
> Add some settings to kick `lint-r` script from `./dev/run-test.py`



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-8169) Add StopWordsRemover as a transformer

2015-08-01 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng resolved SPARK-8169.
--
   Resolution: Fixed
Fix Version/s: 1.5.0

Issue resolved by pull request 6742
[https://github.com/apache/spark/pull/6742]

> Add StopWordsRemover as a transformer
> -
>
> Key: SPARK-8169
> URL: https://issues.apache.org/jira/browse/SPARK-8169
> Project: Spark
>  Issue Type: New Feature
>  Components: ML
>Affects Versions: 1.5.0
>Reporter: Xiangrui Meng
>Assignee: yuhao yang
> Fix For: 1.5.0
>
>
> StopWordsRemover takes a string array column and outputs a string array 
> column with all defined stop words removed. The transformer should also come 
> with a standard set of stop words as default.
> {code}
> val stopWords = new StopWordsRemover()
>   .setInputCol("words")
>   .setOutputCol("cleanWords")
>   .setStopWords(Array(...)) // optional
> val output = stopWords.transform(df)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-9523) Receiver for Spark Streaming does not naturally support kryo serializer

2015-08-01 Thread John Chen (JIRA)
John Chen created SPARK-9523:


 Summary: Receiver for Spark Streaming does not naturally support 
kryo serializer
 Key: SPARK-9523
 URL: https://issues.apache.org/jira/browse/SPARK-9523
 Project: Spark
  Issue Type: Improvement
  Components: Streaming
Affects Versions: 1.3.0
 Environment: Windows 7 local mode
Reporter: John Chen
 Fix For: 1.3.2, 1.4.2


In some cases an attribute of a class is not serializable but you still want to 
use it after the whole object has been serialized, so you have to customize 
your serialization code. For example, you can declare those attributes as 
transient, which makes them ignored during serialization, and then reassign 
their values during deserialization.

If you're using Java serialization, you have to implement Serializable and 
write that code in the readObject() and writeObject() methods; if you're using 
Kryo serialization, you have to implement KryoSerializable and write it in the 
read() and write() methods.

In Spark and Spark Streaming you can set Kryo as the serializer to speed things 
up. However, the functions passed to RDD or DStream operations are still 
serialized with Java serialization, which means you only need to write the 
custom serialization code in readObject() and writeObject().

But when it comes to Spark Streaming's Receiver, things are different. When you 
wish to customize an InputDStream, you must extend Receiver. However, it turns 
out the Receiver is serialized by Kryo if you set the Kryo serializer in 
SparkConf, and falls back to Java serialization if you didn't.

So here is the problem: if you want to switch the serializer by configuration 
and make sure the Receiver works correctly for both Java and Kryo, you have to 
write all 4 methods above. First, it is redundant, since you have to write the 
serialization/deserialization code almost twice; secondly, there is nothing in 
the doc or in the code to inform users to implement the KryoSerializable 
interface.

Since all other function parameters are serialized by Java only, I suggest 
doing the same for the Receiver. It may be slower, but since the serialization 
is only executed once per interval, that is tolerable. More importantly, it 
causes less trouble.
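
A minimal Scala sketch of the "4 methods" this currently requires, for a 
hypothetical class with a transient field that must be restored by hand 
(illustrative only, not Spark code):

{code}
import java.io.{IOException, ObjectInputStream, ObjectOutputStream}

import com.esotericsoftware.kryo.{Kryo, KryoSerializable}
import com.esotericsoftware.kryo.io.{Input, Output}

class ReceiverState(@transient var buffer: StringBuilder)
  extends Serializable with KryoSerializable {

  // Java serialization hooks (used when the default serializer is in effect).
  @throws(classOf[IOException])
  private def writeObject(out: ObjectOutputStream): Unit = {
    out.defaultWriteObject()
    out.writeUTF(buffer.toString)            // persist the transient field manually
  }

  @throws(classOf[IOException])
  private def readObject(in: ObjectInputStream): Unit = {
    in.defaultReadObject()
    buffer = new StringBuilder(in.readUTF()) // rebuild it on deserialization
  }

  // Kryo hooks (used when spark.serializer is set to KryoSerializer):
  // essentially the same logic written a second time.
  override def write(kryo: Kryo, output: Output): Unit = {
    output.writeString(buffer.toString)
  }

  override def read(kryo: Kryo, input: Input): Unit = {
    buffer = new StringBuilder(input.readString())
  }
}
{code}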



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-9523) Receiver for Spark Streaming does not naturally support kryo serializer

2015-08-01 Thread John Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Chen updated SPARK-9523:
-
Affects Version/s: (was: 1.3.0)
   1.3.1

The issue occurs in 1.3.1; it has not been tested in 1.4.0 or 1.4.1. However, 
the code for Receiver in these versions seems identical.

> Receiver for Spark Streaming does not naturally support kryo serializer
> ---
>
> Key: SPARK-9523
> URL: https://issues.apache.org/jira/browse/SPARK-9523
> Project: Spark
>  Issue Type: Improvement
>  Components: Streaming
>Affects Versions: 1.3.1
> Environment: Windows 7 local mode
>Reporter: John Chen
>  Labels: kryo, serialization
> Fix For: 1.3.2, 1.4.2
>
>   Original Estimate: 120h
>  Remaining Estimate: 120h
>
> In some cases an attribute of a class is not serializable but you still want 
> to use it after the whole object has been serialized, so you have to 
> customize your serialization code. For example, you can declare those 
> attributes as transient, which makes them ignored during serialization, and 
> then reassign their values during deserialization.
> If you're using Java serialization, you have to implement Serializable and 
> write that code in the readObject() and writeObject() methods; if you're 
> using Kryo serialization, you have to implement KryoSerializable and write it 
> in the read() and write() methods.
> In Spark and Spark Streaming you can set Kryo as the serializer to speed 
> things up. However, the functions passed to RDD or DStream operations are 
> still serialized with Java serialization, which means you only need to write 
> the custom serialization code in readObject() and writeObject().
> But when it comes to Spark Streaming's Receiver, things are different. When 
> you wish to customize an InputDStream, you must extend Receiver. However, it 
> turns out the Receiver is serialized by Kryo if you set the Kryo serializer 
> in SparkConf, and falls back to Java serialization if you didn't.
> So here is the problem: if you want to switch the serializer by configuration 
> and make sure the Receiver works correctly for both Java and Kryo, you have 
> to write all 4 methods above. First, it is redundant, since you have to write 
> the serialization/deserialization code almost twice; secondly, there is 
> nothing in the doc or in the code to inform users to implement the 
> KryoSerializable interface.
> Since all other function parameters are serialized by Java only, I suggest 
> doing the same for the Receiver. It may be slower, but since the 
> serialization is only executed once per interval, that is tolerable. More 
> importantly, it causes less trouble.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-8505) Add settings to kick `lint-r` from `./dev/run-test.py`

2015-08-01 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14650262#comment-14650262
 ] 

Sean Owen commented on SPARK-8505:
--

If you open a pull request, the JIRA is marked as "In Progress" and links to 
your PR. That pretty clearly shows your activity. 

> Add settings to kick `lint-r` from `./dev/run-test.py`
> --
>
> Key: SPARK-8505
> URL: https://issues.apache.org/jira/browse/SPARK-8505
> Project: Spark
>  Issue Type: Sub-task
>  Components: SparkR
>Reporter: Yu Ishikawa
>
> Add some settings to kick `lint-r` script from `./dev/run-test.py`



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-6873) Some Hive-Catalyst comparison tests fail due to unimportant order of some printed elements

2015-08-01 Thread Cheng Lian (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14650268#comment-14650268
 ] 

Cheng Lian commented on SPARK-6873:
---

It's not important. Internally, Hive just traverses a hash map and dumps 
everything in it. So the order is decided by the implementation of the hash map.
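
A rough, illustrative sketch of an order-insensitive comparison along these 
lines, normalizing both row order and token order within a row:

{code}
// Normalize each row by sorting its whitespace-separated tokens, then sort the
// rows themselves, so hash-map iteration order no longer affects the comparison.
def normalize(rows: Seq[String]): Seq[String] =
  rows.map(_.trim.split("\\s+").sorted.mkString(" ")).sorted

def sameResults(hive: Seq[String], catalyst: Seq[String]): Boolean =
  normalize(hive) == normalize(catalyst)

println(sameResults(
  Seq("tmp true", "bar bar value"),
  Seq("bar bar value", "tmp true")))   // true
{code}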

> Some Hive-Catalyst comparison tests fail due to unimportant order of some 
> printed elements
> --
>
> Key: SPARK-6873
> URL: https://issues.apache.org/jira/browse/SPARK-6873
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Tests
>Affects Versions: 1.3.1
>Reporter: Sean Owen
>Assignee: Cheng Lian
>Priority: Minor
>
> As I mentioned, I've been seeing 4 test failures in Hive tests for a while, 
> and actually it still affects master. I think it's a superficial problem that 
> only turns up when running on Java 8, but still, would probably be an easy 
> fix and good to fix.
> Specifically, here are four tests and the bit that fails the comparison, 
> below. I tried to diagnose this but had trouble even finding where some of 
> this occurs, like the list of synonyms?
> {code}
> - show_tblproperties *** FAILED ***
>   Results do not match for show_tblproperties:
> ...
>   !== HIVE - 2 row(s) ==   == CATALYST - 2 row(s) ==
>   !tmptruebar bar value
>   !barbar value   tmp true (HiveComparisonTest.scala:391)
> {code}
> {code}
> - show_create_table_serde *** FAILED ***
>   Results do not match for show_create_table_serde:
> ...
>WITH SERDEPROPERTIES (  WITH 
> SERDEPROPERTIES ( 
>   !  'serialization.format'='$', 
> 'field.delim'=',', 
>   !  'field.delim'=',')  
> 'serialization.format'='$')
> {code}
> {code}
> - udf_std *** FAILED ***
>   Results do not match for udf_std:
> ...
>   !== HIVE - 2 row(s) == == CATALYST 
> - 2 row(s) ==
>std(x) - Returns the standard deviation of a set of numbers   std(x) - 
> Returns the standard deviation of a set of numbers
>   !Synonyms: stddev_pop, stddev  Synonyms: 
> stddev, stddev_pop (HiveComparisonTest.scala:391)
> {code}
> {code}
> - udf_stddev *** FAILED ***
>   Results do not match for udf_stddev:
> ...
>   !== HIVE - 2 row(s) ==== 
> CATALYST - 2 row(s) ==
>stddev(x) - Returns the standard deviation of a set of numbers   stddev(x) 
> - Returns the standard deviation of a set of numbers
>   !Synonyms: stddev_pop, stdSynonyms: 
> std, stddev_pop (HiveComparisonTest.scala:391)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Issue Comment Deleted] (SPARK-1902) Spark shell prints error when :4040 port already in use

2015-08-01 Thread Eugene Morozov (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Morozov updated SPARK-1902:
--
Comment: was deleted

(was: It looks like the package name has changed since then, and now 
log4j.properties has to use another logger name to turn it off:
{noformat}
log4j.logger.org.spark-project.jetty.util.component.AbstractLifeCycle=ERROR
{noformat}

I'm not sure what I should do:
1. Reopen this issue
2. Create a new one
3. Or it's not that important to make this change.

Please, suggest.)

> Spark shell prints error when :4040 port already in use
> ---
>
> Key: SPARK-1902
> URL: https://issues.apache.org/jira/browse/SPARK-1902
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.0.0
>Reporter: Andrew Ash
>Assignee: Andrew Ash
> Fix For: 1.1.0
>
>
> When running two shells on the same machine, I get the below error.  The 
> issue is that the first shell takes port 4040, then the next tries 4040 
> and fails so falls back to 4041, then a third would try 4040 and 4041 before 
> landing on 4042, etc.
> We should catch the error and instead log as "Unable to use port 4041; 
> already in use.  Attempting port 4042..."
> {noformat}
> 14/05/22 11:31:54 WARN component.AbstractLifeCycle: FAILED 
> SelectChannelConnector@0.0.0.0:4041: java.net.BindException: Address already 
> in use
> java.net.BindException: Address already in use
> at sun.nio.ch.Net.bind0(Native Method)
> at sun.nio.ch.Net.bind(Net.java:444)
> at sun.nio.ch.Net.bind(Net.java:436)
> at 
> sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214)
> at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
> at 
> org.eclipse.jetty.server.nio.SelectChannelConnector.open(SelectChannelConnector.java:187)
> at 
> org.eclipse.jetty.server.AbstractConnector.doStart(AbstractConnector.java:316)
> at 
> org.eclipse.jetty.server.nio.SelectChannelConnector.doStart(SelectChannelConnector.java:265)
> at 
> org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:64)
> at org.eclipse.jetty.server.Server.doStart(Server.java:293)
> at 
> org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:64)
> at 
> org.apache.spark.ui.JettyUtils$$anonfun$1.apply$mcV$sp(JettyUtils.scala:192)
> at 
> org.apache.spark.ui.JettyUtils$$anonfun$1.apply(JettyUtils.scala:192)
> at 
> org.apache.spark.ui.JettyUtils$$anonfun$1.apply(JettyUtils.scala:192)
> at scala.util.Try$.apply(Try.scala:161)
> at org.apache.spark.ui.JettyUtils$.connect$1(JettyUtils.scala:191)
> at 
> org.apache.spark.ui.JettyUtils$.startJettyServer(JettyUtils.scala:205)
> at org.apache.spark.ui.WebUI.bind(WebUI.scala:99)
> at org.apache.spark.SparkContext.(SparkContext.scala:217)
> at 
> org.apache.spark.repl.SparkILoop.createSparkContext(SparkILoop.scala:957)
> at $line3.$read$$iwC$$iwC.(:8)
> at $line3.$read$$iwC.(:14)
> at $line3.$read.(:16)
> at $line3.$read$.(:20)
> at $line3.$read$.()
> at $line3.$eval$.(:7)
> at $line3.$eval$.()
> at $line3.$eval.$print()
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:788)
> at 
> org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1056)
> at 
> org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:614)
> at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:645)
> at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:609)
> at 
> org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:796)
> at 
> org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:841)
> at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:753)
> at 
> org.apache.spark.repl.SparkILoopInit$$anonfun$initializeSpark$1.apply(SparkILoopInit.scala:121)
> at 
> org.apache.spark.repl.SparkILoopInit$$anonfun$initializeSpark$1.apply(SparkILoopInit.scala:120)
> at 
> org.apache.spark.repl.SparkIMain.beQuietDuring(SparkIMain.scala:263)
> at 
> org.apache.spark.repl.SparkILoopInit$class.initializeSpark(SparkILoopInit.scala:120)
> at 
> org.apache.spark.repl.SparkILoop.initializeSpark(SparkILoop.scala:56)
>

[jira] [Comment Edited] (SPARK-9000) Support generic item type in PrefixSpan

2015-08-01 Thread Masaki Rikitoku (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14650301#comment-14650301
 ] 

Masaki Rikitoku edited comment on SPARK-9000 at 8/1/15 12:33 PM:
-

Hi Xiangrui Meng 

Thanks for your comments.

I agree with you and feynmanliang because my modification for this ticket is 
very tiny. 

If I notice something about feynmanliang's pr, I will inform you. 


was (Author: rikima):
Hi Xiangrui Meng 

Thanks for your comments.

I agree with you and feynmanliang because my modification for this ticket is 
very tiny. 

If I notice something about feynmanliang's pr I will inform you. 

> Support generic item type in PrefixSpan
> ---
>
> Key: SPARK-9000
> URL: https://issues.apache.org/jira/browse/SPARK-9000
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib
>Affects Versions: 1.5.0
>Reporter: Xiangrui Meng
>Priority: Critical
>
> In SPARK-6487, we only support Int type. It requires users to encode other 
> types into integer to use PrefixSpan. We should be able to do this inside 
> PrefixSpan, similar to FPGrowth. This should be done before 1.5 since it 
> changes APIs.
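
A minimal Scala sketch of the kind of encoding this asks for, similar in spirit 
to what FPGrowth does with its item dictionary (names are illustrative):

{code}
// Map arbitrary item types to Ints before mining, then map results back.
val sequences: Array[Array[String]] = Array(Array("a", "b"), Array("b", "c"))

// Build a dictionary item -> index and its inverse.
val itemToIndex: Map[String, Int] = sequences.flatten.distinct.zipWithIndex.toMap
val indexToItem: Map[Int, String] = itemToIndex.map(_.swap)

// Encode for the Int-based PrefixSpan; decode mined patterns afterwards.
val encoded: Array[Array[Int]] = sequences.map(_.map(itemToIndex))
val decoded: Array[Array[String]] = encoded.map(_.map(indexToItem))

println(encoded.map(_.mkString(",")).mkString(" "))   // 0,1 1,2
println(decoded.map(_.mkString(",")).mkString(" "))   // a,b b,c
{code}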



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-9000) Support generic item type in PrefixSpan

2015-08-01 Thread Masaki Rikitoku (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14650301#comment-14650301
 ] 

Masaki Rikitoku commented on SPARK-9000:


Hi Xiangrui Meng 

Thanks for your comments.

I agree with you and feynmanliang because my modification for this ticket is 
very tiny. 

If I notice something about feynmanliang's pr I will inform you. 

> Support generic item type in PrefixSpan
> ---
>
> Key: SPARK-9000
> URL: https://issues.apache.org/jira/browse/SPARK-9000
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib
>Affects Versions: 1.5.0
>Reporter: Xiangrui Meng
>Priority: Critical
>
> In SPARK-6487, we only support Int type. It requires users to encode other 
> types into integer to use PrefixSpan. We should be able to do this inside 
> PrefixSpan, similar to FPGrowth. This should be done before 1.5 since it 
> changes APIs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-9514) Add EventHubsReceiver to support Spark Streaming using Azure EventHubs

2015-08-01 Thread Nan Zhu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14650308#comment-14650308
 ] 

Nan Zhu commented on SPARK-9514:


I think the best way to do it is to add a new component in the external 
directory, if we can ensure that the code will be maintained in the long term...

> Add EventHubsReceiver to support Spark Streaming using Azure EventHubs
> --
>
> Key: SPARK-9514
> URL: https://issues.apache.org/jira/browse/SPARK-9514
> Project: Spark
>  Issue Type: Bug
>  Components: Streaming
>Affects Versions: 1.4.1
>Reporter: shanyu zhao
> Fix For: 1.5.0
>
>
> We need to add EventHubsReceiver implementation to support Spark Streaming 
> applications that receive data from Azure EventHubs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-6227) PCA and SVD for PySpark

2015-08-01 Thread Manoj Kumar (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14650314#comment-14650314
 ] 

Manoj Kumar commented on SPARK-6227:


[~mengxr] Can this be assigned to me, since the blockmatrix PR is already 
being worked on?

> PCA and SVD for PySpark
> ---
>
> Key: SPARK-6227
> URL: https://issues.apache.org/jira/browse/SPARK-6227
> Project: Spark
>  Issue Type: Sub-task
>  Components: MLlib, PySpark
>Affects Versions: 1.2.1
>Reporter: Julien Amelot
>
> The Dimensionality Reduction techniques are not available via Python (Scala + 
> Java only).
> * Principal component analysis (PCA)
> * Singular value decomposition (SVD)
> Doc:
> http://spark.apache.org/docs/1.2.1/mllib-dimensionality-reduction.html
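
For reference, a minimal sketch of the existing Scala API that Python bindings 
would need to mirror, assuming an already-built {{RDD[Vector]}} named {{rows}}:

{code}
import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.mllib.linalg.distributed.RowMatrix
import org.apache.spark.rdd.RDD

def reduceDimensions(rows: RDD[Vector]): Unit = {
  val mat = new RowMatrix(rows)

  // Principal component analysis: top 2 principal components.
  val pc = mat.computePrincipalComponents(2)
  val projected = mat.multiply(pc)        // rows projected onto the components

  // Singular value decomposition: top 2 singular values, keeping U.
  val svd = mat.computeSVD(2, computeU = true)
  println(svd.s)                          // the singular values
  println(projected.numRows())
}
{code}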



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-9524) Latest Spark (8765665015ef47a23e00f7d01d4d280c31bb236d) breaks (pyspark)

2015-08-01 Thread Samuel Marks (JIRA)
Samuel Marks created SPARK-9524:
---

 Summary: Latest Spark (8765665015ef47a23e00f7d01d4d280c31bb236d) 
breaks (pyspark)
 Key: SPARK-9524
 URL: https://issues.apache.org/jira/browse/SPARK-9524
 Project: Spark
  Issue Type: Bug
  Components: PySpark
Affects Versions: 1.5.0
 Environment: Ubuntu 15.04
Linux Kudu 3.19.0-25-generic #26-Ubuntu SMP Fri Jul 24 21:17:31 UTC 2015 x86_64 
x86_64 x86_64 GNU/Linux
Reporter: Samuel Marks
Priority: Blocker


I start my ipython notebook like usual, after updating to the latest Spark 
(`git pull`). Also tried a complete folder removal + clone + `build/mvn 
-DskipTests clean package` just to be sure.

I get a bunch of these 404 errors then this:

{code:none}
[W 00:13:49.462 NotebookApp] 404 GET 
/api/kernels/e7db54cb-f7bb-4bdf-8d0c-76110f26c12c/channels?session_id=64C70A32AA2940808FDCA038A3D9E5B5
 (127.0.0.1) 3.72ms referer=None
2.4+ kernel w/o ELF notes? -- report this
{code}

PS: None of my Python code works within `ipython notebook` when it's launched 
via pyspark.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-9525) Optimize SparseVector initializations in linalg

2015-08-01 Thread Manoj Kumar (JIRA)
Manoj Kumar created SPARK-9525:
--

 Summary: Optimize SparseVector initializations in linalg
 Key: SPARK-9525
 URL: https://issues.apache.org/jira/browse/SPARK-9525
 Project: Spark
  Issue Type: Improvement
  Components: MLlib, PySpark
Reporter: Manoj Kumar
Priority: Minor


1. Remove sorting of indices and assume that the user gives a sorted tuple of 
indices, values etc

2. Avoid iterating twice to get the indices and values if the argument provided 
is a dict.

3. Add checks such that the length of the indices should be less than the size 
provided.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-9524) Latest Spark (8765665015ef47a23e00f7d01d4d280c31bb236d) breaks (pyspark)

2015-08-01 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-9524:
-
Affects Version/s: (was: 1.5.0)
 Priority: Major  (was: Blocker)

[~SamuelMarks] please read 
https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark first. 
Don't set Blocker, for example; 1.5.0 can't be the affected version.

> Latest Spark (8765665015ef47a23e00f7d01d4d280c31bb236d) breaks (pyspark)
> 
>
> Key: SPARK-9524
> URL: https://issues.apache.org/jira/browse/SPARK-9524
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
> Environment: Ubuntu 15.04
> Linux Kudu 3.19.0-25-generic #26-Ubuntu SMP Fri Jul 24 21:17:31 UTC 2015 
> x86_64 x86_64 x86_64 GNU/Linux
>Reporter: Samuel Marks
>
> I start my ipython notebook like usual, after updating to the latest Spark 
> (`git pull`). Also tried a complete folder removal + clone + `build/mvn 
> -DskipTests clean package` just to be sure.
> I get a bunch of these 404 errors then this:
> {code:none}
> [W 00:13:49.462 NotebookApp] 404 GET 
> /api/kernels/e7db54cb-f7bb-4bdf-8d0c-76110f26c12c/channels?session_id=64C70A32AA2940808FDCA038A3D9E5B5
>  (127.0.0.1) 3.72ms referer=None
> 2.4+ kernel w/o ELF notes? -- report this
> {code}
> PS: None of my Python code works within `ipython notebook` when it's launched 
> via pyspark.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-9524) Latest Spark (8765665015ef47a23e00f7d01d4d280c31bb236d) breaks (pyspark)

2015-08-01 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-9524.
--
Resolution: Invalid

This doesn't appear to be related to Spark. At least none of the errors here 
show anything from PySpark.

> Latest Spark (8765665015ef47a23e00f7d01d4d280c31bb236d) breaks (pyspark)
> 
>
> Key: SPARK-9524
> URL: https://issues.apache.org/jira/browse/SPARK-9524
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
> Environment: Ubuntu 15.04
> Linux Kudu 3.19.0-25-generic #26-Ubuntu SMP Fri Jul 24 21:17:31 UTC 2015 
> x86_64 x86_64 x86_64 GNU/Linux
>Reporter: Samuel Marks
>
> I start my ipython notebook like usual, after updating to the latest Spark 
> (`git pull`). Also tried a complete folder removal + clone + `build/mvn 
> -DskipTests clean package` just to be sure.
> I get a bunch of these 404 errors then this:
> {code:none}
> [W 00:13:49.462 NotebookApp] 404 GET 
> /api/kernels/e7db54cb-f7bb-4bdf-8d0c-76110f26c12c/channels?session_id=64C70A32AA2940808FDCA038A3D9E5B5
>  (127.0.0.1) 3.72ms referer=None
> 2.4+ kernel w/o ELF notes? -- report this
> {code}
> PS: None of my Python code works within `ipython notebook` when it's launched 
> via pyspark.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-9525) Optimize SparseVector initializations in linalg

2015-08-01 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-9525:
---

Assignee: Apache Spark

> Optimize SparseVector initializations in linalg
> ---
>
> Key: SPARK-9525
> URL: https://issues.apache.org/jira/browse/SPARK-9525
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib, PySpark
>Reporter: Manoj Kumar
>Assignee: Apache Spark
>Priority: Minor
>
> 1. Remove sorting of indices and assume that the user gives a sorted tuple of 
> indices, values etc
> 2. Avoid iterating twice to get the indices and values if the argument 
> provided is a dict.
> 3. Add checks such that the length of the indices should be less than the 
> size provided.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-9525) Optimize SparseVector initializations in linalg

2015-08-01 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-9525:
---

Assignee: (was: Apache Spark)

> Optimize SparseVector initializations in linalg
> ---
>
> Key: SPARK-9525
> URL: https://issues.apache.org/jira/browse/SPARK-9525
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib, PySpark
>Reporter: Manoj Kumar
>Priority: Minor
>
> 1. Remove sorting of indices and assume that the user gives a sorted tuple of 
> indices, values etc
> 2. Avoid iterating twice to get the indices and values if the argument 
> provided is a dict.
> 3. Add checks such that the length of the indices should be less than the 
> size provided.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-9525) Optimize SparseVector initializations in linalg

2015-08-01 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14650392#comment-14650392
 ] 

Apache Spark commented on SPARK-9525:
-

User 'MechCoder' has created a pull request for this issue:
https://github.com/apache/spark/pull/7854

> Optimize SparseVector initializations in linalg
> ---
>
> Key: SPARK-9525
> URL: https://issues.apache.org/jira/browse/SPARK-9525
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib, PySpark
>Reporter: Manoj Kumar
>Priority: Minor
>
> 1. Remove sorting of indices and assume that the user gives a sorted tuple of 
> indices, values etc
> 2. Avoid iterating twice to get the indices and values if the argument 
> provided is a dict.
> 3. Add checks such that the length of the indices should be less than the 
> size provided.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-9525) Optimize SparseVector initializations in linalg

2015-08-01 Thread Manoj Kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manoj Kumar updated SPARK-9525:
---
Priority: Major  (was: Minor)

> Optimize SparseVector initializations in linalg
> ---
>
> Key: SPARK-9525
> URL: https://issues.apache.org/jira/browse/SPARK-9525
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib, PySpark
>Reporter: Manoj Kumar
>
> 1. Remove sorting of indices and assume that the user gives a sorted tuple of 
> indices, values etc
> 2. Avoid iterating twice to get the indices and values if the argument 
> provided is a dict.
> 3. Add checks such that the length of the indices should be less than the 
> size provided.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-8263) string function: substr/substring should also support binary type

2015-08-01 Thread Davies Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davies Liu resolved SPARK-8263.
---
  Resolution: Fixed
   Fix Version/s: 1.5.0
Target Version/s:   (was: )

Issue resolved by pull request 7848
[https://github.com/apache/spark/pull/7848]

> string function: substr/substring should also support binary type
> -
>
> Key: SPARK-8263
> URL: https://issues.apache.org/jira/browse/SPARK-8263
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: Cheng Hao
>Priority: Minor
> Fix For: 1.5.0
>
>
> See Hive's: 
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-9524) Latest Spark (8765665015ef47a23e00f7d01d4d280c31bb236d) breaks (pyspark)

2015-08-01 Thread Samuel Marks (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14650413#comment-14650413
 ] 

Samuel Marks commented on SPARK-9524:
-

You're welcome to close the issue; it was only reported because it said: "2.4+ 
kernel w/o ELF notes? -- report this".

Working from the 1.4 branch, everything built fine and works fine, which means 
it's unlikely to be an IPython issue.

Anyway.

> Latest Spark (8765665015ef47a23e00f7d01d4d280c31bb236d) breaks (pyspark)
> 
>
> Key: SPARK-9524
> URL: https://issues.apache.org/jira/browse/SPARK-9524
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
> Environment: Ubuntu 15.04
> Linux Kudu 3.19.0-25-generic #26-Ubuntu SMP Fri Jul 24 21:17:31 UTC 2015 
> x86_64 x86_64 x86_64 GNU/Linux
>Reporter: Samuel Marks
>
> I start my ipython notebook like usual, after updating to the latest Spark 
> (`git pull`). Also tried a complete folder removal + clone + `build/mvn 
> -DskipTests clean package` just to be sure.
> I get a bunch of these 404 errors then this:
> {code:none}
> [W 00:13:49.462 NotebookApp] 404 GET 
> /api/kernels/e7db54cb-f7bb-4bdf-8d0c-76110f26c12c/channels?session_id=64C70A32AA2940808FDCA038A3D9E5B5
>  (127.0.0.1) 3.72ms referer=None
> 2.4+ kernel w/o ELF notes? -- report this
> {code}
> PS: None of my Python code works within `ipython notebook` when it's launched 
> via pyspark.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-9526) Utilize ScalaCheck to reveal potential bugs in sql expressions

2015-08-01 Thread Yijie Shen (JIRA)
Yijie Shen created SPARK-9526:
-

 Summary: Utilize ScalaCheck to reveal potential bugs in sql 
expressions
 Key: SPARK-9526
 URL: https://issues.apache.org/jira/browse/SPARK-9526
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Reporter: Yijie Shen
Priority: Blocker






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-8999) Support non-temporal sequence in PrefixSpan

2015-08-01 Thread Xiangrui Meng (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14650415#comment-14650415
 ] 

Xiangrui Meng commented on SPARK-8999:
--

[~srowen] Thanks for your feedback! The PrefixSpan paper has ~2k citations and I 
can find implementations in many libraries, e.g., SPMF and R. I think it is fair 
to say the algorithm is popular in data mining. The question I had is whether 
we want to support sequences of itemsets instead of sequences of items; the 
former complicates both the API and the implementation. I asked the author of 
SPMF for advice. He said that without itemset support the problem is called 
string mining, which is handled more efficiently by other algorithms. So it 
seems that we should implement PrefixSpan as in the paper, which supports 
itemsets.

> Support non-temporal sequence in PrefixSpan
> ---
>
> Key: SPARK-8999
> URL: https://issues.apache.org/jira/browse/SPARK-8999
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib
>Affects Versions: 1.5.0
>Reporter: Xiangrui Meng
>Assignee: Zhang JiaJin
>Priority: Critical
> Fix For: 1.5.0
>
>
> In SPARK-6487, we assume that all items are ordered. However, we should 
> support non-temporal sequences in PrefixSpan. This should be done before 1.5 
> because it changes PrefixSpan APIs.
> We can use `Array[Array[Int]]` or follow SPMF to use `Array[Int]` and use -1 
> to mark itemset boundaries. The latter is more efficient for storage. If we 
> support generic item type, we can use null.
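
For illustration, a small Scala sketch of the flat encoding (helper names are 
hypothetical, not the final API), which turns Array(Array(1, 2), Array(3)) into 
Array(1, 2, -1, 3, -1):

{code}
import scala.collection.mutable.ArrayBuffer

object ItemsetEncoding {
  // Flatten a sequence of itemsets into a single Array[Int], using -1 as the
  // itemset delimiter. This is the storage-friendly representation.
  def flatten(sequence: Array[Array[Int]]): Array[Int] =
    sequence.flatMap(itemset => itemset :+ (-1))

  // Recover the sequence of itemsets from the flat representation.
  def unflatten(flat: Array[Int]): Array[Array[Int]] = {
    val itemsets = ArrayBuffer.empty[Array[Int]]
    val current = ArrayBuffer.empty[Int]
    flat.foreach { item =>
      if (item == -1) {
        itemsets += current.toArray
        current.clear()
      } else {
        current += item
      }
    }
    itemsets.toArray
  }
}
{code}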



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-9526) Utilize randomized testing to reveal potential bugs in sql expressions

2015-08-01 Thread Yijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yijie Shen updated SPARK-9526:
--
Summary: Utilize randomized testing to reveal potential bugs in sql 
expressions  (was: Utilize ScalaCheck to reveal potential bugs in sql 
expressions)

> Utilize randomized testing to reveal potential bugs in sql expressions
> --
>
> Key: SPARK-9526
> URL: https://issues.apache.org/jira/browse/SPARK-9526
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Yijie Shen
>Priority: Blocker
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-9526) Utilize randomized tests to reveal potential bugs in sql expressions

2015-08-01 Thread Yijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yijie Shen updated SPARK-9526:
--
Summary: Utilize randomized tests to reveal potential bugs in sql 
expressions  (was: Utilize randomized testing to reveal potential bugs in sql 
expressions)

> Utilize randomized tests to reveal potential bugs in sql expressions
> 
>
> Key: SPARK-9526
> URL: https://issues.apache.org/jira/browse/SPARK-9526
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Yijie Shen
>Priority: Blocker
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-9527) PrefixSpan.run should return a PrefixSpanModel instead of an RDD

2015-08-01 Thread Xiangrui Meng (JIRA)
Xiangrui Meng created SPARK-9527:


 Summary: PrefixSpan.run should return a PrefixSpanModel instead of 
an RDD
 Key: SPARK-9527
 URL: https://issues.apache.org/jira/browse/SPARK-9527
 Project: Spark
  Issue Type: Improvement
  Components: MLlib
Affects Versions: 1.5.0
Reporter: Xiangrui Meng
Priority: Critical


Wrapping the result RDD in a model would make it more flexible to add features 
in the future.
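
For instance, a hedged sketch of what such a wrapper could look like (class and 
method names here are illustrative, not the committed API):

{code}
import org.apache.spark.rdd.RDD

// Illustrative only: wrapping the raw result RDD in a model class means new
// functionality (save/load, filtering, summaries) can be added later without
// changing the return type of run() again.
class PrefixSpanModel(val freqSequences: RDD[(Array[Int], Long)]) extends Serializable {
  // Example convenience method that the wrapper makes room for.
  def sequencesWithMinLength(minLength: Int): RDD[(Array[Int], Long)] =
    freqSequences.filter { case (sequence, _) => sequence.length >= minLength }
}
{code}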



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-9526) Utilize randomized tests to reveal potential bugs in sql expressions

2015-08-01 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-9526:
---

Assignee: Apache Spark

> Utilize randomized tests to reveal potential bugs in sql expressions
> 
>
> Key: SPARK-9526
> URL: https://issues.apache.org/jira/browse/SPARK-9526
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Yijie Shen
>Assignee: Apache Spark
>Priority: Blocker
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-9526) Utilize randomized tests to reveal potential bugs in sql expressions

2015-08-01 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14650420#comment-14650420
 ] 

Apache Spark commented on SPARK-9526:
-

User 'yjshen' has created a pull request for this issue:
https://github.com/apache/spark/pull/7855

> Utilize randomized tests to reveal potential bugs in sql expressions
> 
>
> Key: SPARK-9526
> URL: https://issues.apache.org/jira/browse/SPARK-9526
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Yijie Shen
>Priority: Blocker
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-9526) Utilize randomized tests to reveal potential bugs in sql expressions

2015-08-01 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-9526:
---

Assignee: (was: Apache Spark)

> Utilize randomized tests to reveal potential bugs in sql expressions
> 
>
> Key: SPARK-9526
> URL: https://issues.apache.org/jira/browse/SPARK-9526
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Yijie Shen
>Priority: Blocker
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-9526) Utilize randomized tests to reveal potential bugs in sql expressions

2015-08-01 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14650421#comment-14650421
 ] 

Sean Owen commented on SPARK-9526:
--

[~yijieshen] at this point should we really be making new blockers for the 
1.5.0 release? the merge window has closed, technically, and this looks like 
just a small nice-to-have

> Utilize randomized tests to reveal potential bugs in sql expressions
> 
>
> Key: SPARK-9526
> URL: https://issues.apache.org/jira/browse/SPARK-9526
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Yijie Shen
>Priority: Blocker
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-9526) Utilize randomized tests to reveal potential bugs in sql expressions

2015-08-01 Thread Yijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14650423#comment-14650423
 ] 

Yijie Shen commented on SPARK-9526:
---

[~srowen] Thanks for the reminder. The current randomized tests reveal some bugs 
in Spark SQL expression evaluation, so I think it might be a blocker. What do 
you think?
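
For context, a minimal ScalaCheck-style sketch of the kind of randomized test 
being discussed (the property below is illustrative and deliberately simple; the 
real suite drives Catalyst expression evaluation with generated inputs):

{code}
import org.scalacheck.Prop.forAll
import org.scalacheck.Properties

// Illustrative property: for arbitrary strings, concatenation should preserve
// total length. A real randomized expression test would compare the result of
// evaluating a Catalyst expression against a reference implementation.
object ConcatProperties extends Properties("Concat") {
  property("length of concat equals sum of lengths") = forAll { (a: String, b: String) =>
    (a + b).length == a.length + b.length
  }
}
{code}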

> Utilize randomized tests to reveal potential bugs in sql expressions
> 
>
> Key: SPARK-9526
> URL: https://issues.apache.org/jira/browse/SPARK-9526
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Yijie Shen
>Priority: Blocker
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-5754) Spark AM not launching on Windows

2015-08-01 Thread Carsten Blank (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14650444#comment-14650444
 ] 

Carsten Blank commented on SPARK-5754:
--

Okay so I have thought about this more and kinda have a "qualified" opinion now.

I assume that you have fixed this issue for your problem? Did you write a 
separate escapeForShell for Windows? 

I have and I would like to suggest something like that for a PR. How did you 
solve this?

> Spark AM not launching on Windows
> -
>
> Key: SPARK-5754
> URL: https://issues.apache.org/jira/browse/SPARK-5754
> Project: Spark
>  Issue Type: Bug
>  Components: Windows, YARN
>Affects Versions: 1.1.1, 1.2.0
> Environment: Windows Server 2012, Hadoop 2.4.1.
>Reporter: Inigo
>
> I'm trying to run Spark Pi on a YARN cluster running on Windows and the AM 
> container fails to start. The problem seems to be in the generation of the 
> YARN command which adds single quotes (') surrounding some of the java 
> options. In particular, the part of the code that is adding those is the 
> escapeForShell function in YarnSparkHadoopUtil. Apparently, Windows does not 
> like the quotes for these options. Here is an example of the command that the 
> container tries to execute:
> @call %JAVA_HOME%/bin/java -server -Xmx512m -Djava.io.tmpdir=%PWD%/tmp 
> '-Dspark.yarn.secondary.jars=' 
> '-Dspark.app.name=org.apache.spark.examples.SparkPi' 
> '-Dspark.master=yarn-cluster' org.apache.spark.deploy.yarn.ApplicationMaster 
> --class 'org.apache.spark.examples.SparkPi' --jar  
> 'file:/D:/data/spark-1.1.1-bin-hadoop2.4/bin/../lib/spark-examples-1.1.1-hadoop2.4.0.jar'
>   --executor-memory 1024 --executor-cores 1 --num-executors 2
> Once I transform it into:
> @call %JAVA_HOME%/bin/java -server -Xmx512m -Djava.io.tmpdir=%PWD%/tmp 
> -Dspark.yarn.secondary.jars= 
> -Dspark.app.name=org.apache.spark.examples.SparkPi 
> -Dspark.master=yarn-cluster org.apache.spark.deploy.yarn.ApplicationMaster 
> --class 'org.apache.spark.examples.SparkPi' --jar  
> 'file:/D:/data/spark-1.1.1-bin-hadoop2.4/bin/../lib/spark-examples-1.1.1-hadoop2.4.0.jar'
>   --executor-memory 1024 --executor-cores 1 --num-executors 2
> Everything seems to start.
> How should I deal with this? By creating a separate function like escapeForShell 
> for Windows and calling it whenever I detect the target is Windows? Or should I 
> add some sanity check on YARN?
> I checked a little and there seem to be people who are able to run Spark on 
> YARN on Windows, so it might be something else. I didn't find anything 
> related on JIRA either.
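
Not the actual patch, but a rough Scala sketch of the platform-specific escaping 
being discussed: cmd.exe treats single quotes as literal characters, so a Windows 
variant would need to use double quotes (and real Windows escaping has further 
corner cases, e.g. carets and percent signs, that this sketch ignores):

{code}
object WindowsShellEscape {
  // Illustrative only: wrap the argument in double quotes and escape embedded
  // double quotes, instead of the POSIX-style single quoting that Windows
  // does not understand.
  def escapeForWindowsShell(arg: String): String = {
    if (arg == null || arg.isEmpty) {
      "\"\""
    } else {
      "\"" + arg.replace("\"", "\\\"") + "\""
    }
  }
}
{code}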



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-9524) Latest Spark (8765665015ef47a23e00f7d01d4d280c31bb236d) breaks (pyspark)

2015-08-01 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14650448#comment-14650448
 ] 

Sean Owen commented on SPARK-9524:
--

That's an error from ipython though, not Spark. It doesn't follow that it's a 
Spark issue just because ipython + Spark x doesn't exhibit whatever problem 
you're seeing. Who knows, but since it's an ipython error, I'd start there.

> Latest Spark (8765665015ef47a23e00f7d01d4d280c31bb236d) breaks (pyspark)
> 
>
> Key: SPARK-9524
> URL: https://issues.apache.org/jira/browse/SPARK-9524
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
> Environment: Ubuntu 15.04
> Linux Kudu 3.19.0-25-generic #26-Ubuntu SMP Fri Jul 24 21:17:31 UTC 2015 
> x86_64 x86_64 x86_64 GNU/Linux
>Reporter: Samuel Marks
>
> I start my ipython notebook like usual, after updating to the latest Spark 
> (`git pull`). Also tried a complete folder removal + clone + `build/mvn 
> -DskipTests clean package` just to be sure.
> I get a bunch of these 404 errors then this:
> {code:none}
> [W 00:13:49.462 NotebookApp] 404 GET 
> /api/kernels/e7db54cb-f7bb-4bdf-8d0c-76110f26c12c/channels?session_id=64C70A32AA2940808FDCA038A3D9E5B5
>  (127.0.0.1) 3.72ms referer=None
> 2.4+ kernel w/o ELF notes? -- report this
> {code}
> PS: None of my Python code works within `ipython notebook` when it's launched 
> via pyspark.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-8069) Add support for cutoff to RandomForestClassifier

2015-08-01 Thread holdenk (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14650477#comment-14650477
 ] 

holdenk commented on SPARK-8069:


So I was looking at 
https://docs.google.com/document/d/1nV6m7sqViHkEpawelq1S5_QLWWAouSlv81eiEEjKuJY/edit?pli=1#
 and I can't comment or edit the design document so I figured I'd write my 
notes here ([~josephkb] if you could give me comment permission on the document 
that would be great).

The document calls for only having thresholds for ProbabilisticClassifier, but 
further up it also discusses having an implementation for both. Which one do we 
want to do?

> Add support for cutoff to RandomForestClassifier
> 
>
> Key: SPARK-8069
> URL: https://issues.apache.org/jira/browse/SPARK-8069
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Reporter: holdenk
>Assignee: holdenk
>Priority: Minor
>   Original Estimate: 240h
>  Remaining Estimate: 240h
>
> Consider adding support for cutoffs similar to 
> http://cran.r-project.org/web/packages/randomForest/randomForest.pdf 
> (Joseph) I just wrote a [little design doc | 
> https://docs.google.com/document/d/1nV6m7sqViHkEpawelq1S5_QLWWAouSlv81eiEEjKuJY/edit?usp=sharing]
>  for this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-6873) Some Hive-Catalyst comparison tests fail due to unimportant order of some printed elements

2015-08-01 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14650480#comment-14650480
 ] 

Sean Owen commented on SPARK-6873:
--

[~rxin] [~lian cheng] It's still a problem. Yes, I'm sure it's just a test 
issue, not a problem with the code, but ideally the tests should not rely on 
the ordering. Right now the tests don't actually pass on Java 8 because of 
things like the following:

{code}
- show_create_table_serde *** FAILED ***
  Results do not match for show_create_table_serde:
  == Parsed Logical Plan ==
  HiveNativeCommand SHOW CREATE TABLE tmp_showcrt1
  
  == Analyzed Logical Plan ==
  result: string
  HiveNativeCommand SHOW CREATE TABLE tmp_showcrt1
  
  == Optimized Logical Plan ==
  HiveNativeCommand SHOW CREATE TABLE tmp_showcrt1
  
  == Physical Plan ==
  ExecutedCommand (HiveNativeCommand SHOW CREATE TABLE tmp_showcrt1)
  
  Code Generation: true
  == RDD ==
  result
  !== HIVE - 13 row(s) ==  == CATALYST 
- 13 row(s) ==
   CREATE EXTERNAL TABLE `tmp_showcrt1`(   CREATE 
EXTERNAL TABLE `tmp_showcrt1`(
 `key` string,   `key` 
string, 
 `value` boolean)`value` 
boolean)
   ROW FORMAT SERDEROW FORMAT 
SERDE 
 'org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe'  
'org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe' 
   STORED BY   STORED BY 
 'org.apache.hadoop.hive.ql.metadata.DefaultStorageHandler'  
'org.apache.hadoop.hive.ql.metadata.DefaultStorageHandler' 
   WITH SERDEPROPERTIES (  WITH 
SERDEPROPERTIES ( 
  !  'serialization.format'='$', 
'field.delim'=',', 
  !  'field.delim'=',')  
'serialization.format'='$')
   LOCATIONLOCATION
 'tmp_showcrt1'
'tmp_showcrt1'
   TBLPROPERTIES ( 
TBLPROPERTIES ( (HiveComparisonTest.scala:397)
{code}

I build with {{-Pyarn -Phive}} from master.
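
One possible way to make such comparisons order-insensitive (a sketch only, not 
the fix that was eventually applied) is to normalize the parts of the output 
whose order carries no meaning before diffing the Hive and Catalyst answers:

{code}
object AnswerNormalization {
  // Sketch: sort the lines before comparing. This is only appropriate for
  // outputs (such as SERDEPROPERTIES entries or synonym lists) whose line
  // order carries no meaning; order-sensitive output must not be normalized
  // this way.
  def normalize(answer: Seq[String]): Seq[String] =
    answer.map(_.trim).sorted

  def matches(hiveAnswer: Seq[String], catalystAnswer: Seq[String]): Boolean =
    normalize(hiveAnswer) == normalize(catalystAnswer)
}
{code}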

> Some Hive-Catalyst comparison tests fail due to unimportant order of some 
> printed elements
> --
>
> Key: SPARK-6873
> URL: https://issues.apache.org/jira/browse/SPARK-6873
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Tests
>Affects Versions: 1.3.1
>Reporter: Sean Owen
>Assignee: Cheng Lian
>Priority: Minor
>
> As I mentioned, I've been seeing 4 test failures in Hive tests for a while, 
> and actually it still affects master. I think it's a superficial problem that 
> only turns up when running on Java 8, but still, would probably be an easy 
> fix and good to fix.
> Specifically, here are four tests and the bit that fails the comparison, 
> below. I tried to diagnose this but had trouble even finding where some of 
> this occurs, like the list of synonyms?
> {code}
> - show_tblproperties *** FAILED ***
>   Results do not match for show_tblproperties:
> ...
>   !== HIVE - 2 row(s) ==   == CATALYST - 2 row(s) ==
>   !tmptruebar bar value
>   !barbar value   tmp true (HiveComparisonTest.scala:391)
> {code}
> {code}
> - show_create_table_serde *** FAILED ***
>   Results do not match for show_create_table_serde:
> ...
>WITH SERDEPROPERTIES (  WITH 
> SERDEPROPERTIES ( 
>   !  'serialization.format'='$', 
> 'field.delim'=',', 
>   !  'field.delim'=',')  
> 'serialization.format'='$')
> {code}
> {code}
> - udf_std *** FAILED ***
>   Results do not match for udf_std:
> ...
>   !== HIVE - 2 row(s) == == CATALYST 
> - 2 row(s) ==
>std(x) - Returns the standard deviation of a set of numbers   std(x) - 
> Returns the standard deviation of a set of numbers
>   !Synonyms: stddev_pop, stddev  Synonyms: 
> stddev, stddev_pop (HiveComparisonTest.scala:391)
> {code}
> {code}
> - udf_stddev *** FAILED ***
>   Results do not match for udf_stddev:
> ...
>   !== HIVE - 2 row(s) ==== 
> CATALYST - 2 row(s) ==
>stddev(x) - Returns the standard deviation of a set of numbers   stddev(x) 
> - Returns the standard deviation of a set of numbers
>   !Synonyms: stddev_pop, stdSynonyms: 
> std, stddev_pop (HiveComparisonTest.scala:391)
> {code}



--
This message was sent by

[jira] [Commented] (SPARK-3166) Custom serialisers can't be shipped in application jars

2015-08-01 Thread Josh Rosen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14650486#comment-14650486
 ] 

Josh Rosen commented on SPARK-3166:
---

Does anyone know if this is still an issue in newer Spark versions?

> Custom serialisers can't be shipped in application jars
> ---
>
> Key: SPARK-3166
> URL: https://issues.apache.org/jira/browse/SPARK-3166
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.0.2
>Reporter: Graham Dennis
>
> Spark cannot currently use a custom serialiser that is shipped with the 
> application jar. Trying to do this causes a java.lang.ClassNotFoundException 
> when trying to instantiate the custom serialiser in the Executor processes. 
> This occurs because Spark attempts to instantiate the custom serialiser 
> before the application jar has been shipped to the Executor process. A 
> reproduction of the problem is available here: 
> https://github.com/GrahamDennis/spark-custom-serialiser
> I've verified this problem in Spark 1.0.2, and Spark master and 1.1 branches 
> as of August 21, 2014.  This issue is related to SPARK-2878, and my fix for 
> that issue (https://github.com/apache/spark/pull/1890) also solves this.  My 
> pull request was not merged because it adds the user jar to the Executor 
> processes' class path at launch time.  Such a significant change was thought 
> by [~rxin] to require more QA, and should be considered for inclusion in 1.2 
> at the earliest.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-9526) Utilize randomized tests to reveal potential bugs in sql expressions

2015-08-01 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin updated SPARK-9526:
---
Priority: Minor  (was: Blocker)

> Utilize randomized tests to reveal potential bugs in sql expressions
> 
>
> Key: SPARK-9526
> URL: https://issues.apache.org/jira/browse/SPARK-9526
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Yijie Shen
>Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-9526) Utilize randomized tests to reveal potential bugs in sql expressions

2015-08-01 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin updated SPARK-9526:
---
Priority: Major  (was: Minor)

> Utilize randomized tests to reveal potential bugs in sql expressions
> 
>
> Key: SPARK-9526
> URL: https://issues.apache.org/jira/browse/SPARK-9526
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Yijie Shen
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-9526) Utilize randomized tests to reveal potential bugs in sql expressions

2015-08-01 Thread Reynold Xin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14650498#comment-14650498
 ] 

Reynold Xin commented on SPARK-9526:


I downgraded it to major. While it is great to have (especially if it finds a 
lot of bugs that can help QA), I don't think this is a release blocker.

> Utilize randomized tests to reveal potential bugs in sql expressions
> 
>
> Key: SPARK-9526
> URL: https://issues.apache.org/jira/browse/SPARK-9526
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Yijie Shen
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-4751) Support dynamic allocation for standalone mode

2015-08-01 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or closed SPARK-4751.

   Resolution: Fixed
Fix Version/s: 1.5.0

> Support dynamic allocation for standalone mode
> --
>
> Key: SPARK-4751
> URL: https://issues.apache.org/jira/browse/SPARK-4751
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core
>Affects Versions: 1.2.0
>Reporter: Andrew Or
>Assignee: Andrew Or
>Priority: Critical
> Fix For: 1.5.0
>
>
> This is equivalent to SPARK-3822 but for standalone mode.
> This is actually a very tricky issue because the scheduling mechanism in the 
> standalone Master uses different semantics. In standalone mode we allocate 
> resources based on cores. By default, an application will grab all the cores 
> in the cluster unless "spark.cores.max" is specified. Unfortunately, this 
> means an application could get executors of different sizes (in terms of 
> cores) if:
> 1) App 1 kills an executor
> 2) App 2, with "spark.cores.max" set, grabs a subset of cores on a worker
> 3) App 1 requests an executor
> In this case, the new executor that App 1 gets back will be smaller than the 
> rest and can execute fewer tasks in parallel. Further, standalone mode is 
> subject to the constraint that only one executor can be allocated on each 
> worker per application. As a result, it is rather meaningless to request new 
> executors if the existing ones are already spread out across all nodes.
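
For reference, a hedged sketch of how an application might opt into this once it 
lands (assuming the standard dynamic allocation properties; standalone workers 
also need the external shuffle service enabled so executors can be removed 
safely):

{code}
import org.apache.spark.SparkConf

// Illustrative configuration for dynamic allocation against a standalone
// master; spark.cores.max caps the total cores the application may hold.
val conf = new SparkConf()
  .setMaster("spark://master:7077")
  .setAppName("dynamic-allocation-example")
  .set("spark.dynamicAllocation.enabled", "true")
  .set("spark.shuffle.service.enabled", "true")
  .set("spark.dynamicAllocation.minExecutors", "1")
  .set("spark.dynamicAllocation.maxExecutors", "10")
  .set("spark.cores.max", "20")
{code}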



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-8069) Add support for cutoff to RandomForestClassifier

2015-08-01 Thread Joseph K. Bradley (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14650514#comment-14650514
 ] 

Joseph K. Bradley commented on SPARK-8069:
--

My final plan was to only have it for ProbabilisticClassifier.  That note about 
Classifier is out of date; I forgot to update it, but will now.

> Add support for cutoff to RandomForestClassifier
> 
>
> Key: SPARK-8069
> URL: https://issues.apache.org/jira/browse/SPARK-8069
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Reporter: holdenk
>Assignee: holdenk
>Priority: Minor
>   Original Estimate: 240h
>  Remaining Estimate: 240h
>
> Consider adding support for cutoffs similar to 
> http://cran.r-project.org/web/packages/randomForest/randomForest.pdf 
> (Joseph) I just wrote a [little design doc | 
> https://docs.google.com/document/d/1nV6m7sqViHkEpawelq1S5_QLWWAouSlv81eiEEjKuJY/edit?usp=sharing]
>  for this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-9528) RandomForestClassifier should extend ProbabilisticClassifier

2015-08-01 Thread Joseph K. Bradley (JIRA)
Joseph K. Bradley created SPARK-9528:


 Summary: RandomForestClassifier should extend 
ProbabilisticClassifier
 Key: SPARK-9528
 URL: https://issues.apache.org/jira/browse/SPARK-9528
 Project: Spark
  Issue Type: Improvement
  Components: MLlib
Reporter: Joseph K. Bradley
Assignee: Joseph K. Bradley


Now that DecisionTreeClassifier extends ProbabilisticClassifier, we can have 
RandomForestClassifier extend ProbabilisticClassifier as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-9491) App running on secure YARN with no HBase config will hang

2015-08-01 Thread Marcelo Vanzin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin updated SPARK-9491:
--
 Assignee: Marcelo Vanzin
Affects Version/s: 1.4.0
 Target Version/s: 1.4.2, 1.5.0  (was: 1.5.0)
Fix Version/s: 1.5.0

> App running on secure YARN with no HBase config will hang
> -
>
> Key: SPARK-9491
> URL: https://issues.apache.org/jira/browse/SPARK-9491
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 1.4.0, 1.5.0
>Reporter: Marcelo Vanzin
>Assignee: Marcelo Vanzin
>Priority: Blocker
> Fix For: 1.5.0
>
>
> Because HBase may not be available, or the default config may be pointing at 
> the wrong information for HBase, the YARN backend may end up waiting forever 
> at this point:
> {noformat}
> "main" prio=10 tid=0x7f96c8016000 nid=0x1aa6 waiting on condition 
> [0x7f96cda96000]
>java.lang.Thread.State: TIMED_WAITING (sleeping)
> at java.lang.Thread.sleep(Native Method)
> at 
> org.apache.hadoop.hbase.zookeeper.MetaTableLocator.blockUntilAvailable(MetaTableLocator.java:443)
> at 
> org.apache.hadoop.hbase.client.ZooKeeperRegistry.getMetaRegionLocation(ZooKeeperRegistry.java:60)
> at 
> org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1123)
> at 
> org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1110)
> at 
> org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1067)
> at 
> org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getRegionLocation(ConnectionManager.java:902)
> at 
> org.apache.hadoop.hbase.client.RegionServerCallable.prepare(RegionServerCallable.java:78)
> at 
> org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:124)
> at 
> org.apache.hadoop.hbase.ipc.RegionCoprocessorRpcChannel.callExecService(RegionCoprocessorRpcChannel.java:95)
> at 
> org.apache.hadoop.hbase.ipc.CoprocessorRpcChannel.callBlockingMethod(CoprocessorRpcChannel.java:73)
> at 
> org.apache.hadoop.hbase.protobuf.generated.AuthenticationProtos$AuthenticationService$BlockingStub.getAuthenticationToken(AuthenticationProtos.java:4512)
> at 
> org.apache.hadoop.hbase.security.token.TokenUtil.obtainToken(TokenUtil.java:86)
> at 
> org.apache.hadoop.hbase.security.token.TokenUtil.obtainToken(TokenUtil.java:69)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> org.apache.spark.deploy.yarn.Client$.obtainTokenForHBase(Client.scala:1299)
> at 
> org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:270)
> {noformat}
> The code shouldn't try to fetch HBase delegation tokens when HBase is not 
> configured.
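
A rough sketch of the kind of guard being suggested (illustrative; the actual 
change may look different): only attempt to obtain an HBase delegation token 
when the HBase configuration on the classpath actually requires Kerberos 
authentication.

{code}
import org.apache.hadoop.conf.Configuration

object HBaseTokenGuard {
  // Illustrative check: if hbase-site.xml is absent or security is off, the
  // property is missing or "simple", and no token fetch should be attempted.
  def shouldFetchHBaseToken(hbaseConf: Configuration): Boolean =
    hbaseConf.get("hbase.security.authentication") == "kerberos"
}
{code}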



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-9491) App running on secure YARN with no HBase config will hang

2015-08-01 Thread Marcelo Vanzin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin updated SPARK-9491:
--
Fix Version/s: 1.4.2

> App running on secure YARN with no HBase config will hang
> -
>
> Key: SPARK-9491
> URL: https://issues.apache.org/jira/browse/SPARK-9491
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 1.4.0, 1.5.0
>Reporter: Marcelo Vanzin
>Assignee: Marcelo Vanzin
>Priority: Blocker
> Fix For: 1.4.2, 1.5.0
>
>
> Because HBase may not be available, or the default config may be pointing at 
> the wrong information for HBase, the YARN backend may end up waiting forever 
> at this point:
> {noformat}
> "main" prio=10 tid=0x7f96c8016000 nid=0x1aa6 waiting on condition 
> [0x7f96cda96000]
>java.lang.Thread.State: TIMED_WAITING (sleeping)
> at java.lang.Thread.sleep(Native Method)
> at 
> org.apache.hadoop.hbase.zookeeper.MetaTableLocator.blockUntilAvailable(MetaTableLocator.java:443)
> at 
> org.apache.hadoop.hbase.client.ZooKeeperRegistry.getMetaRegionLocation(ZooKeeperRegistry.java:60)
> at 
> org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1123)
> at 
> org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1110)
> at 
> org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1067)
> at 
> org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getRegionLocation(ConnectionManager.java:902)
> at 
> org.apache.hadoop.hbase.client.RegionServerCallable.prepare(RegionServerCallable.java:78)
> at 
> org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:124)
> at 
> org.apache.hadoop.hbase.ipc.RegionCoprocessorRpcChannel.callExecService(RegionCoprocessorRpcChannel.java:95)
> at 
> org.apache.hadoop.hbase.ipc.CoprocessorRpcChannel.callBlockingMethod(CoprocessorRpcChannel.java:73)
> at 
> org.apache.hadoop.hbase.protobuf.generated.AuthenticationProtos$AuthenticationService$BlockingStub.getAuthenticationToken(AuthenticationProtos.java:4512)
> at 
> org.apache.hadoop.hbase.security.token.TokenUtil.obtainToken(TokenUtil.java:86)
> at 
> org.apache.hadoop.hbase.security.token.TokenUtil.obtainToken(TokenUtil.java:69)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> org.apache.spark.deploy.yarn.Client$.obtainTokenForHBase(Client.scala:1299)
> at 
> org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:270)
> {noformat}
> The code shouldn't try to fetch HBase delegation tokens when HBase is not 
> configured.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-9491) App running on secure YARN with no HBase config will hang

2015-08-01 Thread Marcelo Vanzin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin resolved SPARK-9491.
---
Resolution: Fixed

> App running on secure YARN with no HBase config will hang
> -
>
> Key: SPARK-9491
> URL: https://issues.apache.org/jira/browse/SPARK-9491
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 1.4.0, 1.5.0
>Reporter: Marcelo Vanzin
>Assignee: Marcelo Vanzin
>Priority: Blocker
> Fix For: 1.4.2, 1.5.0
>
>
> Because HBase may not be available, or the default config may be pointing at 
> the wrong information for HBase, the YARN backend may end up waiting forever 
> at this point:
> {noformat}
> "main" prio=10 tid=0x7f96c8016000 nid=0x1aa6 waiting on condition 
> [0x7f96cda96000]
>java.lang.Thread.State: TIMED_WAITING (sleeping)
> at java.lang.Thread.sleep(Native Method)
> at 
> org.apache.hadoop.hbase.zookeeper.MetaTableLocator.blockUntilAvailable(MetaTableLocator.java:443)
> at 
> org.apache.hadoop.hbase.client.ZooKeeperRegistry.getMetaRegionLocation(ZooKeeperRegistry.java:60)
> at 
> org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1123)
> at 
> org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1110)
> at 
> org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1067)
> at 
> org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getRegionLocation(ConnectionManager.java:902)
> at 
> org.apache.hadoop.hbase.client.RegionServerCallable.prepare(RegionServerCallable.java:78)
> at 
> org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:124)
> at 
> org.apache.hadoop.hbase.ipc.RegionCoprocessorRpcChannel.callExecService(RegionCoprocessorRpcChannel.java:95)
> at 
> org.apache.hadoop.hbase.ipc.CoprocessorRpcChannel.callBlockingMethod(CoprocessorRpcChannel.java:73)
> at 
> org.apache.hadoop.hbase.protobuf.generated.AuthenticationProtos$AuthenticationService$BlockingStub.getAuthenticationToken(AuthenticationProtos.java:4512)
> at 
> org.apache.hadoop.hbase.security.token.TokenUtil.obtainToken(TokenUtil.java:86)
> at 
> org.apache.hadoop.hbase.security.token.TokenUtil.obtainToken(TokenUtil.java:69)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> org.apache.spark.deploy.yarn.Client$.obtainTokenForHBase(Client.scala:1299)
> at 
> org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:270)
> {noformat}
> The code shouldn't try to fetch HBase delegation tokens when HBase is not 
> configured.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-9520) UnsafeFixedWidthAggregationMap should support in-place sorting of its own records

2015-08-01 Thread Josh Rosen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen resolved SPARK-9520.
---
   Resolution: Fixed
Fix Version/s: 1.5.0

Issue resolved by pull request 7849
[https://github.com/apache/spark/pull/7849]

> UnsafeFixedWidthAggregationMap should support in-place sorting of its own 
> records
> -
>
> Key: SPARK-9520
> URL: https://issues.apache.org/jira/browse/SPARK-9520
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: Reynold Xin
> Fix For: 1.5.0
>
>
> In order to support sort-based external aggregation fallback, 
> UnsafeFixedWidthAggregationMap needs to support sorting all of its records 
> in-place.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-9529) Improve sort on Decimal

2015-08-01 Thread Davies Liu (JIRA)
Davies Liu created SPARK-9529:
-

 Summary: Improve sort on Decimal
 Key: SPARK-9529
 URL: https://issues.apache.org/jira/browse/SPARK-9529
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Reporter: Davies Liu
Assignee: Davies Liu
Priority: Critical


Right now it's really slow; it just hangs in randomized tests:

{code}
pool-1-thread-1-ScalaTest-running-TungstenSortSuite" prio=5 
tid=0x7f822bc82800 nid=0x5103 runnable [0x00011d1be000]
   java.lang.Thread.State: RUNNABLE
at java.math.BigInteger.(BigInteger.java:405)
at java.math.BigDecimal.bigTenToThe(BigDecimal.java:3380)
at java.math.BigDecimal.bigMultiplyPowerTen(BigDecimal.java:3508)
at java.math.BigDecimal.setScale(BigDecimal.java:2394)
at java.math.BigDecimal.divide(BigDecimal.java:1691)
at java.math.BigDecimal.divideToIntegralValue(BigDecimal.java:1734)
at java.math.BigDecimal.divideAndRemainder(BigDecimal.java:1891)
at java.math.BigDecimal.remainder(BigDecimal.java:1833)
at scala.math.BigDecimal.remainder(BigDecimal.scala:281)
at scala.math.BigDecimal.isWhole(BigDecimal.scala:215)
at scala.math.BigDecimal.hashCode(BigDecimal.scala:180)
at org.apache.spark.sql.types.Decimal.hashCode(Decimal.scala:260)
at 
org.apache.spark.sql.catalyst.InternalRow.hashCode(InternalRow.scala:121)
at org.apache.spark.RangePartitioner.hashCode(Partitioner.scala:201)
at java.lang.Object.toString(Object.java:237)
at 
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1418)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
at 
java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
at 
java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
at 
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347)
at 
org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:44)
at 
org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:84)
at 
org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:301)
at 
org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:294)
at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:122)
at org.apache.spark.SparkContext.clean(SparkContext.scala:2003)
at 
org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1.apply(RDD.scala:683)
at 
org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1.apply(RDD.scala:682)
at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:286)
at org.apache.spark.rdd.RDD.mapPartitions(RDD.scala:682)
at 
org.apache.spark.sql.execution.Exchange$$anonfun$doExecute$1.apply(Exchange.scala:181)
at 
org.apache.spark.sql.execution.Exchange$$anonfun$doExecute$1.apply(Exchange.scala:148)
at 
org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:48)
at org.apache.spark.sql.execution.Exchange.doExecute(Exchange.scala:148)
at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:113)
at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:113)
at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:112)
at 
org.apache.spark.sql.execution.Sort$$anonfun$doExecute$1.apply(sort.scala:48)
at 
org.apache.spark.sql.execution.Sort$$anonfun$doExecute$1.apply(sort.scala:48)
at 
org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:48)
at org.apache.spark.sql.execution.Sort.doExecute(sort.scala:47)
at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:113)
at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:113)
at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:112)
at 
org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:127)
at 
org.apache.spark.sql.execution.SparkPlanTest$.executePlan(SparkPlanTest.scala:297)
at 
org.apache.spark.sql.execution.SparkPlanTest$.checkAnswer(SparkPlanTest.scala:16

[jira] [Assigned] (SPARK-9483) UTF8String.getPrefix only works in little-endian order

2015-08-01 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-9483:
---

Assignee: Matthew Brandyberry  (was: Apache Spark)

> UTF8String.getPrefix only works in little-endian order
> --
>
> Key: SPARK-9483
> URL: https://issues.apache.org/jira/browse/SPARK-9483
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: Matthew Brandyberry
>Priority: Critical
>
> There are two bit-masking operations and a byte reversal that should probably 
> be handled differently in big-endian byte order.
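
For illustration (not necessarily the exact approach in the pull request), the 
usual fix for this class of problem is to reverse the bytes only on little-endian 
platforms, so that numeric comparison of the resulting long matches lexicographic 
comparison of the underlying bytes on either architecture:

{code}
import java.nio.ByteOrder

object PrefixBytes {
  private val littleEndian = ByteOrder.nativeOrder() == ByteOrder.LITTLE_ENDIAN

  // A raw 8-byte word read from memory must end up in big-endian order for
  // the prefix comparison to agree with byte-wise string comparison; only a
  // little-endian platform needs the byte swap.
  def toComparablePrefix(rawWord: Long): Long =
    if (littleEndian) java.lang.Long.reverseBytes(rawWord) else rawWord
}
{code}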



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-9495) Support prefix generation for date / timestamp data type

2015-08-01 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-9495:
---

Assignee: Apache Spark

> Support prefix generation for date / timestamp data type
> 
>
> Key: SPARK-9495
> URL: https://issues.apache.org/jira/browse/SPARK-9495
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: Apache Spark
>
> There are two files to change:
> SortPrefixUtils
> and
> SortPrefix (in SortOrder.scala)
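
As a rough sketch of what the prefix computation might look like (assuming the 
internal representation of DateType as an Int day count and TimestampType as a 
Long microsecond count; names are illustrative, not the committed code):

{code}
object DateTimePrefix {
  // Dates are stored internally as days since the epoch and timestamps as
  // microseconds since the epoch; both order correctly when compared as
  // signed longs, so the internal value itself can serve as the sort prefix.
  def datePrefix(days: Int): Long = days.toLong
  def timestampPrefix(micros: Long): Long = micros
}
{code}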



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-9483) UTF8String.getPrefix only works in little-endian order

2015-08-01 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14650522#comment-14650522
 ] 

Apache Spark commented on SPARK-9483:
-

User 'davies' has created a pull request for this issue:
https://github.com/apache/spark/pull/7856

> UTF8String.getPrefix only works in little-endian order
> --
>
> Key: SPARK-9483
> URL: https://issues.apache.org/jira/browse/SPARK-9483
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: Matthew Brandyberry
>Priority: Critical
>
> There are two bit-masking operations and a byte reversal that should probably 
> be handled differently in big-endian byte order.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-9495) Support prefix generation for date / timestamp data type

2015-08-01 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14650523#comment-14650523
 ] 

Apache Spark commented on SPARK-9495:
-

User 'davies' has created a pull request for this issue:
https://github.com/apache/spark/pull/7856

> Support prefix generation for date / timestamp data type
> 
>
> Key: SPARK-9495
> URL: https://issues.apache.org/jira/browse/SPARK-9495
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Reynold Xin
>
> There are two files to change:
> SortPrefixUtils
> and
> SortPrefix (in SortOrder.scala)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-9495) Support prefix generation for date / timestamp data type

2015-08-01 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-9495:
---

Assignee: (was: Apache Spark)

> Support prefix generation for date / timestamp data type
> 
>
> Key: SPARK-9495
> URL: https://issues.apache.org/jira/browse/SPARK-9495
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Reynold Xin
>
> There are two files to change:
> SortPrefixUtils
> and
> SortPrefix (in SortOrder.scala)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-9483) UTF8String.getPrefix only works in little-endian order

2015-08-01 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-9483:
---

Assignee: Apache Spark  (was: Matthew Brandyberry)

> UTF8String.getPrefix only works in little-endian order
> --
>
> Key: SPARK-9483
> URL: https://issues.apache.org/jira/browse/SPARK-9483
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: Apache Spark
>Priority: Critical
>
> There are two bit-masking operations and a byte reversal that should probably 
> be handled differently in big-endian byte order.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-9530) ScalaDoc should not indicate LDAModel.descripeTopic and DistributedLDAModel.topDocumentsPerTopic as approximate.

2015-08-01 Thread Meihua Wu (JIRA)
Meihua Wu created SPARK-9530:


 Summary: ScalaDoc should not indicate LDAModel.descripeTopic and 
DistributedLDAModel.topDocumentsPerTopic as approximate.
 Key: SPARK-9530
 URL: https://issues.apache.org/jira/browse/SPARK-9530
 Project: Spark
  Issue Type: Documentation
  Components: MLlib
Affects Versions: 1.4.1, 1.4.0, 1.3.1, 1.3.0
Reporter: Meihua Wu
Priority: Minor


Currently the ScalaDoc for LDAModel.describeTopics and 
DistributedLDAModel.topDocumentsPerTopic suggests that these methods are 
approximate. However, both methods are actually exact, and there is no need to 
increase maxTermsPerTopic or maxDocumentsPerTopic to get a more precise set of 
top terms.






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-9529) Improve sort on Decimal

2015-08-01 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-9529:
---

Assignee: Davies Liu  (was: Apache Spark)

> Improve sort on Decimal
> ---
>
> Key: SPARK-9529
> URL: https://issues.apache.org/jira/browse/SPARK-9529
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Davies Liu
>Assignee: Davies Liu
>Priority: Critical
>
> Right now it's really slow; it just hangs in randomized tests:
> {code}
> pool-1-thread-1-ScalaTest-running-TungstenSortSuite" prio=5 
> tid=0x7f822bc82800 nid=0x5103 runnable [0x00011d1be000]
>java.lang.Thread.State: RUNNABLE
>   at java.math.BigInteger.(BigInteger.java:405)
>   at java.math.BigDecimal.bigTenToThe(BigDecimal.java:3380)
>   at java.math.BigDecimal.bigMultiplyPowerTen(BigDecimal.java:3508)
>   at java.math.BigDecimal.setScale(BigDecimal.java:2394)
>   at java.math.BigDecimal.divide(BigDecimal.java:1691)
>   at java.math.BigDecimal.divideToIntegralValue(BigDecimal.java:1734)
>   at java.math.BigDecimal.divideAndRemainder(BigDecimal.java:1891)
>   at java.math.BigDecimal.remainder(BigDecimal.java:1833)
>   at scala.math.BigDecimal.remainder(BigDecimal.scala:281)
>   at scala.math.BigDecimal.isWhole(BigDecimal.scala:215)
>   at scala.math.BigDecimal.hashCode(BigDecimal.scala:180)
>   at org.apache.spark.sql.types.Decimal.hashCode(Decimal.scala:260)
>   at 
> org.apache.spark.sql.catalyst.InternalRow.hashCode(InternalRow.scala:121)
>   at org.apache.spark.RangePartitioner.hashCode(Partitioner.scala:201)
>   at java.lang.Object.toString(Object.java:237)
>   at 
> java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1418)
>   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
>   at 
> java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
>   at 
> java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
>   at 
> java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
>   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
>   at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347)
>   at 
> org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:44)
>   at 
> org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:84)
>   at 
> org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:301)
>   at 
> org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:294)
>   at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:122)
>   at org.apache.spark.SparkContext.clean(SparkContext.scala:2003)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1.apply(RDD.scala:683)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1.apply(RDD.scala:682)
>   at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
>   at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
>   at org.apache.spark.rdd.RDD.withScope(RDD.scala:286)
>   at org.apache.spark.rdd.RDD.mapPartitions(RDD.scala:682)
>   at 
> org.apache.spark.sql.execution.Exchange$$anonfun$doExecute$1.apply(Exchange.scala:181)
>   at 
> org.apache.spark.sql.execution.Exchange$$anonfun$doExecute$1.apply(Exchange.scala:148)
>   at 
> org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:48)
>   at org.apache.spark.sql.execution.Exchange.doExecute(Exchange.scala:148)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:113)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:113)
>   at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
>   at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:112)
>   at 
> org.apache.spark.sql.execution.Sort$$anonfun$doExecute$1.apply(sort.scala:48)
>   at 
> org.apache.spark.sql.execution.Sort$$anonfun$doExecute$1.apply(sort.scala:48)
>   at 
> org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:48)
>   at org.apache.spark.sql.execution.Sort.doExecute(sort.scala:47)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:113)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:113)
>   at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
>   at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:112)
>   at 
> org.a

[jira] [Commented] (SPARK-9529) Improve sort on Decimal

2015-08-01 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14650540#comment-14650540
 ] 

Apache Spark commented on SPARK-9529:
-

User 'davies' has created a pull request for this issue:
https://github.com/apache/spark/pull/7857

> Improve sort on Decimal
> ---
>
> Key: SPARK-9529
> URL: https://issues.apache.org/jira/browse/SPARK-9529
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Davies Liu
>Assignee: Davies Liu
>Priority: Critical
>
> Right now it's really slow; it just hangs in randomized tests:
> {code}
> pool-1-thread-1-ScalaTest-running-TungstenSortSuite" prio=5 
> tid=0x7f822bc82800 nid=0x5103 runnable [0x00011d1be000]
>java.lang.Thread.State: RUNNABLE
>   at java.math.BigInteger.(BigInteger.java:405)
>   at java.math.BigDecimal.bigTenToThe(BigDecimal.java:3380)
>   at java.math.BigDecimal.bigMultiplyPowerTen(BigDecimal.java:3508)
>   at java.math.BigDecimal.setScale(BigDecimal.java:2394)
>   at java.math.BigDecimal.divide(BigDecimal.java:1691)
>   at java.math.BigDecimal.divideToIntegralValue(BigDecimal.java:1734)
>   at java.math.BigDecimal.divideAndRemainder(BigDecimal.java:1891)
>   at java.math.BigDecimal.remainder(BigDecimal.java:1833)
>   at scala.math.BigDecimal.remainder(BigDecimal.scala:281)
>   at scala.math.BigDecimal.isWhole(BigDecimal.scala:215)
>   at scala.math.BigDecimal.hashCode(BigDecimal.scala:180)
>   at org.apache.spark.sql.types.Decimal.hashCode(Decimal.scala:260)
>   at 
> org.apache.spark.sql.catalyst.InternalRow.hashCode(InternalRow.scala:121)
>   at org.apache.spark.RangePartitioner.hashCode(Partitioner.scala:201)
>   at java.lang.Object.toString(Object.java:237)
>   at 
> java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1418)
>   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
>   at 
> java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
>   at 
> java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
>   at 
> java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
>   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
>   at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347)
>   at 
> org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:44)
>   at 
> org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:84)
>   at 
> org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:301)
>   at 
> org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:294)
>   at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:122)
>   at org.apache.spark.SparkContext.clean(SparkContext.scala:2003)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1.apply(RDD.scala:683)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1.apply(RDD.scala:682)
>   at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
>   at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
>   at org.apache.spark.rdd.RDD.withScope(RDD.scala:286)
>   at org.apache.spark.rdd.RDD.mapPartitions(RDD.scala:682)
>   at 
> org.apache.spark.sql.execution.Exchange$$anonfun$doExecute$1.apply(Exchange.scala:181)
>   at 
> org.apache.spark.sql.execution.Exchange$$anonfun$doExecute$1.apply(Exchange.scala:148)
>   at 
> org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:48)
>   at org.apache.spark.sql.execution.Exchange.doExecute(Exchange.scala:148)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:113)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:113)
>   at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
>   at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:112)
>   at 
> org.apache.spark.sql.execution.Sort$$anonfun$doExecute$1.apply(sort.scala:48)
>   at 
> org.apache.spark.sql.execution.Sort$$anonfun$doExecute$1.apply(sort.scala:48)
>   at 
> org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:48)
>   at org.apache.spark.sql.execution.Sort.doExecute(sort.scala:47)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:113)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:113)
>   at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:1
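
The trace above stalls inside Decimal.hashCode, which delegates to scala.math.BigDecimal.hashCode and ends up in isWhole/remainder, i.e. BigDecimal division for every row that gets hashed or range-partitioned. The following is only a rough sketch of the general idea of a cheaper path (the SimpleDecimal class, its fields, and its method names are hypothetical, not the actual Spark patch): hash and compare from a compact (unscaled Long, scale) pair and avoid BigDecimal entirely for values that fit.

{code}
// Minimal sketch, not the actual Spark patch: SimpleDecimal and its fields
// are hypothetical. A decimal that fits in a Long can be hashed and compared
// from its (unscaled, scale) pair directly, with no BigDecimal math at all.
final class SimpleDecimal(val unscaled: Long, val scale: Int) {

  // Cheap hash: mix the unscaled value with the scale.
  override def hashCode(): Int = {
    val h = (unscaled ^ (unscaled >>> 32)).toInt
    31 * h + scale
  }

  // Consistent with hashCode, assuming a canonical (unscaled, scale) form.
  override def equals(other: Any): Boolean = other match {
    case d: SimpleDecimal => d.unscaled == unscaled && d.scale == scale
    case _ => false
  }

  // Only fall back to BigDecimal when a caller actually needs it.
  override def toString: String = BigDecimal(BigInt(unscaled), scale).toString
}

object SimpleDecimalDemo {
  def main(args: Array[String]): Unit = {
    val d = new SimpleDecimal(123456789L, 2) // represents 1234567.89
    println(d.hashCode())                    // fast path, no BigDecimal involved
    println(d)                               // 1234567.89
  }
}
{code}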

[jira] [Updated] (SPARK-9530) ScalaDoc should not indicate LDAModel.describeTopics and DistributedLDAModel.topDocumentsPerTopic as approximate.

2015-08-01 Thread Meihua Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Meihua Wu updated SPARK-9530:
-
Summary: ScalaDoc should not indicate LDAModel.describeTopics and 
DistributedLDAModel.topDocumentsPerTopic as approximate.  (was: ScalaDoc should 
not indicate LDAModel.descripeTopic and 
DistributedLDAModel.topDocumentsPerTopic as approximate.)

> ScalaDoc should not indicate LDAModel.describeTopics and 
> DistributedLDAModel.topDocumentsPerTopic as approximate.
> -
>
> Key: SPARK-9530
> URL: https://issues.apache.org/jira/browse/SPARK-9530
> Project: Spark
>  Issue Type: Documentation
>  Components: MLlib
>Affects Versions: 1.3.0, 1.3.1, 1.4.0, 1.4.1
>Reporter: Meihua Wu
>Priority: Minor
>
> Currently the ScalaDoc for LDAModel.descripeTopic and 
> DistributedLDAModel.topDocumentsPerTopic suggests that these methods are  
> approximate. However, both methods are actually precise and there is no need 
> to increase maxTermsPerTopic or maxDocumentsPerTopic to get a more precise 
> set of top terms. 
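
For context, a small usage sketch of the behavior the doc change describes, assuming the MLlib LDA API of the affected versions: describeTopics already returns the exact top terms for whatever maxTermsPerTopic is requested, so there is no reason to over-request terms "for precision".

{code}
// Usage sketch only, assuming the MLlib LDA API of the affected versions:
// describeTopics returns the exact top-N terms per topic, so N should be
// chosen for what the caller needs, not inflated to improve precision.
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.clustering.LDA
import org.apache.spark.mllib.linalg.Vectors

object DescribeTopicsExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("lda-describe-topics").setMaster("local[2]"))

    // Tiny toy corpus: (docId, termCountVector)
    val corpus = sc.parallelize(Seq(
      (0L, Vectors.dense(1.0, 2.0, 0.0, 3.0)),
      (1L, Vectors.dense(0.0, 1.0, 4.0, 0.0))
    ))

    val model = new LDA().setK(2).run(corpus)

    // Exact top 3 terms (by weight) for each topic; nothing approximate here.
    model.describeTopics(maxTermsPerTopic = 3).zipWithIndex.foreach {
      case ((termIndices, weights), topic) =>
        println(s"topic $topic: " + termIndices.zip(weights).mkString(", "))
    }

    sc.stop()
  }
}
{code}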



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-9529) Improve sort on Decimal

2015-08-01 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-9529:
---

Assignee: Apache Spark  (was: Davies Liu)

> Improve sort on Decimal
> ---
>
> Key: SPARK-9529
> URL: https://issues.apache.org/jira/browse/SPARK-9529
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Davies Liu
>Assignee: Apache Spark
>Priority: Critical
>
> Right now, it's really slow; it just hangs there in random tests:
> {code}
> pool-1-thread-1-ScalaTest-running-TungstenSortSuite" prio=5 
> tid=0x7f822bc82800 nid=0x5103 runnable [0x00011d1be000]
>java.lang.Thread.State: RUNNABLE
>   at java.math.BigInteger.<init>(BigInteger.java:405)
>   at java.math.BigDecimal.bigTenToThe(BigDecimal.java:3380)
>   at java.math.BigDecimal.bigMultiplyPowerTen(BigDecimal.java:3508)
>   at java.math.BigDecimal.setScale(BigDecimal.java:2394)
>   at java.math.BigDecimal.divide(BigDecimal.java:1691)
>   at java.math.BigDecimal.divideToIntegralValue(BigDecimal.java:1734)
>   at java.math.BigDecimal.divideAndRemainder(BigDecimal.java:1891)
>   at java.math.BigDecimal.remainder(BigDecimal.java:1833)
>   at scala.math.BigDecimal.remainder(BigDecimal.scala:281)
>   at scala.math.BigDecimal.isWhole(BigDecimal.scala:215)
>   at scala.math.BigDecimal.hashCode(BigDecimal.scala:180)
>   at org.apache.spark.sql.types.Decimal.hashCode(Decimal.scala:260)
>   at 
> org.apache.spark.sql.catalyst.InternalRow.hashCode(InternalRow.scala:121)
>   at org.apache.spark.RangePartitioner.hashCode(Partitioner.scala:201)
>   at java.lang.Object.toString(Object.java:237)
>   at 
> java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1418)
>   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
>   at 
> java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
>   at 
> java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
>   at 
> java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
>   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
>   at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347)
>   at 
> org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:44)
>   at 
> org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:84)
>   at 
> org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:301)
>   at 
> org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:294)
>   at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:122)
>   at org.apache.spark.SparkContext.clean(SparkContext.scala:2003)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1.apply(RDD.scala:683)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1.apply(RDD.scala:682)
>   at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
>   at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
>   at org.apache.spark.rdd.RDD.withScope(RDD.scala:286)
>   at org.apache.spark.rdd.RDD.mapPartitions(RDD.scala:682)
>   at 
> org.apache.spark.sql.execution.Exchange$$anonfun$doExecute$1.apply(Exchange.scala:181)
>   at 
> org.apache.spark.sql.execution.Exchange$$anonfun$doExecute$1.apply(Exchange.scala:148)
>   at 
> org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:48)
>   at org.apache.spark.sql.execution.Exchange.doExecute(Exchange.scala:148)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:113)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:113)
>   at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
>   at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:112)
>   at 
> org.apache.spark.sql.execution.Sort$$anonfun$doExecute$1.apply(sort.scala:48)
>   at 
> org.apache.spark.sql.execution.Sort$$anonfun$doExecute$1.apply(sort.scala:48)
>   at 
> org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:48)
>   at org.apache.spark.sql.execution.Sort.doExecute(sort.scala:47)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:113)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:113)
>   at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
>   at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:112)
>   at 
> org

[jira] [Updated] (SPARK-9530) ScalaDoc should not indicate LDAModel.describeTopics and DistributedLDAModel.topDocumentsPerTopic as approximate.

2015-08-01 Thread Meihua Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Meihua Wu updated SPARK-9530:
-
Description: 
Currently the ScalaDoc for LDAModel.describeTopics and 
DistributedLDAModel.topDocumentsPerTopic suggests that these methods are  
approximate. However, both methods are actually precise and there is no need to 
increase maxTermsPerTopic or maxDocumentsPerTopic to get a more precise set of 
top terms. 




  was:
Currently the ScalaDoc for LDAModel.descripeTopic and 
DistributedLDAModel.topDocumentsPerTopic suggests that these methods are  
approximate. However, both methods are actually precise and there is no need to 
increase maxTermsPerTopic or maxDocumentsPerTopic to get a more precise set of 
top terms. 





> ScalaDoc should not indicate LDAModel.describeTopics and 
> DistributedLDAModel.topDocumentsPerTopic as approximate.
> -
>
> Key: SPARK-9530
> URL: https://issues.apache.org/jira/browse/SPARK-9530
> Project: Spark
>  Issue Type: Documentation
>  Components: MLlib
>Affects Versions: 1.3.0, 1.3.1, 1.4.0, 1.4.1
>Reporter: Meihua Wu
>Priority: Minor
>
> Currently the ScalaDoc for LDAModel.describeTopics and 
> DistributedLDAModel.topDocumentsPerTopic suggests that these methods are  
> approximate. However, both methods are actually precise and there is no need 
> to increase maxTermsPerTopic or maxDocumentsPerTopic to get a more precise 
> set of top terms. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-9530) ScalaDoc should not indicate LDAModel.describeTopics and DistributedLDAModel.topDocumentsPerTopic as approximate.

2015-08-01 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-9530:
---

Assignee: Apache Spark

> ScalaDoc should not indicate LDAModel.describeTopics and 
> DistributedLDAModel.topDocumentsPerTopic as approximate.
> -
>
> Key: SPARK-9530
> URL: https://issues.apache.org/jira/browse/SPARK-9530
> Project: Spark
>  Issue Type: Documentation
>  Components: MLlib
>Affects Versions: 1.3.0, 1.3.1, 1.4.0, 1.4.1
>Reporter: Meihua Wu
>Assignee: Apache Spark
>Priority: Minor
>
> Currently the ScalaDoc for LDAModel.describeTopics and 
> DistributedLDAModel.topDocumentsPerTopic suggests that these methods are  
> approximate. However, both methods are actually precise and there is no need 
> to increase maxTermsPerTopic or maxDocumentsPerTopic to get a more precise 
> set of top terms. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-9530) ScalaDoc should not indicate LDAModel.describeTopics and DistributedLDAModel.topDocumentsPerTopic as approximate.

2015-08-01 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14650543#comment-14650543
 ] 

Apache Spark commented on SPARK-9530:
-

User 'rotationsymmetry' has created a pull request for this issue:
https://github.com/apache/spark/pull/7858

> ScalaDoc should not indicate LDAModel.describeTopics and 
> DistributedLDAModel.topDocumentsPerTopic as approximate.
> -
>
> Key: SPARK-9530
> URL: https://issues.apache.org/jira/browse/SPARK-9530
> Project: Spark
>  Issue Type: Documentation
>  Components: MLlib
>Affects Versions: 1.3.0, 1.3.1, 1.4.0, 1.4.1
>Reporter: Meihua Wu
>Priority: Minor
>
> Currently the ScalaDoc for LDAModel.describeTopics and 
> DistributedLDAModel.topDocumentsPerTopic suggests that these methods are  
> approximate. However, both methods are actually precise and there is no need 
> to increase maxTermsPerTopic or maxDocumentsPerTopic to get a more precise 
> set of top terms. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-9530) ScalaDoc should not indicate LDAModel.describeTopics and DistributedLDAModel.topDocumentsPerTopic as approximate.

2015-08-01 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-9530:
---

Assignee: (was: Apache Spark)

> ScalaDoc should not indicate LDAModel.describeTopics and 
> DistributedLDAModel.topDocumentsPerTopic as approximate.
> -
>
> Key: SPARK-9530
> URL: https://issues.apache.org/jira/browse/SPARK-9530
> Project: Spark
>  Issue Type: Documentation
>  Components: MLlib
>Affects Versions: 1.3.0, 1.3.1, 1.4.0, 1.4.1
>Reporter: Meihua Wu
>Priority: Minor
>
> Currently the ScalaDoc for LDAModel.describeTopics and 
> DistributedLDAModel.topDocumentsPerTopic suggests that these methods are  
> approximate. However, both methods are actually precise and there is no need 
> to increase maxTermsPerTopic or maxDocumentsPerTopic to get a more precise 
> set of top terms. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-9530) ScalaDoc should not indicate LDAModel.describeTopics and DistributedLDAModel.topDocumentsPerTopic as approximate.

2015-08-01 Thread Joseph K. Bradley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph K. Bradley updated SPARK-9530:
-
Assignee: Meihua Wu
Target Version/s: 1.3.2, 1.4.2, 1.5.0

> ScalaDoc should not indicate LDAModel.describeTopics and 
> DistributedLDAModel.topDocumentsPerTopic as approximate.
> -
>
> Key: SPARK-9530
> URL: https://issues.apache.org/jira/browse/SPARK-9530
> Project: Spark
>  Issue Type: Documentation
>  Components: MLlib
>Affects Versions: 1.3.0, 1.3.1, 1.4.0, 1.4.1
>Reporter: Meihua Wu
>Assignee: Meihua Wu
>Priority: Minor
>
> Currently the ScalaDoc for LDAModel.describeTopics and 
> DistributedLDAModel.topDocumentsPerTopic suggests that these methods are  
> approximate. However, both methods are actually precise and there is no need 
> to increase maxTermsPerTopic or maxDocumentsPerTopic to get a more precise 
> set of top terms. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-9492) LogisticRegression should provide model statistics

2015-08-01 Thread Joseph K. Bradley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph K. Bradley closed SPARK-9492.

Resolution: Duplicate

> LogisticRegression should provide model statistics
> --
>
> Key: SPARK-9492
> URL: https://issues.apache.org/jira/browse/SPARK-9492
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Reporter: Eric Liang
>
> Like ml LinearRegression, LogisticRegression should provide a training 
> summary including feature names and their coefficients.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-9492) LogisticRegression in R should provide model statistics

2015-08-01 Thread Joseph K. Bradley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph K. Bradley updated SPARK-9492:
-
Summary: LogisticRegression in R should provide model statistics  (was: 
LogisticRegression should provide model statistics)

> LogisticRegression in R should provide model statistics
> ---
>
> Key: SPARK-9492
> URL: https://issues.apache.org/jira/browse/SPARK-9492
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, R
>Reporter: Eric Liang
>
> Like ml LinearRegression, LogisticRegression should provide a training 
> summary including feature names and their coefficients.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-9492) LogisticRegression in R should provide model statistics

2015-08-01 Thread Joseph K. Bradley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph K. Bradley updated SPARK-9492:
-
Component/s: R

> LogisticRegression in R should provide model statistics
> ---
>
> Key: SPARK-9492
> URL: https://issues.apache.org/jira/browse/SPARK-9492
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, R
>Reporter: Eric Liang
>
> Like ml LinearRegression, LogisticRegression should provide a training 
> summary including feature names and their coefficients.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Reopened] (SPARK-9492) LogisticRegression should provide model statistics

2015-08-01 Thread Joseph K. Bradley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph K. Bradley reopened SPARK-9492:
--

Oops, I just realized this was for Spark R.  I'll add those tags.

> LogisticRegression should provide model statistics
> --
>
> Key: SPARK-9492
> URL: https://issues.apache.org/jira/browse/SPARK-9492
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, R
>Reporter: Eric Liang
>
> Like ml LinearRegression, LogisticRegression should provide a training 
> summary including feature names and their coefficients.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-5754) Spark AM not launching on Windows

2015-08-01 Thread Inigo Goiri (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14650551#comment-14650551
 ] 

Inigo Goiri commented on SPARK-5754:


I overwrote the existing escapeForShell to use single quotes instead of double 
quotes, and I removed the "-XX:OnOutOfMemoryError='kill %p'" part of the command. 
This is just an internal workaround on my side; ideally the code should check the 
OS and choose the quoting accordingly.

> Spark AM not launching on Windows
> -
>
> Key: SPARK-5754
> URL: https://issues.apache.org/jira/browse/SPARK-5754
> Project: Spark
>  Issue Type: Bug
>  Components: Windows, YARN
>Affects Versions: 1.1.1, 1.2.0
> Environment: Windows Server 2012, Hadoop 2.4.1.
>Reporter: Inigo
>
> I'm trying to run Spark Pi on a YARN cluster running on Windows and the AM 
> container fails to start. The problem seems to be in the generation of the 
> YARN command which adds single quotes (') surrounding some of the java 
> options. In particular, the part of the code that is adding those is the 
> escapeForShell function in YarnSparkHadoopUtil. Apparently, Windows does not 
> like the quotes for these options. Here is an example of the command that the 
> container tries to execute:
> @call %JAVA_HOME%/bin/java -server -Xmx512m -Djava.io.tmpdir=%PWD%/tmp 
> '-Dspark.yarn.secondary.jars=' 
> '-Dspark.app.name=org.apache.spark.examples.SparkPi' 
> '-Dspark.master=yarn-cluster' org.apache.spark.deploy.yarn.ApplicationMaster 
> --class 'org.apache.spark.examples.SparkPi' --jar  
> 'file:/D:/data/spark-1.1.1-bin-hadoop2.4/bin/../lib/spark-examples-1.1.1-hadoop2.4.0.jar'
>   --executor-memory 1024 --executor-cores 1 --num-executors 2
> Once I transform it into:
> @call %JAVA_HOME%/bin/java -server -Xmx512m -Djava.io.tmpdir=%PWD%/tmp 
> -Dspark.yarn.secondary.jars= 
> -Dspark.app.name=org.apache.spark.examples.SparkPi 
> -Dspark.master=yarn-cluster org.apache.spark.deploy.yarn.ApplicationMaster 
> --class 'org.apache.spark.examples.SparkPi' --jar  
> 'file:/D:/data/spark-1.1.1-bin-hadoop2.4/bin/../lib/spark-examples-1.1.1-hadoop2.4.0.jar'
>   --executor-memory 1024 --executor-cores 1 --num-executors 2
> Everything seems to start.
> How should I deal with this? Creating a separate function like escapeForShell 
> for Windows and call it whenever I detect this is for Windows? Or should I 
> add some sanity check on YARN?
> I checked a little and there seem to be people who are able to run Spark on 
> YARN on Windows, so it might be something else. I didn't find anything 
> related on Jira either.
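
As an illustration of the OS check suggested in the comment above, here is a minimal, hypothetical sketch of an escapeForShell that branches on the operating system (method and object names are illustrative; this is not the actual YarnSparkHadoopUtil change): cmd.exe does not strip single quotes the way a POSIX shell does, so Windows needs double-quote escaping.

{code}
// Minimal sketch only, not the actual YarnSparkHadoopUtil change: branch the
// argument escaping on the target OS. Names here are illustrative.
object ShellEscape {
  private val isWindows =
    sys.props.getOrElse("os.name", "").toLowerCase.contains("windows")

  def escapeForShell(arg: String): String = {
    if (arg == null) {
      arg
    } else if (isWindows) {
      // cmd.exe: wrap in double quotes, doubling any embedded double quotes.
      "\"" + arg.replace("\"", "\"\"") + "\""
    } else {
      // POSIX sh: wrap in single quotes, escaping any embedded single quotes.
      "'" + arg.replace("'", "'\\''") + "'"
    }
  }

  def main(args: Array[String]): Unit = {
    println(escapeForShell("-Dspark.app.name=org.apache.spark.examples.SparkPi"))
  }
}
{code}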



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-8333) Spark failed to delete temp directory created by HiveContext

2015-08-01 Thread Sudhakar Thota (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14650555#comment-14650555
 ] 

Sudhakar Thota commented on SPARK-8333:
---

Thanks for the clarification. I used the same statement you suggested to create 
the HiveContext and was able to stop the SparkContext without issues. After 
stopping it, I validated this by trying to use both the sqlContext and the 
SparkContext. Please let me know whether this also happens when you run it from 
a script rather than from the REPL.
Please take a look.

-
1. Creating a HiveContext, creating a table, calling “sc.stop()”, then 
verifying by trying to create another table.

Sudhakars-MacBook-Pro-2:spark-1.4.0 sudhakarthota$ bin/spark-shell
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.4.0
      /_/

Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_45)
Type in expressions to have them evaluated.
Type :help for more information.
Spark context available as sc.
SQL context available as sqlContext.

scala> val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
sqlContext: org.apache.spark.sql.hive.HiveContext = 
org.apache.spark.sql.hive.HiveContext@5ac35b17

scala> sqlContext.sql("CREATE TABLE IF NOT EXISTS test1 (name STRING, rank INT) 
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n'")
res0: org.apache.spark.sql.DataFrame = [result: string]

scala> sc.stop()

scala> sqlContext.sql("CREATE TABLE IF NOT EXISTS test2 (name STRING, rank INT) 
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n'")
java.lang.IllegalStateException: Cannot call methods on a stopped SparkContext
at 
org.apache.spark.SparkContext.org$apache$spark$SparkContext$$assertNotStopped(SparkContext.scala:103)
at 
org.apache.spark.SparkContext$$anonfun$parallelize$1.apply(SparkContext.scala:696)
at 
org.apache.spark.SparkContext$$anonfun$parallelize$1.apply(SparkContext.scala:695)
at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:148)
at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:109)
at org.apache.spark.SparkContext.withScope(SparkContext.scala:681)
at org.apache.spark.SparkContext.parallelize(SparkContext.scala:695)
at 
org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:70)
at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:88)
at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:88)
at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:148)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:87)
at 
org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:939)
at 
org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:939)
at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:144)
at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:128)
at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:51)
at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:744)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:24)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:29)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:31)
at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:33)
at $iwC$$iwC$$iwC$$iwC.<init>(<console>:35)
at $iwC$$iwC$$iwC.<init>(<console>:37)
at $iwC$$iwC.<init>(<console>:39)
at $iwC.<init>(<console>:41)
at <init>(<console>:43)
at .<init>(<console>:47)
at .<clinit>(<console>)
at .<init>(<console>:7)
at .<clinit>(<console>)
at $print(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at 
org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
at 
org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1338)
at 
org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
at 
org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
at 
org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
at 
org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoo
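
For the script-based check asked about above, a minimal standalone sketch (class and table names are illustrative) could look like the following; build it into a jar, run it with spark-submit, and after it exits check whether the HiveContext scratch/temp directory reported in the logs was removed.

{code}
// Minimal sketch of the script-based check asked for above; class and table
// names are illustrative. Submit with spark-submit, then inspect whether the
// HiveContext temp directory is cleaned up after the application exits.
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object StopHiveContextCheck {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("stop-hive-context-check"))
    val sqlContext = new HiveContext(sc)

    sqlContext.sql(
      "CREATE TABLE IF NOT EXISTS test1 (name STRING, rank INT) " +
      "ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\t' LINES TERMINATED BY '\\n'")

    sc.stop()
    // Any further use of sqlContext here would fail with
    // "Cannot call methods on a stopped SparkContext", as in the REPL session.
  }
}
{code}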

[jira] [Assigned] (SPARK-9528) RandomForestClassifier should extend ProbabilisticClassifier

2015-08-01 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-9528:
---

Assignee: Apache Spark  (was: Joseph K. Bradley)

> RandomForestClassifier should extend ProbabilisticClassifier
> 
>
> Key: SPARK-9528
> URL: https://issues.apache.org/jira/browse/SPARK-9528
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib
>Reporter: Joseph K. Bradley
>Assignee: Apache Spark
>
> Now that DecisionTreeClassifier extends ProbabilisticClassifier, we can have 
> RandomForestClassifier extend ProbabilisticClassifier as well.
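
Assuming the proposed change is in place, a usage-level sketch of what it enables with the spark.ml Pipeline API of this era (this is not the implementation change itself): the transformed output of the random forest model would carry a per-class probability column in addition to the prediction.

{code}
// Usage-level sketch only, assuming the proposed change is in place: with
// RandomForestClassifier extending ProbabilisticClassifier, its output would
// include a "probability" column alongside the prediction.
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.RandomForestClassifier
import org.apache.spark.ml.feature.StringIndexer
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.sql.SQLContext

object RandomForestProbabilityExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("rf-probability").setMaster("local[2]"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    val training = sc.parallelize(Seq(
      ("yes", Vectors.dense(0.0, 1.0)),
      ("no",  Vectors.dense(1.0, 0.0)),
      ("yes", Vectors.dense(0.1, 0.9)),
      ("no",  Vectors.dense(0.9, 0.1))
    )).toDF("labelStr", "features")

    // StringIndexer attaches the label metadata the tree classifiers require.
    val indexer = new StringIndexer().setInputCol("labelStr").setOutputCol("label")
    val rf = new RandomForestClassifier()
      .setLabelCol("label")
      .setFeaturesCol("features")
      .setNumTrees(5)

    val model = new Pipeline().setStages(Array(indexer, rf)).fit(training)

    // With the proposed change, "probability" holds per-class probabilities.
    model.transform(training).select("features", "probability", "prediction").show()

    sc.stop()
  }
}
{code}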



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-9528) RandomForestClassifier should extend ProbabilisticClassifier

2015-08-01 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14650558#comment-14650558
 ] 

Apache Spark commented on SPARK-9528:
-

User 'jkbradley' has created a pull request for this issue:
https://github.com/apache/spark/pull/7859

> RandomForestClassifier should extend ProbabilisticClassifier
> 
>
> Key: SPARK-9528
> URL: https://issues.apache.org/jira/browse/SPARK-9528
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib
>Reporter: Joseph K. Bradley
>Assignee: Joseph K. Bradley
>
> Now that DecisionTreeClassifier extends ProbabilisticClassifier, we can have 
> RandomForestClassifier extend ProbabilisticClassifier as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-9528) RandomForestClassifier should extend ProbabilisticClassifier

2015-08-01 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-9528:
---

Assignee: Joseph K. Bradley  (was: Apache Spark)

> RandomForestClassifier should extend ProbabilisticClassifier
> 
>
> Key: SPARK-9528
> URL: https://issues.apache.org/jira/browse/SPARK-9528
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib
>Reporter: Joseph K. Bradley
>Assignee: Joseph K. Bradley
>
> Now that DecisionTreeClassifier extends ProbabilisticClassifier, we can have 
> RandomForestClassifier extend ProbabilisticClassifier as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-9530) ScalaDoc should not indicate LDAModel.describeTopics and DistributedLDAModel.topDocumentsPerTopic as approximate.

2015-08-01 Thread Joseph K. Bradley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph K. Bradley updated SPARK-9530:
-
Target Version/s: 1.5.0  (was: 1.3.2, 1.4.2, 1.5.0)

> ScalaDoc should not indicate LDAModel.describeTopics and 
> DistributedLDAModel.topDocumentsPerTopic as approximate.
> -
>
> Key: SPARK-9530
> URL: https://issues.apache.org/jira/browse/SPARK-9530
> Project: Spark
>  Issue Type: Documentation
>  Components: MLlib
>Affects Versions: 1.3.0, 1.3.1, 1.4.0, 1.4.1
>Reporter: Meihua Wu
>Assignee: Meihua Wu
>Priority: Minor
>
> Currently the ScalaDoc for LDAModel.describeTopics and 
> DistributedLDAModel.topDocumentsPerTopic suggests that these methods are  
> approximate. However, both methods are actually precise and there is no need 
> to increase maxTermsPerTopic or maxDocumentsPerTopic to get a more precise 
> set of top terms. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-9530) ScalaDoc should not indicate LDAModel.describeTopics and DistributedLDAModel.topDocumentsPerTopic as approximate.

2015-08-01 Thread Joseph K. Bradley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph K. Bradley resolved SPARK-9530.
--
   Resolution: Fixed
Fix Version/s: 1.5.0

Issue resolved by pull request 7858
[https://github.com/apache/spark/pull/7858]

> ScalaDoc should not indicate LDAModel.describeTopics and 
> DistributedLDAModel.topDocumentsPerTopic as approximate.
> -
>
> Key: SPARK-9530
> URL: https://issues.apache.org/jira/browse/SPARK-9530
> Project: Spark
>  Issue Type: Documentation
>  Components: MLlib
>Affects Versions: 1.3.0, 1.3.1, 1.4.0, 1.4.1
>Reporter: Meihua Wu
>Assignee: Meihua Wu
>Priority: Minor
> Fix For: 1.5.0
>
>
> Currently the ScalaDoc for LDAModel.describeTopics and 
> DistributedLDAModel.topDocumentsPerTopic suggests that these methods are  
> approximate. However, both methods are actually precise and there is no need 
> to increase maxTermsPerTopic or maxDocumentsPerTopic to get a more precise 
> set of top terms. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-9447) Update python API to include RandomForest as classifier changes.

2015-08-01 Thread Joseph K. Bradley (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14650561#comment-14650561
 ] 

Joseph K. Bradley commented on SPARK-9447:
--

I'll do this once [SPARK-9528] gets fixed.

> Update python API to include RandomForest as classifier changes.
> 
>
> Key: SPARK-9447
> URL: https://issues.apache.org/jira/browse/SPARK-9447
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib, PySpark
>Reporter: holdenk
>
> The API should still work after 
> SPARK-9016-make-random-forest-classifiers-implement-classification-trait gets 
> merged in, but we might want to extend it and provide predictRaw and similar 
> methods in the Python API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-9531) UnsafeFixedWidthAggregationMap should be able to turn itself into an UnsafeKVExternalSorter

2015-08-01 Thread Reynold Xin (JIRA)
Reynold Xin created SPARK-9531:
--

 Summary: UnsafeFixedWidthAggregationMap should be able to turn 
itself into an UnsafeKVExternalSorter
 Key: SPARK-9531
 URL: https://issues.apache.org/jira/browse/SPARK-9531
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Reporter: Reynold Xin
Assignee: Reynold Xin
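
The issue body is empty, so the following is only an interface-level sketch of the presumed intent (all names are hypothetical, not Spark's actual API): when hash-based aggregation can no longer grow its map, the map hands its records over to an external, spill-capable sorter instead of failing.

{code}
// Interface-level sketch only; the issue body is empty, so everything here is
// an assumption about intent, and the names are hypothetical.
trait KVSorter[K, V] {
  def insert(key: K, value: V): Unit
  def sortedIterator(): Iterator[(K, V)]
}

class InMemoryKVSorter[K, V](implicit ord: Ordering[K]) extends KVSorter[K, V] {
  private val buf = scala.collection.mutable.ArrayBuffer.empty[(K, V)]
  def insert(key: K, value: V): Unit = buf += ((key, value))
  def sortedIterator(): Iterator[(K, V)] = buf.sortBy(_._1).iterator
}

class FixedWidthAggregationMap[K, V](implicit ord: Ordering[K]) {
  private val map = scala.collection.mutable.HashMap.empty[K, V]
  def update(key: K, value: V): Unit = map(key) = value

  // The capability this ticket asks for, sketched: drain the map's entries
  // into a sorter (which in Spark could spill to disk) and release the map.
  def destructAndCreateSorter(): KVSorter[K, V] = {
    val sorter = new InMemoryKVSorter[K, V]
    map.foreach { case (k, v) => sorter.insert(k, v) }
    map.clear()
    sorter
  }
}
{code}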






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-9531) UnsafeFixedWidthAggregationMap should be able to turn itself into an UnsafeKVExternalSorter

2015-08-01 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14650562#comment-14650562
 ] 

Apache Spark commented on SPARK-9531:
-

User 'rxin' has created a pull request for this issue:
https://github.com/apache/spark/pull/7860

> UnsafeFixedWidthAggregationMap should be able to turn itself into an 
> UnsafeKVExternalSorter
> ---
>
> Key: SPARK-9531
> URL: https://issues.apache.org/jira/browse/SPARK-9531
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: Reynold Xin
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-9531) UnsafeFixedWidthAggregationMap should be able to turn itself into an UnsafeKVExternalSorter

2015-08-01 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-9531:
---

Assignee: Apache Spark  (was: Reynold Xin)

> UnsafeFixedWidthAggregationMap should be able to turn itself into an 
> UnsafeKVExternalSorter
> ---
>
> Key: SPARK-9531
> URL: https://issues.apache.org/jira/browse/SPARK-9531
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: Apache Spark
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


