[jira] [Assigned] (SPARK-8505) Add settings to kick `lint-r` from `./dev/run-test.py`

2015-08-02 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-8505:
---

Assignee: Apache Spark

 Add settings to kick `lint-r` from `./dev/run-test.py`
 --

 Key: SPARK-8505
 URL: https://issues.apache.org/jira/browse/SPARK-8505
 Project: Spark
  Issue Type: Sub-task
  Components: SparkR
Reporter: Yu Ishikawa
Assignee: Apache Spark

 Add settings to kick off the `lint-r` script from `./dev/run-test.py`.






[jira] [Commented] (SPARK-8505) Add settings to kick `lint-r` from `./dev/run-test.py`

2015-08-02 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14651360#comment-14651360
 ] 

Apache Spark commented on SPARK-8505:
-

User 'yu-iskw' has created a pull request for this issue:
https://github.com/apache/spark/pull/7883

 Add settings to kick `lint-r` from `./dev/run-test.py`
 --

 Key: SPARK-8505
 URL: https://issues.apache.org/jira/browse/SPARK-8505
 Project: Spark
  Issue Type: Sub-task
  Components: SparkR
Reporter: Yu Ishikawa

 Add settings to kick off the `lint-r` script from `./dev/run-test.py`.






[jira] [Resolved] (SPARK-2205) Unnecessary exchange operators in a join on multiple tables with the same join key.

2015-08-02 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai resolved SPARK-2205.
-
   Resolution: Fixed
Fix Version/s: 1.5.0

Issue resolved by pull request 7773
[https://github.com/apache/spark/pull/7773]

 Unnecessary exchange operators in a join on multiple tables with the same 
 join key.
 ---

 Key: SPARK-2205
 URL: https://issues.apache.org/jira/browse/SPARK-2205
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: Yin Huai
Assignee: Yin Huai
Priority: Critical
 Fix For: 1.5.0


 {code}
 hql("select * from src x join src y on (x.key=y.key) join src z on 
 (y.key=z.key)")
 SchemaRDD[1] at RDD at SchemaRDD.scala:100
 == Query Plan ==
 Project [key#4:0,value#5:1,key#6:2,value#7:3,key#8:4,value#9:5]
  HashJoin [key#6], [key#8], BuildRight
   Exchange (HashPartitioning [key#6], 200)
HashJoin [key#4], [key#6], BuildRight
 Exchange (HashPartitioning [key#4], 200)
  HiveTableScan [key#4,value#5], (MetastoreRelation default, src, 
 Some(x)), None
 Exchange (HashPartitioning [key#6], 200)
  HiveTableScan [key#6,value#7], (MetastoreRelation default, src, 
 Some(y)), None
   Exchange (HashPartitioning [key#8], 200)
HiveTableScan [key#8,value#9], (MetastoreRelation default, src, Some(z)), 
 None
 {code}
 However, this is fine...
 {code}
 hql("select * from src x join src y on (x.key=y.key) join src z on 
 (x.key=z.key)")
 res5: org.apache.spark.sql.SchemaRDD = 
 SchemaRDD[5] at RDD at SchemaRDD.scala:100
 == Query Plan ==
 Project [key#26:0,value#27:1,key#28:2,value#29:3,key#30:4,value#31:5]
  HashJoin [key#26], [key#30], BuildRight
   HashJoin [key#26], [key#28], BuildRight
Exchange (HashPartitioning [key#26], 200)
 HiveTableScan [key#26,value#27], (MetastoreRelation default, src, 
 Some(x)), None
Exchange (HashPartitioning [key#28], 200)
 HiveTableScan [key#28,value#29], (MetastoreRelation default, src, 
 Some(y)), None
   Exchange (HashPartitioning [key#30], 200)
HiveTableScan [key#30,value#31], (MetastoreRelation default, src, 
 Some(z)), None
 {code}
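 The second plan avoids the extra Exchange because both join conditions key on
 {{x.key}}, so a single HashPartitioning satisfies both joins. A minimal
 DataFrame-API sketch of the same rewrite (illustrative only; assumes an active
 {{sqlContext}} and the {{src}} table from the example, Spark 1.3+ style):
 {code}
 // Both joins keyed on x("key"): each table is shuffled once and the
 // partitioning is reused, instead of re-shuffling the intermediate result.
 val x = sqlContext.table("src").as("x")
 val y = sqlContext.table("src").as("y")
 val z = sqlContext.table("src").as("z")
 val joined = x.join(y, x("key") === y("key"))
               .join(z, x("key") === z("key"))
 {code}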






[jira] [Commented] (SPARK-7685) Handle high imbalanced data and apply weights to different samples in Logistic Regression

2015-08-02 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14651413#comment-14651413
 ] 

Apache Spark commented on SPARK-7685:
-

User 'dbtsai' has created a pull request for this issue:
https://github.com/apache/spark/pull/7884

 Handle high imbalanced data and apply weights to different samples in 
 Logistic Regression
 -

 Key: SPARK-7685
 URL: https://issues.apache.org/jira/browse/SPARK-7685
 Project: Spark
  Issue Type: New Feature
  Components: ML
Reporter: DB Tsai
Assignee: DB Tsai
Priority: Critical

 In a fraud detection dataset, almost all the samples are negative while only a 
 couple of them are positive. This type of highly imbalanced data will bias the 
 models toward the negative class, resulting in poor performance. scikit-learn 
 provides a correction that allows users to over-/undersample the samples of 
 each class according to given weights; in auto mode, it selects weights 
 inversely proportional to the class frequencies in the training set. This can 
 be done in a more efficient way by multiplying the weights into the loss and 
 gradient instead of doing actual over-/undersampling in the training dataset, 
 which is very expensive.
 http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html
 On the other hand, some of the training data may be more important, like the 
 training samples from tenured users, while the training samples from new users 
 may be less important. We should be able to provide an additional weight: 
 Double field in the LabeledPoint to weight them differently in the learning 
 algorithm.
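 A minimal, self-contained sketch of the weighting idea (not Spark's
 implementation; names are illustrative): each sample's contribution to the
 logistic loss and its gradient is scaled by the sample's weight, which matches
 over-/undersampling in expectation at no extra cost:
 {code}
 object WeightedLogisticLoss {
   // Weighted log-loss and gradient of one sample for a linear model.
   // label is 0.0 or 1.0; weight > 1 up-weights rare (e.g. positive) samples.
   def lossAndGradient(
       coefficients: Array[Double],
       features: Array[Double],
       label: Double,
       weight: Double): (Double, Array[Double]) = {
     val margin = (coefficients, features).zipped.map(_ * _).sum
     val p = 1.0 / (1.0 + math.exp(-margin)) // predicted P(label = 1)
     val loss = -weight * (label * math.log(p) + (1.0 - label) * math.log(1.0 - p))
     val gradient = features.map(x => weight * (p - label) * x)
     (loss, gradient)
   }
 }
 {code}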






[jira] [Assigned] (SPARK-7685) Handle high imbalanced data and apply weights to different samples in Logistic Regression

2015-08-02 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-7685:
---

Assignee: Apache Spark  (was: DB Tsai)

 Handle high imbalanced data and apply weights to different samples in 
 Logistic Regression
 -

 Key: SPARK-7685
 URL: https://issues.apache.org/jira/browse/SPARK-7685
 Project: Spark
  Issue Type: New Feature
  Components: ML
Reporter: DB Tsai
Assignee: Apache Spark
Priority: Critical

 In a fraud detection dataset, almost all the samples are negative while only a 
 couple of them are positive. This type of highly imbalanced data will bias the 
 models toward the negative class, resulting in poor performance. scikit-learn 
 provides a correction that allows users to over-/undersample the samples of 
 each class according to given weights; in auto mode, it selects weights 
 inversely proportional to the class frequencies in the training set. This can 
 be done in a more efficient way by multiplying the weights into the loss and 
 gradient instead of doing actual over-/undersampling in the training dataset, 
 which is very expensive.
 http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html
 On the other hand, some of the training data may be more important, like the 
 training samples from tenured users, while the training samples from new users 
 may be less important. We should be able to provide an additional weight: 
 Double field in the LabeledPoint to weight them differently in the learning 
 algorithm.






[jira] [Assigned] (SPARK-7685) Handle high imbalanced data and apply weights to different samples in Logistic Regression

2015-08-02 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-7685:
---

Assignee: DB Tsai  (was: Apache Spark)

 Handle high imbalanced data and apply weights to different samples in 
 Logistic Regression
 -

 Key: SPARK-7685
 URL: https://issues.apache.org/jira/browse/SPARK-7685
 Project: Spark
  Issue Type: New Feature
  Components: ML
Reporter: DB Tsai
Assignee: DB Tsai
Priority: Critical

 In a fraud detection dataset, almost all the samples are negative while only a 
 couple of them are positive. This type of highly imbalanced data will bias the 
 models toward the negative class, resulting in poor performance. scikit-learn 
 provides a correction that allows users to over-/undersample the samples of 
 each class according to given weights; in auto mode, it selects weights 
 inversely proportional to the class frequencies in the training set. This can 
 be done in a more efficient way by multiplying the weights into the loss and 
 gradient instead of doing actual over-/undersampling in the training dataset, 
 which is very expensive.
 http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html
 On the other hand, some of the training data may be more important, like the 
 training samples from tenured users, while the training samples from new users 
 may be less important. We should be able to provide an additional weight: 
 Double field in the LabeledPoint to weight them differently in the learning 
 algorithm.






[jira] [Assigned] (SPARK-8505) Add settings to kick `lint-r` from `./dev/run-test.py`

2015-08-02 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-8505:
---

Assignee: (was: Apache Spark)

 Add settings to kick `lint-r` from `./dev/run-test.py`
 --

 Key: SPARK-8505
 URL: https://issues.apache.org/jira/browse/SPARK-8505
 Project: Spark
  Issue Type: Sub-task
  Components: SparkR
Reporter: Yu Ishikawa

 Add settings to kick off the `lint-r` script from `./dev/run-test.py`.






[jira] [Commented] (SPARK-9319) Add support for setting column names, types

2015-08-02 Thread Hossein Falaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14651367#comment-14651367
 ] 

Hossein Falaki commented on SPARK-9319:
---

Yes. I will submit a PR.

 Add support for setting column names, types
 ---

 Key: SPARK-9319
 URL: https://issues.apache.org/jira/browse/SPARK-9319
 Project: Spark
  Issue Type: Sub-task
  Components: SparkR
Reporter: Shivaram Venkataraman

 This will help us support functions of the form 
 {code}
 colnames(data) <- c("Date", "Arrival_Delay")
 coltypes(data) <- c("numeric", "logical", "character")
 {code}






[jira] [Created] (SPARK-9550) Configuration renaming, defaults changes, and deprecation for 1.5.0 (master ticket)

2015-08-02 Thread Josh Rosen (JIRA)
Josh Rosen created SPARK-9550:
-

 Summary: Configuration renaming, defaults changes, and deprecation 
for 1.5.0 (master ticket)
 Key: SPARK-9550
 URL: https://issues.apache.org/jira/browse/SPARK-9550
 Project: Spark
  Issue Type: Task
  Components: Spark Core, SQL
Affects Versions: 1.5.0
Reporter: Josh Rosen
Priority: Blocker


This ticket tracks configurations which need to be renamed, deprecated, or have 
their defaults changed for Spark 1.5.0.

Note that subtasks / comments here do not necessarily need to reflect changes 
that must be performed.  Rather, tasks should be added here to make sure that 
the relevant configurations are at least checked before we cut releases.  This 
ticket will also help us to track configuration changes which must make it into 
the release notes.

*Configuration renaming*

- Consider renaming {{spark.shuffle.memoryFraction}} to 
{{spark.execution.memoryFraction}} 
([discussion|https://github.com/apache/spark/pull/7770#discussion-diff-36019144]).
- Rename all public-facing uses of {{unsafe}} to something less scary, such as 
{{tungsten}}

*Defaults changes*
- Codegen is now enabled by default.
- Tungsten is now enabled by default.

*Deprecation*
- Local execution has been removed.
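As an illustration of how the memoryFraction rename above is typically absorbed
(a sketch only; {{spark.execution.memoryFraction}} is merely the name proposed
in the linked discussion), reads can prefer the new key and fall back to the
deprecated one:
{code}
// Hypothetical resolution order for the proposed rename; 0.2 is the
// documented default of spark.shuffle.memoryFraction.
def memoryFraction(conf: org.apache.spark.SparkConf): Double =
  conf.getOption("spark.execution.memoryFraction")
    .orElse(conf.getOption("spark.shuffle.memoryFraction"))
    .map(_.toDouble)
    .getOrElse(0.2)
{code}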






[jira] [Commented] (SPARK-8939) YARN EC2 default setting fails with IllegalArgumentException

2015-08-02 Thread Shivaram Venkataraman (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14651436#comment-14651436
 ] 

Shivaram Venkataraman commented on SPARK-8939:
--

[~andrewor14] I ran into this again today -- do you know where we should make a 
fix for this? Is it in the Spark source code, or can we just change a config 
option in the EC2 scripts?

 YARN EC2 default setting fails with IllegalArgumentException
 

 Key: SPARK-8939
 URL: https://issues.apache.org/jira/browse/SPARK-8939
 Project: Spark
  Issue Type: Bug
  Components: EC2
Affects Versions: 1.5.0
Reporter: Andrew Or

 I just set it up from scratch using the spark-ec2 script. Then I ran
 {code}
 bin/spark-shell --master yarn
 {code}
 which failed with
 {code}
 15/07/09 03:44:29 ERROR SparkContext: Error initializing SparkContext.
 java.lang.IllegalArgumentException: Unknown/unsupported param 
 List(--num-executors, , --executor-memory, 6154m, --executor-memory, 6154m, 
 --executor-cores, 2, --name, Spark shell)
 {code}
 This goes away if I provide `--num-executors`, but we should fix the default.
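 Until the default is fixed, passing the flag explicitly works around the error
 (the executor count below is arbitrary):
 {code}
 bin/spark-shell --master yarn --num-executors 2
 {code}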






[jira] [Commented] (SPARK-9208) Audit DataFrame expression API for 1.5 release

2015-08-02 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14650626#comment-14650626
 ] 

Apache Spark commented on SPARK-9208:
-

User 'rxin' has created a pull request for this issue:
https://github.com/apache/spark/pull/7861

 Audit DataFrame expression API for 1.5 release
 --

 Key: SPARK-9208
 URL: https://issues.apache.org/jira/browse/SPARK-9208
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Reporter: Reynold Xin
Assignee: Reynold Xin
Priority: Blocker

 This ticket makes sure I go through all new APIs added and audit them before 
 1.5.0 release.






[jira] [Resolved] (SPARK-9498) Some statistical information missed when the driver is out of the cluster

2015-08-02 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-9498.
--
Resolution: Not A Problem

 Some statistical information missed when the driver is out of the cluster
 -

 Key: SPARK-9498
 URL: https://issues.apache.org/jira/browse/SPARK-9498
 Project: Spark
  Issue Type: Improvement
  Components: Web UI
Affects Versions: 1.3.1, 1.4.0
Reporter: Liang Lee

 When an application is submitted and the driver is outside of the Spark 
 cluster, some statistical information is sometimes missing.
 When the driver is inside the Spark cluster, the stage detail page displays 
 info like the following:
 Details for Stage 7
 Total task time across all tasks: 37 min
  Input Size / Records: 55.8 GB / 60488
  Shuffle write: 26.6 GB / 585242962 
 But when the driver is outside of the Spark cluster, it sometimes displays the 
 info above and sometimes not, like this:
 Details for Stage 7
 Total task time across all tasks: 37 min
 That is, the Input Size and Shuffle data are not displayed.
 I have checked the code and found that when the input size is zero, it is not 
 displayed.
 The input size is sent by each executor and collected by the driver.
 The problem is that the data that should be reported by the executors is 
 missing, but I don't know why. Could anyone help to solve this problem?






[jira] [Assigned] (SPARK-9535) Modify document for codegen

2015-08-02 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-9535:
---

Assignee: Apache Spark

 Modify document for codegen
 ---

 Key: SPARK-9535
 URL: https://issues.apache.org/jira/browse/SPARK-9535
 Project: Spark
  Issue Type: Improvement
  Components: Documentation, SQL
Affects Versions: 1.5.0
Reporter: Kousuke Saruta
Assignee: Apache Spark
Priority: Minor

 SPARK-7184 made codegen enabled by default so let's modify the corresponding 
 documents.
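 For context, codegen previously had to be enabled per session, e.g. (pre-1.5
 Spark SQL flag, assuming an active {{sqlContext}}; after SPARK-7184 it is on
 by default):
 {code}
 sqlContext.setConf("spark.sql.codegen", "true")
 {code}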






[jira] [Commented] (SPARK-9535) Modify document for codegen

2015-08-02 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14650638#comment-14650638
 ] 

Apache Spark commented on SPARK-9535:
-

User 'sarutak' has created a pull request for this issue:
https://github.com/apache/spark/pull/7863

 Modify document for codegen
 ---

 Key: SPARK-9535
 URL: https://issues.apache.org/jira/browse/SPARK-9535
 Project: Spark
  Issue Type: Improvement
  Components: Documentation, SQL
Affects Versions: 1.5.0
Reporter: Kousuke Saruta
Priority: Minor

 SPARK-7184 made codegen enabled by default so let's modify the corresponding 
 documents.






[jira] [Assigned] (SPARK-9535) Modify document for codegen

2015-08-02 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-9535:
---

Assignee: (was: Apache Spark)

 Modify document for codegen
 ---

 Key: SPARK-9535
 URL: https://issues.apache.org/jira/browse/SPARK-9535
 Project: Spark
  Issue Type: Improvement
  Components: Documentation, SQL
Affects Versions: 1.5.0
Reporter: Kousuke Saruta
Priority: Minor

 SPARK-7184 made codegen enabled by default so let's modify the corresponding 
 documents.






[jira] [Updated] (SPARK-9537) DecisionTreeClassifierModel support probability prediction for PySpark.ml

2015-08-02 Thread Yanbo Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yanbo Liang updated SPARK-9537:
---
Priority: Minor  (was: Major)

 DecisionTreeClassifierModel support probability prediction for PySpark.ml
 -

 Key: SPARK-9537
 URL: https://issues.apache.org/jira/browse/SPARK-9537
 Project: Spark
  Issue Type: Improvement
  Components: ML, PySpark
Reporter: Yanbo Liang
Priority: Minor

 DecisionTreeClassifierModel support probability prediction for PySpark.ml






[jira] [Created] (SPARK-9537) DecisionTreeClassifierModel support probability prediction for PySpark.ml

2015-08-02 Thread Yanbo Liang (JIRA)
Yanbo Liang created SPARK-9537:
--

 Summary: DecisionTreeClassifierModel support probability 
prediction for PySpark.ml
 Key: SPARK-9537
 URL: https://issues.apache.org/jira/browse/SPARK-9537
 Project: Spark
  Issue Type: Improvement
  Components: ML, PySpark
Reporter: Yanbo Liang


DecisionTreeClassifierModel support probability prediction for PySpark.ml






[jira] [Updated] (SPARK-9536) NaiveBayesModel support probability prediction for PySpark.ml

2015-08-02 Thread Yanbo Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yanbo Liang updated SPARK-9536:
---
Priority: Minor  (was: Major)

 NaiveBayesModel support probability prediction for PySpark.ml
 -

 Key: SPARK-9536
 URL: https://issues.apache.org/jira/browse/SPARK-9536
 Project: Spark
  Issue Type: Improvement
  Components: ML, PySpark
Reporter: Yanbo Liang
Priority: Minor

 NaiveBayesModel support probability prediction for PySpark.ml






[jira] [Updated] (SPARK-9000) Support generic item type in PrefixSpan

2015-08-02 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng updated SPARK-9000:
-
Assignee: Feynman Liang

 Support generic item type in PrefixSpan
 ---

 Key: SPARK-9000
 URL: https://issues.apache.org/jira/browse/SPARK-9000
 Project: Spark
  Issue Type: Improvement
  Components: MLlib
Affects Versions: 1.5.0
Reporter: Xiangrui Meng
Assignee: Feynman Liang
Priority: Critical
 Fix For: 1.5.0


 In SPARK-6487, we only support the Int item type, which requires users to 
 encode other types into integers to use PrefixSpan. We should be able to do 
 this inside PrefixSpan, similar to FPGrowth. This should be done before 1.5 
 since it changes APIs.
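 A minimal local sketch of the encoding the ticket describes (illustrative
 names, no Spark dependency): assign each distinct item an Int code, mine over
 the codes, then translate results back through the reverse mapping:
 {code}
 val sequences: Seq[Seq[String]] = Seq(Seq("a", "b"), Seq("b", "c"))
 val code: Map[String, Int] = sequences.flatten.distinct.zipWithIndex.toMap
 val encoded: Seq[Seq[Int]] = sequences.map(_.map(code)) // mine on these
 val decode: Map[Int, String] = code.map(_.swap)         // translate back
 {code}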






[jira] [Resolved] (SPARK-9000) Support generic item type in PrefixSpan

2015-08-02 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng resolved SPARK-9000.
--
   Resolution: Fixed
Fix Version/s: 1.5.0

Issue resolved by pull request 7837
[https://github.com/apache/spark/pull/7837]

 Support generic item type in PrefixSpan
 ---

 Key: SPARK-9000
 URL: https://issues.apache.org/jira/browse/SPARK-9000
 Project: Spark
  Issue Type: Improvement
  Components: MLlib
Affects Versions: 1.5.0
Reporter: Xiangrui Meng
Priority: Critical
 Fix For: 1.5.0


 In SPARK-6487, we only support the Int item type, which requires users to 
 encode other types into integers to use PrefixSpan. We should be able to do 
 this inside PrefixSpan, similar to FPGrowth. This should be done before 1.5 
 since it changes APIs.






[jira] [Resolved] (SPARK-9370) Support DecimalType in UnsafeRow

2015-08-02 Thread Davies Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davies Liu resolved SPARK-9370.
---
   Resolution: Fixed
Fix Version/s: 1.5.0

 Support DecimalType in UnsafeRow
 

 Key: SPARK-9370
 URL: https://issues.apache.org/jira/browse/SPARK-9370
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Reporter: Reynold Xin
Assignee: Davies Liu
 Fix For: 1.5.0


 We should be able to represent the Decimal data using 2 longs (16 bytes), 
 given that we no longer support unlimited precision.
 Once we figure out how to convert Decimal into 2 longs, we can add support 
 for it similar to the way we add support for IntervalType (SPARK-9369).
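 A rough sketch of the representation being described (not UnsafeRow's actual
 layout): a bounded-precision decimal's unscaled value fits in 128 bits, i.e.
 two longs:
 {code}
 import java.math.BigInteger

 // Split an (at most 128-bit) unscaled value into high/low 64-bit halves.
 def toLongs(unscaled: BigInteger): (Long, Long) =
   (unscaled.shiftRight(64).longValue(), unscaled.longValue())

 // Reassemble: (high << 64) plus the low half taken as unsigned.
 def fromLongs(high: Long, low: Long): BigInteger =
   BigInteger.valueOf(high).shiftLeft(64)
     .add(BigInteger.valueOf(low).and(new BigInteger("FFFFFFFFFFFFFFFF", 16)))
 {code}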






[jira] [Commented] (SPARK-9370) Support DecimalType in UnsafeRow

2015-08-02 Thread Davies Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14650621#comment-14650621
 ] 

Davies Liu commented on SPARK-9370:
---

This is fixed by https://github.com/apache/spark/pull/7758

 Support DecimalType in UnsafeRow
 

 Key: SPARK-9370
 URL: https://issues.apache.org/jira/browse/SPARK-9370
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Reporter: Reynold Xin
Assignee: Davies Liu
 Fix For: 1.5.0


 We should be able to represent the Decimal data using 2 longs (16 bytes), 
 given that we no longer support unlimited precision.
 Once we figure out how to convert Decimal into 2 longs, we can add support 
 for it similar to the way we add support for IntervalType (SPARK-9369).






[jira] [Updated] (SPARK-7497) test_count_by_value_and_window is flaky

2015-08-02 Thread Davies Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davies Liu updated SPARK-7497:
--
Assignee: (was: Davies Liu)

 test_count_by_value_and_window is flaky
 ---

 Key: SPARK-7497
 URL: https://issues.apache.org/jira/browse/SPARK-7497
 Project: Spark
  Issue Type: Bug
  Components: PySpark, Streaming
Affects Versions: 1.4.0
Reporter: Xiangrui Meng
Priority: Critical
  Labels: flaky-test

 Saw this test failure in 
 https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32268/console
 {code}
 ==
 FAIL: test_count_by_value_and_window (__main__.WindowFunctionTests)
 --
 Traceback (most recent call last):
   File "pyspark/streaming/tests.py", line 418, in 
 test_count_by_value_and_window
 self._test_func(input, func, expected)
   File "pyspark/streaming/tests.py", line 133, in _test_func
 self.assertEqual(expected, result)
 AssertionError: Lists differ: [[1], [2], [3], [4], [5], [6], [6], [6], [6], 
 [6]] != [[1], [2], [3], [4], [5], [6], [6], [6]]
 First list contains 2 additional elements.
 First extra element 8:
 [6]
 - [[1], [2], [3], [4], [5], [6], [6], [6], [6], [6]]
 ? --
 + [[1], [2], [3], [4], [5], [6], [6], [6]]
 --
 {code}






[jira] [Resolved] (SPARK-9441) NoSuchMethodError: Com.typesafe.config.Config.getDuration

2015-08-02 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-9441.
--
Resolution: Not A Problem

 NoSuchMethodError: Com.typesafe.config.Config.getDuration
 -

 Key: SPARK-9441
 URL: https://issues.apache.org/jira/browse/SPARK-9441
 Project: Spark
  Issue Type: Bug
  Components: Deploy
Affects Versions: 1.3.1
Reporter: nirav patel

 I recently migrated my spark based rest service from 1.0.2 to 1.3.1 
 15/07/29 10:31:12 INFO spark.SparkContext: Running Spark version 1.3.1
 15/07/29 10:31:12 INFO spark.SecurityManager: Changing view acls to: npatel
 15/07/29 10:31:12 INFO spark.SecurityManager: Changing modify acls to: npatel
 15/07/29 10:31:12 INFO spark.SecurityManager: SecurityManager: authentication 
 disabled; ui acls disabled; users with view permissions: Set(npatel); users 
 with modify permissions: Set(npatel)
 Exception in thread "main" java.lang.NoSuchMethodError: 
 com.typesafe.config.Config.getDuration(Ljava/lang/String;Ljava/util/concurrent/TimeUnit;)J
 at 
 akka.util.Helpers$ConfigOps$.akka$util$Helpers$ConfigOps$$getDuration$extension(Helpers.scala:125)
 at akka.util.Helpers$ConfigOps$.getMillisDuration$extension(Helpers.scala:120)
 at akka.actor.ActorSystem$Settings.<init>(ActorSystem.scala:171)
 at akka.actor.ActorSystemImpl.<init>(ActorSystem.scala:504)
 at akka.actor.ActorSystem$.apply(ActorSystem.scala:141)
 at akka.actor.ActorSystem$.apply(ActorSystem.scala:118)
 at 
 org.apache.spark.util.AkkaUtils$.org$apache$spark$util$AkkaUtils$$doCreateActorSystem(AkkaUtils.scala:122)
 at org.apache.spark.util.AkkaUtils$$anonfun$1.apply(AkkaUtils.scala:55)
 at org.apache.spark.util.AkkaUtils$$anonfun$1.apply(AkkaUtils.scala:54)
 at 
 org.apache.spark.util.Utils$$anonfun$startServiceOnPort$1.apply$mcVI$sp(Utils.scala:1837)
 at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
 at org.apache.spark.util.Utils$.startServiceOnPort(Utils.scala:1828)
 at org.apache.spark.util.AkkaUtils$.createActorSystem(AkkaUtils.scala:57)
 at org.apache.spark.SparkEnv$.create(SparkEnv.scala:223)
 at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:163)
 at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:269)
 at org.apache.spark.SparkContext.<init>(SparkContext.scala:272)
 I read blog posts where people suggest modifying the classpath to put the 
 right version first, putting the Scala libs first in the classpath, and 
 similar suggestions, which is all ridiculous. I think the typesafe config 
 package included with the spark-core lib is incorrect. I did the following in 
 my Maven build and now it works, but I think someone needs to fix the 
 spark-core package. 
 {code}
 <dependency>
   <groupId>org.apache.spark</groupId>
   <artifactId>spark-core_2.10</artifactId>
   <exclusions>
     <exclusion>
       <artifactId>config</artifactId>
       <groupId>com.typesafe</groupId>
     </exclusion>
   </exclusions>
 </dependency>
 <dependency>
   <groupId>com.typesafe</groupId>
   <artifactId>config</artifactId>
   <version>1.2.1</version>
 </dependency>
 {code}






[jira] [Updated] (SPARK-8889) showDagViz will cause java.lang.OutOfMemoryError: Java heap space

2015-08-02 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-8889:
-
Target Version/s:   (was: 1.4.2, 1.5.0)
Priority: Minor  (was: Major)
   Fix Version/s: (was: 1.4.2)

 showDagViz will cause java.lang.OutOfMemoryError: Java heap space
 -

 Key: SPARK-8889
 URL: https://issues.apache.org/jira/browse/SPARK-8889
 Project: Spark
  Issue Type: Bug
  Components: Web UI
Affects Versions: 1.4.0
 Environment: Spark 1.4.0
 Hadoop 2.2.0
Reporter: cen yuhai
Priority: Minor

 HTTP ERROR 500
 Problem accessing /history/app-20150708101140-0018/jobs/job/. Reason:
 Server Error
 Caused by:
 java.lang.OutOfMemoryError: Java heap space
   at java.util.Arrays.copyOf(Arrays.java:2367)
   at 
 java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:130)
   at 
 java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:114)
   at 
 java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:415)
   at java.lang.StringBuilder.append(StringBuilder.java:132)
   at 
 scala.collection.mutable.StringBuilder.append(StringBuilder.scala:207)
   at 
 org.apache.spark.ui.scope.RDDOperationGraph$$anonfun$org$apache$spark$ui$scope$RDDOperationGraph$$makeDotSubgraph$2.apply(RDDOperationGraph.scala:192)
   at 
 org.apache.spark.ui.scope.RDDOperationGraph$$anonfun$org$apache$spark$ui$scope$RDDOperationGraph$$makeDotSubgraph$2.apply(RDDOperationGraph.scala:191)
   at scala.collection.immutable.Stream.foreach(Stream.scala:547)
   at 
 org.apache.spark.ui.scope.RDDOperationGraph$.org$apache$spark$ui$scope$RDDOperationGraph$$makeDotSubgraph(RDDOperationGraph.scala:191)
   at 
 org.apache.spark.ui.scope.RDDOperationGraph$.makeDotFile(RDDOperationGraph.scala:170)
   at 
 org.apache.spark.ui.UIUtils$$anonfun$showDagViz$1.apply(UIUtils.scala:361)
   at 
 org.apache.spark.ui.UIUtils$$anonfun$showDagViz$1.apply(UIUtils.scala:357)
   at 
 scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
   at 
 scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
   at scala.collection.immutable.List.foreach(List.scala:318)
   at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
   at scala.collection.AbstractTraversable.map(Traversable.scala:105)
   at org.apache.spark.ui.UIUtils$.showDagViz(UIUtils.scala:357)
   at org.apache.spark.ui.UIUtils$.showDagVizForJob(UIUtils.scala:335)
   at org.apache.spark.ui.jobs.JobPage.render(JobPage.scala:317)
   at org.apache.spark.ui.WebUI$$anonfun$2.apply(WebUI.scala:79)
   at org.apache.spark.ui.WebUI$$anonfun$2.apply(WebUI.scala:79)
   at org.apache.spark.ui.JettyUtils$$anon$1.doGet(JettyUtils.scala:69)
   at javax.servlet.http.HttpServlet.service(HttpServlet.java:735)
   at javax.servlet.http.HttpServlet.service(HttpServlet.java:848)
   at 
 org.spark-project.jetty.servlet.ServletHolder.handle(ServletHolder.java:684)
   at 
 org.spark-project.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:501)
   at 
 org.spark-project.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1086)
   at 
 org.spark-project.jetty.servlet.ServletHandler.doScope(ServletHandler.java:428)
   at 
 org.spark-project.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1020)
   at 
 org.spark-project.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)






[jira] [Closed] (SPARK-9099) spark-ec2 does not add important ports to security group

2015-08-02 Thread Brian Sung-jin Hong (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Sung-jin Hong closed SPARK-9099.
--
Resolution: Invalid

 spark-ec2 does not add important ports to security group
 

 Key: SPARK-9099
 URL: https://issues.apache.org/jira/browse/SPARK-9099
 Project: Spark
  Issue Type: Bug
  Components: EC2
Affects Versions: 1.4.0, 1.4.1
Reporter: Brian Sung-jin Hong
Priority: Minor

 The spark-ec2 script fails to add a few important ports to the security 
 group, including:
 Master 6066: needed to submit jobs from outside the cluster
 Slave 4040: needed to view worker state
 Slave 8082: needed to view some worker logs






[jira] [Updated] (SPARK-9535) Modify document for codegen

2015-08-02 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-9535:
-
Assignee: KaiXinXIaoLei

 Modify document for codegen
 ---

 Key: SPARK-9535
 URL: https://issues.apache.org/jira/browse/SPARK-9535
 Project: Spark
  Issue Type: Improvement
  Components: Documentation, SQL
Affects Versions: 1.5.0
Reporter: Kousuke Saruta
Assignee: KaiXinXIaoLei
Priority: Minor

 SPARK-7184 made codegen enabled by default so let's modify the corresponding 
 documents.






[jira] [Created] (SPARK-9536) NaiveBayesModel support probability prediction for PySpark.ml

2015-08-02 Thread Yanbo Liang (JIRA)
Yanbo Liang created SPARK-9536:
--

 Summary: NaiveBayesModel support probability prediction for 
PySpark.ml
 Key: SPARK-9536
 URL: https://issues.apache.org/jira/browse/SPARK-9536
 Project: Spark
  Issue Type: Improvement
  Components: ML, PySpark
Reporter: Yanbo Liang


NaiveBayesModel support probability prediction for PySpark.ml






[jira] [Created] (SPARK-9534) Enable javac lint for scalac parity; fix a lot of build warnings, 1.5.0 edition

2015-08-02 Thread Sean Owen (JIRA)
Sean Owen created SPARK-9534:


 Summary: Enable javac lint for scalac parity; fix a lot of build 
warnings, 1.5.0 edition
 Key: SPARK-9534
 URL: https://issues.apache.org/jira/browse/SPARK-9534
 Project: Spark
  Issue Type: Improvement
  Components: Build
Reporter: Sean Owen
Assignee: Sean Owen
Priority: Minor


For parity with the kinds of warnings scalac emits, we should turn on some of 
javac's lint options. This reports, for example, use of deprecated APIs and 
unchecked casts, as scalac does.

And it's a good time to sweep through the build warnings and fix a bunch before 
the release.

A PR is coming that shows and explains the fixes.






[jira] [Resolved] (SPARK-9149) Add an example of spark.ml KMeans

2015-08-02 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-9149.
--
Resolution: Fixed

Issue resolved by pull request 7697
[https://github.com/apache/spark/pull/7697]

 Add an example of spark.ml KMeans
 -

 Key: SPARK-9149
 URL: https://issues.apache.org/jira/browse/SPARK-9149
 Project: Spark
  Issue Type: Documentation
  Components: Examples, ML
Reporter: Yu Ishikawa
Assignee: Yu Ishikawa
 Fix For: 1.5.0


 Create an example of KMeans API for spark.ml.






[jira] [Updated] (SPARK-9149) Add an example of spark.ml KMeans

2015-08-02 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-9149:
-
Priority: Minor  (was: Major)

 Add an example of spark.ml KMeans
 -

 Key: SPARK-9149
 URL: https://issues.apache.org/jira/browse/SPARK-9149
 Project: Spark
  Issue Type: Documentation
  Components: Examples, ML
Reporter: Yu Ishikawa
Assignee: Yu Ishikawa
Priority: Minor
 Fix For: 1.5.0


 Create an example of KMeans API for spark.ml.






[jira] [Resolved] (SPARK-4454) Race condition in DAGScheduler

2015-08-02 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-4454.
--
Resolution: Fixed

Given the unlikelihood of a further 1.2.x release, I'm closing this as no 
longer needing a backport.

 Race condition in DAGScheduler
 --

 Key: SPARK-4454
 URL: https://issues.apache.org/jira/browse/SPARK-4454
 Project: Spark
  Issue Type: Bug
  Components: Scheduler
Affects Versions: 1.1.0
Reporter: Rafal Kwasny
Assignee: Josh Rosen
Priority: Critical
 Fix For: 1.3.0


 It seems to be a race condition in DAGScheduler that manifests on jobs with 
 high concurrency:
 {noformat}
  Exception in thread "main" java.util.NoSuchElementException: key not found: 
 35
 at scala.collection.MapLike$class.default(MapLike.scala:228)
 at scala.collection.AbstractMap.default(Map.scala:58)
 at scala.collection.mutable.HashMap.apply(HashMap.scala:64)
 at 
 org.apache.spark.scheduler.DAGScheduler.getCacheLocs(DAGScheduler.scala:201)
 at 
 org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal(DAGScheduler.scala:1292)
 at 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2$$anonfun$apply$2.apply$mcVI$sp(DAGScheduler.scala:1307)
 at 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2$$anonfun$apply$2.apply(DAGScheduler.scala:1306)
 at 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2$$anonfun$apply$2.apply(DAGScheduler.scala:1306)
 at scala.collection.immutable.List.foreach(List.scala:318)
 at 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2.apply(DAGScheduler.scala:1306)
 at 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2.apply(DAGScheduler.scala:1304)
 at scala.collection.immutable.List.foreach(List.scala:318)
 at 
 org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal(DAGScheduler.scala:1304)
 at 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2$$anonfun$apply$2.apply$mcVI$sp(DAGScheduler.scala:1307)
 at 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2$$anonfun$apply$2.apply(DAGScheduler.scala:1306)
 at 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2$$anonfun$apply$2.apply(DAGScheduler.scala:1306)
 at scala.collection.immutable.List.foreach(List.scala:318)
 at 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2.apply(DAGScheduler.scala:1306)
 at 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2.apply(DAGScheduler.scala:1304)
 at scala.collection.immutable.List.foreach(List.scala:318)
 at 
 org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal(DAGScheduler.scala:1304)
 at 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2$$anonfun$apply$2.apply$mcVI$sp(DAGScheduler.scala:1307)
 at 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2$$anonfun$apply$2.apply(DAGScheduler.scala:1306)
 at 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2$$anonfun$apply$2.apply(DAGScheduler.scala:1306)
 at scala.collection.immutable.List.foreach(List.scala:318)
 at 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2.apply(DAGScheduler.scala:1306)
 at 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2.apply(DAGScheduler.scala:1304)
 at scala.collection.immutable.List.foreach(List.scala:318)
 at 
 org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal(DAGScheduler.scala:1304)
 at 
 org.apache.spark.scheduler.DAGScheduler.getPreferredLocs(DAGScheduler.scala:1275)
 at 
 org.apache.spark.SparkContext.getPreferredLocs(SparkContext.scala:937)
 at 

[jira] [Updated] (SPARK-4454) Race condition in DAGScheduler

2015-08-02 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-4454:
-
Labels:   (was: backport-needed)

 Race condition in DAGScheduler
 --

 Key: SPARK-4454
 URL: https://issues.apache.org/jira/browse/SPARK-4454
 Project: Spark
  Issue Type: Bug
  Components: Scheduler
Affects Versions: 1.1.0
Reporter: Rafal Kwasny
Assignee: Josh Rosen
Priority: Critical
 Fix For: 1.3.0


 It seems to be a race condition in DAGScheduler that manifests on jobs with 
 high concurrency:
 {noformat}
  Exception in thread "main" java.util.NoSuchElementException: key not found: 
 35
 at scala.collection.MapLike$class.default(MapLike.scala:228)
 at scala.collection.AbstractMap.default(Map.scala:58)
 at scala.collection.mutable.HashMap.apply(HashMap.scala:64)
 at 
 org.apache.spark.scheduler.DAGScheduler.getCacheLocs(DAGScheduler.scala:201)
 at 
 org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal(DAGScheduler.scala:1292)
 at 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2$$anonfun$apply$2.apply$mcVI$sp(DAGScheduler.scala:1307)
 at 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2$$anonfun$apply$2.apply(DAGScheduler.scala:1306)
 at 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2$$anonfun$apply$2.apply(DAGScheduler.scala:1306)
 at scala.collection.immutable.List.foreach(List.scala:318)
 at 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2.apply(DAGScheduler.scala:1306)
 at 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2.apply(DAGScheduler.scala:1304)
 at scala.collection.immutable.List.foreach(List.scala:318)
 at 
 org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal(DAGScheduler.scala:1304)
 at 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2$$anonfun$apply$2.apply$mcVI$sp(DAGScheduler.scala:1307)
 at 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2$$anonfun$apply$2.apply(DAGScheduler.scala:1306)
 at 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2$$anonfun$apply$2.apply(DAGScheduler.scala:1306)
 at scala.collection.immutable.List.foreach(List.scala:318)
 at 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2.apply(DAGScheduler.scala:1306)
 at 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2.apply(DAGScheduler.scala:1304)
 at scala.collection.immutable.List.foreach(List.scala:318)
 at 
 org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal(DAGScheduler.scala:1304)
 at 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2$$anonfun$apply$2.apply$mcVI$sp(DAGScheduler.scala:1307)
 at 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2$$anonfun$apply$2.apply(DAGScheduler.scala:1306)
 at 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2$$anonfun$apply$2.apply(DAGScheduler.scala:1306)
 at scala.collection.immutable.List.foreach(List.scala:318)
 at 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2.apply(DAGScheduler.scala:1306)
 at 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2.apply(DAGScheduler.scala:1304)
 at scala.collection.immutable.List.foreach(List.scala:318)
 at 
 org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal(DAGScheduler.scala:1304)
 at 
 org.apache.spark.scheduler.DAGScheduler.getPreferredLocs(DAGScheduler.scala:1275)
 at 
 org.apache.spark.SparkContext.getPreferredLocs(SparkContext.scala:937)
 at 
 org.apache.spark.rdd.PartitionCoalescer.currPrefLocs(CoalescedRDD.scala:175)
 

[jira] [Commented] (SPARK-8119) HeartbeatReceiver should not adjust application executor resources

2015-08-02 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14650657#comment-14650657
 ] 

Sean Owen commented on SPARK-8119:
--

I attempted a back-port but this depends on SPARK-7835 and possibly other prior 
changes, which I'm not so familiar with.

 HeartbeatReceiver should not adjust application executor resources
 --

 Key: SPARK-8119
 URL: https://issues.apache.org/jira/browse/SPARK-8119
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.4.0
Reporter: SaintBacchus
Assignee: Andrew Or
Priority: Critical
  Labels: backport-needed
 Fix For: 1.5.0


 Dynamic allocation will set the total number of executors to a small number 
 when it wants to kill some executors.
 But in the non-dynamic-allocation scenario, Spark will also set the total 
 number of executors.
 This causes the following problem: sometimes an executor fails, and no 
 replacement executor will be brought up by Spark.
 === EDIT by andrewor14 ===
 The issue is that the AM forgets about the original number of executors it 
 wants after calling sc.killExecutor. Even if dynamic allocation is not 
 enabled, this is still possible because of heartbeat timeouts.
 I think the problem is that sc.killExecutor is used incorrectly in 
 HeartbeatReceiver. The intention of the method is to permanently adjust the 
 number of executors the application will get. In HeartbeatReceiver, however, 
 it is used as a best-effort mechanism to ensure that the timed-out executor 
 is dead.
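 A hedged illustration of the semantic mismatch described above (Spark 1.4
 developer API, assuming an active SparkContext {{sc}}):
 {code}
 sc.killExecutor("42")  // kills executor 42 AND permanently lowers the
                        // application's executor target as a side effect
 sc.requestExecutors(1) // a best-effort caller would have to compensate
                        // like this to restore the original target
 {code}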






[jira] [Assigned] (SPARK-9527) PrefixSpan.run should return a PrefixSpanModel instead of an RDD

2015-08-02 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng reassigned SPARK-9527:


Assignee: Xiangrui Meng

 PrefixSpan.run should return a PrefixSpanModel instead of an RDD
 

 Key: SPARK-9527
 URL: https://issues.apache.org/jira/browse/SPARK-9527
 Project: Spark
  Issue Type: Improvement
  Components: MLlib
Affects Versions: 1.5.0
Reporter: Xiangrui Meng
Assignee: Xiangrui Meng
Priority: Critical

 With a model wrapping the result RDD, it would be more flexible to add 
 features in the future.
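 A minimal sketch of such a wrapper (field and type names assumed for
 illustration):
 {code}
 import org.apache.spark.rdd.RDD

 // run would return this instead of the raw RDD, leaving room to add
 // methods (save/load, filtering, ...) without breaking the API again.
 class PrefixSpanModel[Item](val freqSequences: RDD[(Array[Item], Long)])
 {code}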






[jira] [Commented] (SPARK-8874) Add missing methods in Word2Vec ML

2015-08-02 Thread Manoj Kumar (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14650619#comment-14650619
 ] 

Manoj Kumar commented on SPARK-8874:


Done. Thanks.

 Add missing methods in Word2Vec ML
 --

 Key: SPARK-8874
 URL: https://issues.apache.org/jira/browse/SPARK-8874
 Project: Spark
  Issue Type: New Feature
  Components: ML, PySpark
Reporter: Manoj Kumar
Assignee: Manoj Kumar

 Add getVectors and findSynonyms.






[jira] [Resolved] (SPARK-9529) Improve sort on Decimal

2015-08-02 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin resolved SPARK-9529.

   Resolution: Fixed
Fix Version/s: 1.5.0

 Improve sort on Decimal
 ---

 Key: SPARK-9529
 URL: https://issues.apache.org/jira/browse/SPARK-9529
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Reporter: Davies Liu
Assignee: Davies Liu
Priority: Critical
 Fix For: 1.5.0


 Right now it's really slow; it just hangs in random tests: 
 {code}
 pool-1-thread-1-ScalaTest-running-TungstenSortSuite prio=5 
 tid=0x7f822bc82800 nid=0x5103 runnable [0x00011d1be000]
    java.lang.Thread.State: RUNNABLE
   at java.math.BigInteger.<init>(BigInteger.java:405)
   at java.math.BigDecimal.bigTenToThe(BigDecimal.java:3380)
   at java.math.BigDecimal.bigMultiplyPowerTen(BigDecimal.java:3508)
   at java.math.BigDecimal.setScale(BigDecimal.java:2394)
   at java.math.BigDecimal.divide(BigDecimal.java:1691)
   at java.math.BigDecimal.divideToIntegralValue(BigDecimal.java:1734)
   at java.math.BigDecimal.divideAndRemainder(BigDecimal.java:1891)
   at java.math.BigDecimal.remainder(BigDecimal.java:1833)
   at scala.math.BigDecimal.remainder(BigDecimal.scala:281)
   at scala.math.BigDecimal.isWhole(BigDecimal.scala:215)
   at scala.math.BigDecimal.hashCode(BigDecimal.scala:180)
   at org.apache.spark.sql.types.Decimal.hashCode(Decimal.scala:260)
   at 
 org.apache.spark.sql.catalyst.InternalRow.hashCode(InternalRow.scala:121)
   at org.apache.spark.RangePartitioner.hashCode(Partitioner.scala:201)
   at java.lang.Object.toString(Object.java:237)
   at 
 java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1418)
   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
   at 
 java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
   at 
 java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
   at 
 java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
   at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347)
   at 
 org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:44)
   at 
 org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:84)
   at 
 org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:301)
   at 
 org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:294)
   at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:122)
   at org.apache.spark.SparkContext.clean(SparkContext.scala:2003)
   at 
 org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1.apply(RDD.scala:683)
   at 
 org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1.apply(RDD.scala:682)
   at 
 org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
   at 
 org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
   at org.apache.spark.rdd.RDD.withScope(RDD.scala:286)
   at org.apache.spark.rdd.RDD.mapPartitions(RDD.scala:682)
   at 
 org.apache.spark.sql.execution.Exchange$$anonfun$doExecute$1.apply(Exchange.scala:181)
   at 
 org.apache.spark.sql.execution.Exchange$$anonfun$doExecute$1.apply(Exchange.scala:148)
   at 
 org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:48)
   at org.apache.spark.sql.execution.Exchange.doExecute(Exchange.scala:148)
   at 
 org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:113)
   at 
 org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:113)
   at 
 org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
   at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:112)
   at 
 org.apache.spark.sql.execution.Sort$$anonfun$doExecute$1.apply(sort.scala:48)
   at 
 org.apache.spark.sql.execution.Sort$$anonfun$doExecute$1.apply(sort.scala:48)
   at 
 org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:48)
   at org.apache.spark.sql.execution.Sort.doExecute(sort.scala:47)
   at 
 org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:113)
   at 
 org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:113)
   at 
 org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
   at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:112)
   at 
 

[jira] [Resolved] (SPARK-8612) Yarn application status is misreported for failed PySpark apps.

2015-08-02 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-8612.
--
Resolution: Duplicate

I believe so. I think Marcelo is following up on this general issue; there are 
a few tickets.

 Yarn application status is misreported for failed PySpark apps.
 ---

 Key: SPARK-8612
 URL: https://issues.apache.org/jira/browse/SPARK-8612
 Project: Spark
  Issue Type: Bug
  Components: YARN
Affects Versions: 1.3.0, 1.3.1, 1.4.0
 Environment: PySpark job run in yarn-client mode on CDH 5.4.2
Reporter: Juliet Hougland
Priority: Minor

 When a PySpark job fails, YARN records and reports its status as 
 successful. Hari Shreedharan pointed out to me that [the ApplicationMaster 
 records app success when System.exit is called. | 
 https://github.com/apache/spark/blob/master/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala#L124]
  PySpark always [exits by calling os._exit. | 
 https://github.com/apache/spark/blob/master/python/pyspark/daemon.py#L169] 
 Because of this, every PySpark application run on YARN is marked as having 
 completed successfully.
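 For a hedged JVM analogy of why the exit path matters (this is not the 
 ApplicationMaster's actual code): a shutdown hook, standing in for a 
 final-status marker, runs on System.exit but is skipped by Runtime.halt, 
 which is the JVM counterpart of Python's os._exit.
 {code}
 object ExitSemantics {
   def main(args: Array[String]): Unit = {
     Runtime.getRuntime.addShutdownHook(new Thread {
       override def run(): Unit = println("recording final status")
     })
     // System.exit(0)           // hook runs: a final status is recorded
     Runtime.getRuntime.halt(0)  // hook skipped, like Python's os._exit
   }
 }
 {code}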



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-9521) Require Maven 3.3.3+ in the build

2015-08-02 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-9521.
--
   Resolution: Fixed
Fix Version/s: 1.5.0

Issue resolved by pull request 7852
[https://github.com/apache/spark/pull/7852]

 Require Maven 3.3.3+ in the build
 -

 Key: SPARK-9521
 URL: https://issues.apache.org/jira/browse/SPARK-9521
 Project: Spark
  Issue Type: Improvement
  Components: Build
Affects Versions: 1.4.1
Reporter: Sean Owen
Assignee: Sean Owen
Priority: Trivial
 Fix For: 1.5.0


 Patrick recently discovered a build problem that manifested because he was 
 using the Maven 3.2.x installed on his system, and which was resolved by 
 using Maven 3.3.x. Since we have a script that can install Maven 3.3.3 for 
 anyone, it probably makes sense to just enforce use of Maven 3.3.3+ in the 
 build. (Currently it's just 3.0.4+).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-9535) Modify document for codegen

2015-08-02 Thread Kousuke Saruta (JIRA)
Kousuke Saruta created SPARK-9535:
-

 Summary: Modify document for codegen
 Key: SPARK-9535
 URL: https://issues.apache.org/jira/browse/SPARK-9535
 Project: Spark
  Issue Type: Improvement
  Components: Documentation, SQL
Affects Versions: 1.5.0
Reporter: Kousuke Saruta
Priority: Minor


SPARK-7184 enabled codegen by default, so let's update the corresponding 
documentation.
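A hedged sketch of the user-facing change (property name taken from the 1.4 
docs; an assumption, not verified against the patch): the documentation would 
now describe the flag as defaulting to true, so a user would set it only to 
opt out.
{code}
// Hypothetical: disable codegen explicitly, since it is now on by default.
sqlContext.setConf("spark.sql.codegen", "false")
{code}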



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-9523) Receiver for Spark Streaming does not naturally support kryo serializer

2015-08-02 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-9523:
-
Target Version/s:   (was: 1.3.1)
Priority: Minor  (was: Major)
   Fix Version/s: (was: 1.4.2)
  (was: 1.3.2)

[~fish748] Please read 
https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark
The fields on this JIRA can't be right: 1.3.1 was already released, and Fix 
Version doesn't apply to unresolved JIRAs, etc.

 Receiver for Spark Streaming does not naturally support kryo serializer
 ---

 Key: SPARK-9523
 URL: https://issues.apache.org/jira/browse/SPARK-9523
 Project: Spark
  Issue Type: Improvement
  Components: Streaming
Affects Versions: 1.3.1
 Environment: Windows 7 local mode
Reporter: John Chen
Priority: Minor
  Labels: kryo, serialization
   Original Estimate: 120h
  Remaining Estimate: 120h

 In some cases, some attributes of a class are not serializable but are still 
 needed after the whole object is deserialized, so you have to customize your 
 serialization code. For example, you can declare those attributes as 
 transient, which makes them ignored during serialization, and then reassign 
 their values during deserialization.
 Now, if you're using Java serialization, you have to implement Serializable 
 and write that code in the readObject() and writeObject() methods; and if 
 you're using Kryo serialization, you have to implement KryoSerializable and 
 write it in the read() and write() methods.
 In Spark and Spark Streaming, you can set Kryo as the serializer to speed 
 things up. However, the functions taken by RDD or DStream operations are 
 still serialized by Java serialization, which means you only need to write 
 the custom serialization code in readObject() and writeObject().
 But when it comes to Spark Streaming's Receiver, things are different. When 
 you wish to customize an InputDStream, you must extend Receiver. However, it 
 turns out the Receiver will be serialized by Kryo if you set the Kryo 
 serializer in SparkConf, and will fall back to Java serialization if you 
 don't.
 So here's the problem: if you want to switch the serializer by configuration 
 and make sure the Receiver works for both Java and Kryo, you have to write 
 all four methods above. First, this is redundant, since you have to write 
 the serialization/deserialization code almost twice; second, there's nothing 
 in the docs or in the code to tell users to implement the KryoSerializable 
 interface.
 Since all other function parameters are serialized by Java only, I suggest 
 you make it so for the Receiver as well. It may be slower, but since the 
 serialization only runs once per interval, it's tolerable. More importantly, 
 it causes less trouble.
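 A minimal sketch of the duplication described above, assuming a hypothetical 
 ReceiverState class with a non-serializable connection field that must be 
 rebuilt after deserialization:
 {code}
 import java.io.{IOException, ObjectInputStream, ObjectOutputStream}

 import com.esotericsoftware.kryo.{Kryo, KryoSerializable}
 import com.esotericsoftware.kryo.io.{Input, Output}

 class ReceiverState extends Serializable with KryoSerializable {
   var url: String = _
   @transient var conn: java.net.URLConnection = _

   private def rebuild(): Unit = {
     conn = new java.net.URL(url).openConnection()
   }

   // Java path: used for functions passed to RDD/DStream operations.
   @throws(classOf[IOException])
   private def writeObject(out: ObjectOutputStream): Unit =
     out.defaultWriteObject()

   @throws(classOf[IOException])
   private def readObject(in: ObjectInputStream): Unit = {
     in.defaultReadObject()
     rebuild()
   }

   // Kryo path: taken by the Receiver when the Kryo serializer is set.
   override def write(kryo: Kryo, output: Output): Unit =
     output.writeString(url)

   override def read(kryo: Kryo, input: Input): Unit = {
     url = input.readString()
     rebuild()
   }
 }
 {code}
 The rebuild logic has to be wired into both pairs of hooks, which is exactly 
 the redundancy described above.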



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-9533) Add missing methods in Word2Vec ML (Python API)

2015-08-02 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-9533:
-
   Priority: Minor  (was: Major)
Component/s: ML

 Add missing methods in Word2Vec ML (Python API)
 ---

 Key: SPARK-9533
 URL: https://issues.apache.org/jira/browse/SPARK-9533
 Project: Spark
  Issue Type: Improvement
  Components: ML
Reporter: Manoj Kumar
Priority: Minor

 After 8874 is resolved, we can add Python wrappers for the same.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-9504) Flaky test: o.a.s.streaming.StreamingContextSuite.stop gracefully

2015-08-02 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-9504:
-
Assignee: Shixiong Zhu

 Flaky test: o.a.s.streaming.StreamingContextSuite.stop gracefully
 -

 Key: SPARK-9504
 URL: https://issues.apache.org/jira/browse/SPARK-9504
 Project: Spark
  Issue Type: Sub-task
  Components: Streaming
Reporter: Shixiong Zhu
Assignee: Shixiong Zhu
  Labels: flaky-test
 Fix For: 1.5.0


 Failure build: 
 https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/39149/ 
 {code}
 [info] - stop gracefully *** FAILED *** (3 seconds, 522 milliseconds)
 [info]   0 was not greater than 0 (StreamingContextSuite.scala:277)
 [info]   org.scalatest.exceptions.TestFailedException:
 [info]   at 
 org.scalatest.Assertions$class.newAssertionFailedException(Assertions.scala:500)
 [info]   at 
 org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1555)
 [info]   at 
 org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:466)
 [info]   at 
 org.apache.spark.streaming.StreamingContextSuite$$anonfun$21$$anonfun$apply$mcV$sp$3.apply$mcVI$sp(StreamingContextSuite.scala:277)
 [info]   at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
 [info]   at 
 org.apache.spark.streaming.StreamingContextSuite$$anonfun$21.apply$mcV$sp(StreamingContextSuite.scala:261)
 [info]   at 
 org.apache.spark.streaming.StreamingContextSuite$$anonfun$21.apply(StreamingContextSuite.scala:257)
 [info]   at 
 org.apache.spark.streaming.StreamingContextSuite$$anonfun$21.apply(StreamingContextSuite.scala:257)
 [info]   at 
 org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
 [info]   at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
 [info]   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
 [info]   at org.scalatest.Transformer.apply(Transformer.scala:22)
 [info]   at org.scalatest.Transformer.apply(Transformer.scala:20)
 [info]   at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166)
 [info]   at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:42)
 [info]   at 
 org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163)
 [info]   at 
 org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
 [info]   at 
 org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
 [info]   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
 [info]   at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175)
 [info]   at 
 org.apache.spark.streaming.StreamingContextSuite.org$scalatest$BeforeAndAfter$$super$runTest(StreamingContextSuite.scala:42)
 [info]   at 
 org.scalatest.BeforeAndAfter$class.runTest(BeforeAndAfter.scala:200)
 [info]   at 
 org.apache.spark.streaming.StreamingContextSuite.runTest(StreamingContextSuite.scala:42)
 [info]   at 
 org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
 [info]   at 
 org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
 [info]   at 
 org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413)
 [info]   at 
 org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401)
 [info]   at scala.collection.immutable.List.foreach(List.scala:318)
 [info]   at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
 [info]   at 
 org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396)
 [info]   at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483)
 [info]   at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:208)
 [info]   at org.scalatest.FunSuite.runTests(FunSuite.scala:1555)
 [info]   at org.scalatest.Suite$class.run(Suite.scala:1424)
 [info]   at 
 org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1555)
 [info]   at 
 org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212)
 [info]   at 
 org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212)
 [info]   at org.scalatest.SuperEngine.runImpl(Engine.scala:545)
 [info]   at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:212)
 [info]   at 
 org.apache.spark.streaming.StreamingContextSuite.org$scalatest$BeforeAndAfter$$super$run(StreamingContextSuite.scala:42)
 [info]   at org.scalatest.BeforeAndAfter$class.run(BeforeAndAfter.scala:241)
 [info]   at 
 org.apache.spark.streaming.StreamingContextSuite.run(StreamingContextSuite.scala:42)
 [info]   at 
 org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:462)
 [info]   at 
 org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:671)
 [info]   at sbt.ForkMain$Run$2.call(ForkMain.java:294)
 [info]   at 

[jira] [Commented] (SPARK-9375) The total number of executor(s) requested by the driver may be negative

2015-08-02 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14650650#comment-14650650
 ] 

Sean Owen commented on SPARK-9375:
--

[~sandyr] has a question for you on the PR; this may indeed have been resolved 
by other changes. Can you clarify what version you are running, and explain 
why you think it's different?

 The total number of  executor(s) requested by  the driver may be negative
 -

 Key: SPARK-9375
 URL: https://issues.apache.org/jira/browse/SPARK-9375
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.4.1
Reporter: KaiXinXIaoLei
 Attachments: The total number of executor(s) is negative in AM log.png


 I set "spark.dynamicAllocation.enabled = true" and ran a big job. I found a 
 problem in the ApplicationMaster log: the total number of executor(s) 
 requested by the driver is negative.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-8981) Set applicationId and appName in log4j MDC

2015-08-02 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-8981.
--
Resolution: Won't Fix

We can reopen this if there is a PR that clarifies how this would work.

 Set applicationId and appName in log4j MDC
 --

 Key: SPARK-8981
 URL: https://issues.apache.org/jira/browse/SPARK-8981
 Project: Spark
  Issue Type: New Feature
  Components: Spark Core
Reporter: Paweł Kopiczko
Priority: Minor

 It would be nice to have, because it's good to have logs in one file when 
 using log agents (like Logentries) in standalone mode. It would also allow 
 configuring a rolling file appender without a mess when multiple 
 applications are running.
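 A hedged sketch of what the request might look like from user code today 
 (assuming slf4j's MDC as the mechanism; sc is an existing SparkContext):
 {code}
 import org.slf4j.MDC

 MDC.put("applicationId", sc.applicationId)
 MDC.put("appName", sc.appName)
 // A log4j pattern can then include %X{applicationId} and %X{appName}.
 {code}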



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-7563) OutputCommitCoordinator.stop() should only be executed in driver

2015-08-02 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14650652#comment-14650652
 ] 

Apache Spark commented on SPARK-7563:
-

User 'srowen' has created a pull request for this issue:
https://github.com/apache/spark/pull/7865

 OutputCommitCoordinator.stop() should only be executed in driver
 

 Key: SPARK-7563
 URL: https://issues.apache.org/jira/browse/SPARK-7563
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.3.1
 Environment: Red Hat Enterprise Linux Server release 7.0 (Maipo)
 Spark 1.3.1 Release
Reporter: Hailong Wen
Priority: Critical
  Labels: backport-needed
 Fix For: 1.4.0


 I am from the IBM Platform Symphony team and we are integrating Spark 1.3.1 
 with EGO (a resource management product).
 In EGO we use a fine-grained dynamic allocation policy, and each Executor 
 exits after its tasks are all done. When testing *spark-shell*, we found 
 that when an executor of the first job exits, it stops the 
 OutputCommitCoordinator, which results in all future jobs failing. Details 
 are as follows:
 We got the following error in the executor when submitting a job in 
 *spark-shell* a second time (the first job submission succeeds):
 {noformat}
 15/05/11 04:02:31 INFO spark.util.AkkaUtils: Connecting to 
 OutputCommitCoordinator: 
 akka.tcp://sparkDriver@whlspark01:50452/user/OutputCommitCoordinator
 Exception in thread main akka.actor.ActorNotFound: Actor not found for: 
 ActorSelection[Anchor(akka.tcp://sparkDriver@whlspark01:50452/), 
 Path(/user/OutputCommitCoordinator)]
 at 
 akka.actor.ActorSelection$$anonfun$resolveOne$1.apply(ActorSelection.scala:65)
 at 
 akka.actor.ActorSelection$$anonfun$resolveOne$1.apply(ActorSelection.scala:63)
 at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
 at 
 akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.processBatch$1(BatchingExecutor.scala:67)
 at 
 akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:82)
 at 
 akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
 at 
 akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
 at 
 scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
 at akka.dispatch.BatchingExecutor$Batch.run(BatchingExecutor.scala:58)
 at 
 akka.dispatch.ExecutionContexts$sameThreadExecutionContext$.unbatchedExecute(Future.scala:74)
 at 
 akka.dispatch.BatchingExecutor$class.execute(BatchingExecutor.scala:110)
 at 
 akka.dispatch.ExecutionContexts$sameThreadExecutionContext$.execute(Future.scala:73)
 at 
 scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40)
 at 
 scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:248)
 at akka.pattern.PromiseActorRef.$bang(AskSupport.scala:267)
 at akka.remote.DefaultMessageDispatcher.dispatch(Endpoint.scala:89)
 at 
 akka.remote.EndpointReader$$anonfun$receive$2.applyOrElse(Endpoint.scala:937)
 at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
 at akka.remote.EndpointActor.aroundReceive(Endpoint.scala:415)
 at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
 at akka.actor.ActorCell.invoke(ActorCell.scala:487)
 at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
 at akka.dispatch.Mailbox.run(Mailbox.scala:220)
 at 
 akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
 at 
 scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
 at 
 scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
 at 
 scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
 at 
 scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
 {noformat}
 And in driver side, we see a log message telling that the 
 OutputCommitCoordinator is stopped after the first submission:
 {noformat}
 15/05/11 04:01:23 INFO 
 spark.scheduler.OutputCommitCoordinator$OutputCommitCoordinatorActor: 
 OutputCommitCoordinator stopped!
 {noformat}
 We examined the code of OutputCommitCoordinator and found that the executor 
 reuses the ref of the driver's OutputCommitCoordinatorActor. So when an 
 executor exits, it eventually calls SparkEnv.stop():
 {noformat}
   private[spark] def stop() {
 isStopped = true
  pythonWorkers.foreach { case (key, worker) => worker.stop() }
 Option(httpFileServer).foreach(_.stop())
 mapOutputTracker.stop()
 shuffleManager.stop()
 broadcastManager.stop()
  

[jira] [Created] (SPARK-9533) Add missing methods in Word2Vec ML (Python API)

2015-08-02 Thread Manoj Kumar (JIRA)
Manoj Kumar created SPARK-9533:
--

 Summary: Add missing methods in Word2Vec ML (Python API)
 Key: SPARK-9533
 URL: https://issues.apache.org/jira/browse/SPARK-9533
 Project: Spark
  Issue Type: Improvement
Reporter: Manoj Kumar


After 8874 is resolved, we can add Python wrappers for the same.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-9534) Enable javac lint for scalac parity; fix a lot of build warnings, 1.5.0 edition

2015-08-02 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-9534:
---

Assignee: Sean Owen  (was: Apache Spark)

 Enable javac lint for scalac parity; fix a lot of build warnings, 1.5.0 
 edition
 ---

 Key: SPARK-9534
 URL: https://issues.apache.org/jira/browse/SPARK-9534
 Project: Spark
  Issue Type: Improvement
  Components: Build
Reporter: Sean Owen
Assignee: Sean Owen
Priority: Minor

 For parity with the kinds of warnings scalac emits, we should turn on some of 
 javac's lint options. This reports, for example, use of deprecated APIs and 
 unchecked casts, as scalac does.
 And it's a good time to sweep through the build warnings and fix a bunch 
 before the release.
 A PR is coming which shows and explains the fixes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-9534) Enable javac lint for scalac parity; fix a lot of build warnings, 1.5.0 edition

2015-08-02 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14650631#comment-14650631
 ] 

Apache Spark commented on SPARK-9534:
-

User 'srowen' has created a pull request for this issue:
https://github.com/apache/spark/pull/7862

 Enable javac lint for scalac parity; fix a lot of build warnings, 1.5.0 
 edition
 ---

 Key: SPARK-9534
 URL: https://issues.apache.org/jira/browse/SPARK-9534
 Project: Spark
  Issue Type: Improvement
  Components: Build
Reporter: Sean Owen
Assignee: Sean Owen
Priority: Minor

 For parity with the kinds of warnings scalac emits, we should turn on some of 
 javac's lint options. This reports, for example, use of deprecated APIs and 
 unchecked casts, as scalac does.
 And it's a good time to sweep through the build warnings and fix a bunch 
 before the release.
 A PR is coming which shows and explains the fixes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-9534) Enable javac lint for scalac parity; fix a lot of build warnings, 1.5.0 edition

2015-08-02 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-9534:
---

Assignee: Apache Spark  (was: Sean Owen)

 Enable javac lint for scalac parity; fix a lot of build warnings, 1.5.0 
 edition
 ---

 Key: SPARK-9534
 URL: https://issues.apache.org/jira/browse/SPARK-9534
 Project: Spark
  Issue Type: Improvement
  Components: Build
Reporter: Sean Owen
Assignee: Apache Spark
Priority: Minor

 For parity with the kinds of warnings scalac emits, we should turn on some of 
 javac's lint options. This reports, for example, use of deprecated APIs and 
 unchecked casts, as scalac does.
 And it's a good time to sweep through the build warnings and fix a bunch 
 before the release.
 A PR is coming which shows and explains the fixes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-9209) Using executor allocation, a executor is removed but it exists in ExecutorsPage of the web ui

2015-08-02 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-9209:
-
Target Version/s:   (was: 1.5.0)
Priority: Minor  (was: Major)
   Fix Version/s: (was: 1.5.0)

 Using executor allocation, a executor is removed but it exists in 
 ExecutorsPage of the web ui 
 --

 Key: SPARK-9209
 URL: https://issues.apache.org/jira/browse/SPARK-9209
 Project: Spark
  Issue Type: Bug
  Components: Web UI
Affects Versions: 1.4.1
Reporter: KaiXinXIaoLei
Priority: Minor
 Attachments: A Executor exists in web.png, executor is removed.png


 I set "spark.dynamicAllocation.enabled = true" and ran a big job. In the 
 driver, an executor was asked to be removed, the removal succeeded, and the 
 executor process no longer exists. But it still shows up on the 
 ExecutorsPage of the web UI.
 The log in the driver:
 2015-07-17 11:48:14,543 | INFO  | 
 [sparkDriver-akka.actor.default-dispatcher-3] | Removing block manager 
 BlockManagerId(264, 172.1.1.8, 23811) 
 2015-07-17 11:48:14,543 | INFO  | [dag-scheduler-event-loop] | Removed 264 
 successfully in removeExecutor 
 2015-07-17 11:48:21,226 | INFO  | 
 [sparkDriver-akka.actor.default-dispatcher-3] | Registering block manager 
 172.1.1.8:23811 with 10.4 GB RAM, BlockManagerId(264, 172.1.1.8, 23811) 
 2015-07-17 11:48:21,228 | INFO  | 
 [sparkDriver-akka.actor.default-dispatcher-3] | Added broadcast_781_piece0 in 
 memory on 172.1.1.8:23811 (size: 38.6 KB, free: 10.4 GB)  
 2015-07-17 11:48:35,277 | ERROR | 
 [sparkDriver-akka.actor.default-dispatcher-16] | Lost executor 264 on 
 datasight-195: remote Rpc client disassociated 
 2015-07-17 11:48:35,277 | WARN  | 
 [sparkDriver-akka.actor.default-dispatcher-4] | Association with remote 
 system [akka.tcp://sparkExecutor@datasight-195:23929] has failed, address is 
 now gated for [5000] ms. Reason is: [Disassociated].
 2015-07-17 11:48:35,277 | INFO  | 
 [sparkDriver-akka.actor.default-dispatcher-16] | Re-queueing tasks for 264 
 from TaskSet 415.0 
 2015-07-17 11:48:35,804 | INFO  | [SparkListenerBus] | Existing executor 264 
 has been removed (new total is 10)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-9056) Rename configuration `spark.streaming.minRememberDuration` to `spark.streaming.fileStream.minRememberDuration`

2015-08-02 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-9056:
-
Assignee: Sameer Abhyankar

 Rename configuration `spark.streaming.minRememberDuration` to 
 `spark.streaming.fileStream.minRememberDuration`
 --

 Key: SPARK-9056
 URL: https://issues.apache.org/jira/browse/SPARK-9056
 Project: Spark
  Issue Type: Sub-task
  Components: Streaming
Affects Versions: 1.4.1
Reporter: Tathagata Das
Assignee: Sameer Abhyankar
Priority: Trivial
  Labels: starter
 Fix For: 1.5.0


 spark.streaming.minRememberDuration is confusing as it is not immediately 
 evident what this configuration is about. Best to rename it to 
 spark.streaming.fileStream.minRememberDuration.
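 A hedged sketch of a backward-compatible lookup (an assumed approach, not 
 necessarily the actual patch): prefer the new key and fall back to the 
 deprecated one.
 {code}
 val minRemember = conf
   .getOption("spark.streaming.fileStream.minRememberDuration")
   .orElse(conf.getOption("spark.streaming.minRememberDuration"))
 {code}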



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-9538) LogisticRegression support raw and probability prediction for PySpark.ml

2015-08-02 Thread Yanbo Liang (JIRA)
Yanbo Liang created SPARK-9538:
--

 Summary: LogisticRegression support raw and probability prediction 
for PySpark.ml
 Key: SPARK-9538
 URL: https://issues.apache.org/jira/browse/SPARK-9538
 Project: Spark
  Issue Type: Improvement
  Components: ML, PySpark
Reporter: Yanbo Liang
Priority: Minor


LogisticRegression support raw and probability prediction for PySpark.ml
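For reference, a hedged sketch of the Scala ml API the Python wrapper would 
mirror (training and test are assumed DataFrames with label/features columns):
{code}
import org.apache.spark.ml.classification.LogisticRegression

val model = new LogisticRegression().fit(training)
model.transform(test)
  .select("rawPrediction", "probability", "prediction")
  .show()
{code}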



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-9536) NaiveBayesModel support probability prediction for PySpark.ml

2015-08-02 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14650694#comment-14650694
 ] 

Apache Spark commented on SPARK-9536:
-

User 'yanboliang' has created a pull request for this issue:
https://github.com/apache/spark/pull/7866

 NaiveBayesModel support probability prediction for PySpark.ml
 -

 Key: SPARK-9536
 URL: https://issues.apache.org/jira/browse/SPARK-9536
 Project: Spark
  Issue Type: Improvement
  Components: ML, PySpark
Reporter: Yanbo Liang
Priority: Minor

 NaiveBayesModel support probability prediction for PySpark.ml



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-8064) Upgrade Hive to 1.2

2015-08-02 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14650725#comment-14650725
 ] 

Apache Spark commented on SPARK-8064:
-

User 'liancheng' has created a pull request for this issue:
https://github.com/apache/spark/pull/7867

 Upgrade Hive to 1.2
 ---

 Key: SPARK-8064
 URL: https://issues.apache.org/jira/browse/SPARK-8064
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Reporter: Reynold Xin
Assignee: Steve Loughran
Priority: Blocker





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-7230) Make RDD API private in SparkR for Spark 1.4

2015-08-02 Thread Michael Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14651023#comment-14651023
 ] 

Michael Smith commented on SPARK-7230:
--

I support Antonio's request to bring back this functionality in version 1.5 so 
that plyrmr can continue to be used with the Spark backend as before. 

 Make RDD API private in SparkR for Spark 1.4
 

 Key: SPARK-7230
 URL: https://issues.apache.org/jira/browse/SPARK-7230
 Project: Spark
  Issue Type: Sub-task
  Components: SparkR
Affects Versions: 1.4.0
Reporter: Shivaram Venkataraman
Assignee: Shivaram Venkataraman
Priority: Critical
 Fix For: 1.4.0


 This ticket proposes making the RDD API in SparkR private for the 1.4 
 release. The motivation for doing so are discussed in a larger design 
 document aimed at a more top-down design of the SparkR APIs. A first cut that 
 discusses motivation and proposed changes can be found at http://goo.gl/GLHKZI
 The main points in that document that relate to this ticket are:
 - The RDD API requires knowledge of the distributed system and is pretty low 
 level. This is not very suitable for a number of R users who are used to more 
 high-level packages that work out of the box.
 - The RDD implementation in SparkR is not fully robust right now: we are 
 missing features like spilling for aggregation, handling partitions which 
 don't fit in memory etc. There are further limitations like lack of hashCode 
 for non-native types etc. which might affect user experience.
 The only change we will make for now is to not export the RDD functions as 
 public methods in the SparkR package, and I will create another ticket to 
 discuss the public API for 1.5 in more detail.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-9254) sbt-launch-lib.bash should use `curl --location` to support HTTP/HTTPS redirection

2015-08-02 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-9254:
-
Fix Version/s: 1.3.2

 sbt-launch-lib.bash should use `curl --location` to support HTTP/HTTPS 
 redirection
 --

 Key: SPARK-9254
 URL: https://issues.apache.org/jira/browse/SPARK-9254
 Project: Spark
  Issue Type: Bug
  Components: Build
Affects Versions: 1.5.0
Reporter: Cheng Lian
Assignee: Cheng Lian
 Fix For: 1.3.2, 1.4.2, 1.5.0


 The {{curl}} call in the script should use {{--location}} to support 
 HTTP/HTTPS redirection, since target file(s) can be hosted on CDN nodes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-9538) LogisticRegression support raw and probability prediction for PySpark.ml

2015-08-02 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14650696#comment-14650696
 ] 

Apache Spark commented on SPARK-9538:
-

User 'yanboliang' has created a pull request for this issue:
https://github.com/apache/spark/pull/7866

 LogisticRegression support raw and probability prediction for PySpark.ml
 

 Key: SPARK-9538
 URL: https://issues.apache.org/jira/browse/SPARK-9538
 Project: Spark
  Issue Type: Improvement
  Components: ML, PySpark
Reporter: Yanbo Liang
Priority: Minor

 LogisticRegression support raw and probability prediction for PySpark.ml



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-9537) DecisionTreeClassifierModel support probability prediction for PySpark.ml

2015-08-02 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-9537:
---

Assignee: (was: Apache Spark)

 DecisionTreeClassifierModel support probability prediction for PySpark.ml
 -

 Key: SPARK-9537
 URL: https://issues.apache.org/jira/browse/SPARK-9537
 Project: Spark
  Issue Type: Improvement
  Components: ML, PySpark
Reporter: Yanbo Liang
Priority: Minor

 DecisionTreeClassifierModel support probability prediction for PySpark.ml



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-9537) DecisionTreeClassifierModel support probability prediction for PySpark.ml

2015-08-02 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14650695#comment-14650695
 ] 

Apache Spark commented on SPARK-9537:
-

User 'yanboliang' has created a pull request for this issue:
https://github.com/apache/spark/pull/7866

 DecisionTreeClassifierModel support probability prediction for PySpark.ml
 -

 Key: SPARK-9537
 URL: https://issues.apache.org/jira/browse/SPARK-9537
 Project: Spark
  Issue Type: Improvement
  Components: ML, PySpark
Reporter: Yanbo Liang
Priority: Minor

 DecisionTreeClassifierModel support probability prediction for PySpark.ml



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-9538) LogisticRegression support raw and probability prediction for PySpark.ml

2015-08-02 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-9538:
---

Assignee: Apache Spark

 LogisticRegression support raw and probability prediction for PySpark.ml
 

 Key: SPARK-9538
 URL: https://issues.apache.org/jira/browse/SPARK-9538
 Project: Spark
  Issue Type: Improvement
  Components: ML, PySpark
Reporter: Yanbo Liang
Assignee: Apache Spark
Priority: Minor

 LogisticRegression support raw and probability prediction for PySpark.ml



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-9538) LogisticRegression support raw and probability prediction for PySpark.ml

2015-08-02 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-9538:
---

Assignee: (was: Apache Spark)

 LogisticRegression support raw and probability prediction for PySpark.ml
 

 Key: SPARK-9538
 URL: https://issues.apache.org/jira/browse/SPARK-9538
 Project: Spark
  Issue Type: Improvement
  Components: ML, PySpark
Reporter: Yanbo Liang
Priority: Minor

 LogisticRegression support raw and probability prediction for PySpark.ml



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-9537) DecisionTreeClassifierModel support probability prediction for PySpark.ml

2015-08-02 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-9537:
---

Assignee: Apache Spark

 DecisionTreeClassifierModel support probability prediction for PySpark.ml
 -

 Key: SPARK-9537
 URL: https://issues.apache.org/jira/browse/SPARK-9537
 Project: Spark
  Issue Type: Improvement
  Components: ML, PySpark
Reporter: Yanbo Liang
Assignee: Apache Spark
Priority: Minor

 DecisionTreeClassifierModel support probability prediction for PySpark.ml



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-9536) NaiveBayesModel support probability prediction for PySpark.ml

2015-08-02 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-9536:
---

Assignee: (was: Apache Spark)

 NaiveBayesModel support probability prediction for PySpark.ml
 -

 Key: SPARK-9536
 URL: https://issues.apache.org/jira/browse/SPARK-9536
 Project: Spark
  Issue Type: Improvement
  Components: ML, PySpark
Reporter: Yanbo Liang
Priority: Minor

 NaiveBayesModel support probability prediction for PySpark.ml



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-9536) NaiveBayesModel support probability prediction for PySpark.ml

2015-08-02 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-9536:
---

Assignee: Apache Spark

 NaiveBayesModel support probability prediction for PySpark.ml
 -

 Key: SPARK-9536
 URL: https://issues.apache.org/jira/browse/SPARK-9536
 Project: Spark
  Issue Type: Improvement
  Components: ML, PySpark
Reporter: Yanbo Liang
Assignee: Apache Spark
Priority: Minor

 NaiveBayesModel support probability prediction for PySpark.ml



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-9527) PrefixSpan.run should return a PrefixSpanModel instead of an RDD

2015-08-02 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-9527:
---

Assignee: Xiangrui Meng  (was: Apache Spark)

 PrefixSpan.run should return a PrefixSpanModel instead of an RDD
 

 Key: SPARK-9527
 URL: https://issues.apache.org/jira/browse/SPARK-9527
 Project: Spark
  Issue Type: Improvement
  Components: MLlib
Affects Versions: 1.5.0
Reporter: Xiangrui Meng
Assignee: Xiangrui Meng
Priority: Critical

 With a model wrapping the result RDD, it would be more flexible to add 
 features in the future.
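 A hedged sketch of the wrapper's shape (the field name and element type are 
 assumptions, not the final API): a model class around the result RDD leaves 
 room to add methods such as save/load later without changing PrefixSpan.run 
 again.
 {code}
 import org.apache.spark.rdd.RDD

 class PrefixSpanModel[Item](
     val freqSequences: RDD[(Array[Array[Item]], Long)])
   extends Serializable
 {code}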



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-9539) Repeated sc.close() in PySpark causes JVM memory leak

2015-08-02 Thread Andrey Zimovnov (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14651067#comment-14651067
 ] 

Andrey Zimovnov commented on SPARK-9539:


Hi, Owen! I'm not sure what the permanent generation in the Java heap means, 
but it grows over time. I really do have a use case where I need to recreate 
the Spark context a lot. The only workaround for now is to try to increase 
MaxPermSize, I guess.

 Repeated sc.close() in PySpark causes JVM memory leak
 -

 Key: SPARK-9539
 URL: https://issues.apache.org/jira/browse/SPARK-9539
 Project: Spark
  Issue Type: Bug
  Components: PySpark
Affects Versions: 1.4.1
Reporter: Andrey Zimovnov
Priority: Minor
 Attachments: Screenshot at авг. 02 19-10-53.png


 Example code in Python:
 {code:python}
 from pyspark import SparkConf, SparkContext
 from pyspark.sql import HiveContext

 for i in range(20):
   print i
   conf = SparkConf().setAppName("test")
   sc = SparkContext(conf=conf)
   hivec = HiveContext(sc)
   hivec.sql("select id from details_info limit 1").show()
   sc.stop()
   del hivec
   del sc
 {code}
 Jstat output:
 {noformat}
  S0C S1C S0U S1U EC EU OC OU PC PU YGC YGCT FGC FGCT GCT
 196608,0 196608,0 97566,2  0,0   1179648,0 542150,0 3145728,0120,0
 154112,0 153613,2  40,434   0  0,0000,434
 196608,0 196608,0 97566,2  0,0   1179648,0 679041,7 3145728,0120,0
 164352,0 164183,3  40,434   0  0,0000,434
 196608,0 196608,0 97566,2  0,0   1179648,0 907928,4 3145728,0120,0
 164352,0 164200,3  40,434   0  0,0000,434
 196608,0 196608,0 97566,2  0,0   1179648,0 912132,7 3145728,0120,0
 164352,0 164200,5  40,434   0  0,0000,434
 196608,0 196608,0 97566,2  0,0   1179648,0 913741,5 3145728,0120,0
 164352,0 164200,8  40,434   0  0,0000,434
 196608,0 196608,0 97566,2  0,0   1179648,0 929458,6 3145728,0120,0
 164352,0 164206,0  40,434   0  0,0000,434
 196608,0 196608,0 97566,2  0,0   1179648,0 1003138,1 3145728,0120,0
 168960,0 168646,0  40,434   0  0,0000,434
 131584,0 196608,0  0,0   109725,6 1179648,0   0,03145728,0128,0
 175104,0 174802,1  50,591   0  0,0000,591
 131584,0 196608,0  0,0   109725,6 1179648,0 152654,9 3145728,0128,0
 175104,0 174803,3  50,591   0  0,0000,591
 131584,0 196608,0  0,0   109725,6 1179648,0 158586,1 3145728,0128,0
 175104,0 174803,3  50,591   0  0,0000,591
 131584,0 196608,0  0,0   109725,6 1179648,0 160659,8 3145728,0128,0
 175104,0 174805,7  50,591   0  0,0000,591
 131584,0 196608,0  0,0   109725,6 1179648,0 181935,2 3145728,0128,0
 175104,0 174819,7  50,591   0  0,0000,591
 131584,0 196608,0  0,0   109725,6 1179648,0 283389,1 3145728,0128,0
 185856,0 185371,0  50,591   0  0,0000,591
 131584,0 196608,0  0,0   109725,6 1179648,0 342596,4 3145728,0128,0
 185856,0 185379,3  50,591   0  0,0000,591
 131584,0 196608,0  0,0   109725,6 1179648,0 547634,7 3145728,0128,0
 185856,0 185385,8  50,591   0  0,0000,591
 131584,0 196608,0  0,0   109725,6 1179648,0 555930,9 3145728,0128,0
 185856,0 185385,8  50,591   0  0,0000,591
 131584,0 196608,0  0,0   109725,6 1179648,0 557888,6 3145728,0128,0
 185856,0 185386,0  50,591   0  0,0000,591
 131584,0 196608,0  0,0   109725,6 1179648,0 573907,5 3145728,0128,0
 185856,0 185397,5  50,591   0  0,0000,591
 131584,0 196608,0  0,0   109725,6 1179648,0 637955,0 3145728,0128,0
 189952,0 189533,1  50,591   0  0,0000,591
 131584,0 196608,0  0,0   109725,6 1179648,0 895866,1 3145728,0128,0
 196096,0 195968,5  50,591   0  0,0000,591
 131584,0 196608,0  0,0   109725,6 1179648,0 948046,5 3145728,0128,0
 196096,0 195969,4  50,591   0  0,0000,591
 131584,0 196608,0  0,0   109725,6 1179648,0 952427,2 3145728,0128,0
 196096,0 195969,4  50,591   0  0,0000,591
 131584,0 196608,0  0,0   109725,6 1179648,0 957977,5 3145728,0128,0
 196096,0 195973,4  50,591   0  0,0000,591
 131584,0 196608,0  0,0   109725,6 1179648,0 977811,1 3145728,0128,0
 196096,0 195977,7  50,591   0  0,0000,591
 131584,0 196608,0  0,0   109725,6 1179648,0 1118722,0 3145728,0128,0
 206848,0 206539,0  50,591   0  0,0000,591
 131584,0 144384,0 118692,5  0,0   1284096,0 183470,8 3145728,0136,0
 206848,0 206543,4  60,773   0  0,0000,773
 131584,0 144384,0 118692,5  0,0   1284096,0 189718,5 3145728,0136,0  

[jira] [Commented] (SPARK-9539) Repeated sc.close() in PySpark causes JVM memory leak

2015-08-02 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14651073#comment-14651073
 ] 

Sean Owen commented on SPARK-9539:
--

This just shows Spark is using memory. It's normal to use some of the permanent 
generation. Your jstat dump shows normal growth and GC of the heap. It does not 
show any out-of-memory condition. It may simply be that you need to increase 
the memory you allocate, especially the permanent generation (you should 
probably read up on this). Unless you can point to an actual memory leak from a 
heap dump, I'd like to close this.

 Repeated sc.close() in PySpark causes JVM memory leak
 -

 Key: SPARK-9539
 URL: https://issues.apache.org/jira/browse/SPARK-9539
 Project: Spark
  Issue Type: Bug
  Components: PySpark
Affects Versions: 1.4.1
Reporter: Andrey Zimovnov
Priority: Minor
 Attachments: Screenshot at авг. 02 19-10-53.png


 Example code in Python:
 {code:python}
 from pyspark import SparkConf, SparkContext
 from pyspark.sql import HiveContext

 for i in range(20):
   print i
   conf = SparkConf().setAppName("test")
   sc = SparkContext(conf=conf)
   hivec = HiveContext(sc)
   hivec.sql("select id from details_info limit 1").show()
   sc.stop()
   del hivec
   del sc
 {code}
 Jstat output:
 {noformat}
  S0C S1C S0U S1U EC EU OC OU PC PU YGC YGCT FGC FGCT GCT
 196608,0 196608,0 97566,2  0,0   1179648,0 542150,0 3145728,0120,0
 154112,0 153613,2  40,434   0  0,0000,434
 196608,0 196608,0 97566,2  0,0   1179648,0 679041,7 3145728,0120,0
 164352,0 164183,3  40,434   0  0,0000,434
 196608,0 196608,0 97566,2  0,0   1179648,0 907928,4 3145728,0120,0
 164352,0 164200,3  40,434   0  0,0000,434
 196608,0 196608,0 97566,2  0,0   1179648,0 912132,7 3145728,0120,0
 164352,0 164200,5  40,434   0  0,0000,434
 196608,0 196608,0 97566,2  0,0   1179648,0 913741,5 3145728,0120,0
 164352,0 164200,8  40,434   0  0,0000,434
 196608,0 196608,0 97566,2  0,0   1179648,0 929458,6 3145728,0120,0
 164352,0 164206,0  40,434   0  0,0000,434
 196608,0 196608,0 97566,2  0,0   1179648,0 1003138,1 3145728,0120,0
 168960,0 168646,0  40,434   0  0,0000,434
 131584,0 196608,0  0,0   109725,6 1179648,0   0,03145728,0128,0
 175104,0 174802,1  50,591   0  0,0000,591
 131584,0 196608,0  0,0   109725,6 1179648,0 152654,9 3145728,0128,0
 175104,0 174803,3  50,591   0  0,0000,591
 131584,0 196608,0  0,0   109725,6 1179648,0 158586,1 3145728,0128,0
 175104,0 174803,3  50,591   0  0,0000,591
 131584,0 196608,0  0,0   109725,6 1179648,0 160659,8 3145728,0128,0
 175104,0 174805,7  50,591   0  0,0000,591
 131584,0 196608,0  0,0   109725,6 1179648,0 181935,2 3145728,0128,0
 175104,0 174819,7  50,591   0  0,0000,591
 131584,0 196608,0  0,0   109725,6 1179648,0 283389,1 3145728,0128,0
 185856,0 185371,0  50,591   0  0,0000,591
 131584,0 196608,0  0,0   109725,6 1179648,0 342596,4 3145728,0128,0
 185856,0 185379,3  50,591   0  0,0000,591
 131584,0 196608,0  0,0   109725,6 1179648,0 547634,7 3145728,0128,0
 185856,0 185385,8  50,591   0  0,0000,591
 131584,0 196608,0  0,0   109725,6 1179648,0 555930,9 3145728,0128,0
 185856,0 185385,8  50,591   0  0,0000,591
 131584,0 196608,0  0,0   109725,6 1179648,0 557888,6 3145728,0128,0
 185856,0 185386,0  50,591   0  0,0000,591
 131584,0 196608,0  0,0   109725,6 1179648,0 573907,5 3145728,0128,0
 185856,0 185397,5  50,591   0  0,0000,591
 131584,0 196608,0  0,0   109725,6 1179648,0 637955,0 3145728,0128,0
 189952,0 189533,1  50,591   0  0,0000,591
 131584,0 196608,0  0,0   109725,6 1179648,0 895866,1 3145728,0128,0
 196096,0 195968,5  50,591   0  0,0000,591
 131584,0 196608,0  0,0   109725,6 1179648,0 948046,5 3145728,0128,0
 196096,0 195969,4  50,591   0  0,0000,591
 131584,0 196608,0  0,0   109725,6 1179648,0 952427,2 3145728,0128,0
 196096,0 195969,4  50,591   0  0,0000,591
 131584,0 196608,0  0,0   109725,6 1179648,0 957977,5 3145728,0128,0
 196096,0 195973,4  50,591   0  0,0000,591
 131584,0 196608,0  0,0   109725,6 1179648,0 977811,1 3145728,0128,0
 196096,0 195977,7  50,591   0  0,0000,591
 131584,0 196608,0  0,0   109725,6 1179648,0 1118722,0 3145728,0128,0
 206848,0 206539,0  50,591   0  0,0000,591
 131584,0 144384,0 

[jira] [Created] (SPARK-9542) create unsafe version of map type

2015-08-02 Thread Wenchen Fan (JIRA)
Wenchen Fan created SPARK-9542:
--

 Summary: create unsafe version of map type
 Key: SPARK-9542
 URL: https://issues.apache.org/jira/browse/SPARK-9542
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Reporter: Wenchen Fan






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-9542) create unsafe version of map type

2015-08-02 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-9542:
---

Assignee: (was: Apache Spark)

 create unsafe version of map type
 -

 Key: SPARK-9542
 URL: https://issues.apache.org/jira/browse/SPARK-9542
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Reporter: Wenchen Fan





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-9527) PrefixSpan.run should return a PrefixSpanModel instead of an RDD

2015-08-02 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14651062#comment-14651062
 ] 

Apache Spark commented on SPARK-9527:
-

User 'mengxr' has created a pull request for this issue:
https://github.com/apache/spark/pull/7869

 PrefixSpan.run should return a PrefixSpanModel instead of an RDD
 

 Key: SPARK-9527
 URL: https://issues.apache.org/jira/browse/SPARK-9527
 Project: Spark
  Issue Type: Improvement
  Components: MLlib
Affects Versions: 1.5.0
Reporter: Xiangrui Meng
Assignee: Xiangrui Meng
Priority: Critical

 With a model wrapping the result RDD, it would be more flexible to add 
 features in the future.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-9527) PrefixSpan.run should return a PrefixSpanModel instead of an RDD

2015-08-02 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-9527:
---

Assignee: Apache Spark  (was: Xiangrui Meng)

 PrefixSpan.run should return a PrefixSpanModel instead of an RDD
 

 Key: SPARK-9527
 URL: https://issues.apache.org/jira/browse/SPARK-9527
 Project: Spark
  Issue Type: Improvement
  Components: MLlib
Affects Versions: 1.5.0
Reporter: Xiangrui Meng
Assignee: Apache Spark
Priority: Critical

 With a model wrapping the result RDD, it would be more flexible to add 
 features in the future.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-9539) Repeated sc.close() in PySpark causes JVM memory leak

2015-08-02 Thread Andrey Zimovnov (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrey Zimovnov updated SPARK-9539:
---
Description: 
Example code in Python:
{code:python}
from pyspark import SparkConf, SparkContext
from pyspark.sql import HiveContext

for i in range(20):
    print i
    conf = SparkConf().setAppName("test")
    sc = SparkContext(conf=conf)
    hivec = HiveContext(sc)
    hivec.sql("select id from details_info limit 1").show()
    sc.stop()
    del hivec
    del sc
{code}

Jstat output:
{noformat}
 S0C S1C S0U S1U EC EU OC OU PC PU YGC YGCT FGC FGCT GCT
196608,0 196608,0 97566,2  0,0   1179648,0 542150,0 3145728,0120,0
154112,0 153613,2  40,434   0  0,0000,434
196608,0 196608,0 97566,2  0,0   1179648,0 679041,7 3145728,0120,0
164352,0 164183,3  40,434   0  0,0000,434
196608,0 196608,0 97566,2  0,0   1179648,0 907928,4 3145728,0120,0
164352,0 164200,3  40,434   0  0,0000,434
196608,0 196608,0 97566,2  0,0   1179648,0 912132,7 3145728,0120,0
164352,0 164200,5  40,434   0  0,0000,434
196608,0 196608,0 97566,2  0,0   1179648,0 913741,5 3145728,0120,0
164352,0 164200,8  40,434   0  0,0000,434
196608,0 196608,0 97566,2  0,0   1179648,0 929458,6 3145728,0120,0
164352,0 164206,0  40,434   0  0,0000,434
196608,0 196608,0 97566,2  0,0   1179648,0 1003138,1 3145728,0120,0
168960,0 168646,0  40,434   0  0,0000,434
131584,0 196608,0  0,0   109725,6 1179648,0   0,03145728,0128,0
175104,0 174802,1  50,591   0  0,0000,591
131584,0 196608,0  0,0   109725,6 1179648,0 152654,9 3145728,0128,0
175104,0 174803,3  50,591   0  0,0000,591
131584,0 196608,0  0,0   109725,6 1179648,0 158586,1 3145728,0128,0
175104,0 174803,3  50,591   0  0,0000,591
131584,0 196608,0  0,0   109725,6 1179648,0 160659,8 3145728,0128,0
175104,0 174805,7  50,591   0  0,0000,591
131584,0 196608,0  0,0   109725,6 1179648,0 181935,2 3145728,0128,0
175104,0 174819,7  50,591   0  0,0000,591
131584,0 196608,0  0,0   109725,6 1179648,0 283389,1 3145728,0128,0
185856,0 185371,0  50,591   0  0,0000,591
131584,0 196608,0  0,0   109725,6 1179648,0 342596,4 3145728,0128,0
185856,0 185379,3  50,591   0  0,0000,591
131584,0 196608,0  0,0   109725,6 1179648,0 547634,7 3145728,0128,0
185856,0 185385,8  50,591   0  0,0000,591
131584,0 196608,0  0,0   109725,6 1179648,0 555930,9 3145728,0128,0
185856,0 185385,8  50,591   0  0,0000,591
131584,0 196608,0  0,0   109725,6 1179648,0 557888,6 3145728,0128,0
185856,0 185386,0  50,591   0  0,0000,591
131584,0 196608,0  0,0   109725,6 1179648,0 573907,5 3145728,0128,0
185856,0 185397,5  50,591   0  0,0000,591
131584,0 196608,0  0,0   109725,6 1179648,0 637955,0 3145728,0128,0
189952,0 189533,1  50,591   0  0,0000,591
131584,0 196608,0  0,0   109725,6 1179648,0 895866,1 3145728,0128,0
196096,0 195968,5  50,591   0  0,0000,591
131584,0 196608,0  0,0   109725,6 1179648,0 948046,5 3145728,0128,0
196096,0 195969,4  50,591   0  0,0000,591
131584,0 196608,0  0,0   109725,6 1179648,0 952427,2 3145728,0128,0
196096,0 195969,4  50,591   0  0,0000,591
131584,0 196608,0  0,0   109725,6 1179648,0 957977,5 3145728,0128,0
196096,0 195973,4  50,591   0  0,0000,591
131584,0 196608,0  0,0   109725,6 1179648,0 977811,1 3145728,0128,0
196096,0 195977,7  50,591   0  0,0000,591
131584,0 196608,0  0,0   109725,6 1179648,0 1118722,0 3145728,0128,0
206848,0 206539,0  50,591   0  0,0000,591
131584,0 144384,0 118692,5  0,0   1284096,0 183470,8 3145728,0136,0
206848,0 206543,4  60,773   0  0,0000,773
131584,0 144384,0 118692,5  0,0   1284096,0 189718,5 3145728,0136,0
206848,0 206543,4  60,773   0  0,0000,773
131584,0 144384,0 118692,5  0,0   1284096,0 192165,0 3145728,0136,0
206848,0 206543,4  60,773   0  0,0000,773
131584,0 144384,0 118692,5  0,0   1284096,0 199848,4 3145728,0136,0
206848,0 206546,9  60,773   0  0,0000,773
131584,0 144384,0 118692,5  0,0   1284096,0 219687,6 3145728,0136,0
206848,0 206552,2  60,773   0  0,0000,773
131584,0 144384,0 118692,5  0,0   1284096,0 358272,4 3145728,0136,0
217600,0 217100,4  60,773   0  0,0000,773
131584,0 144384,0 118692,5  0,0   1284096,0 573543,6 3145728,0136,0
217600,0 217109,4  60,773   0  0,0000,773
131584,0 144384,0 118692,5  0,0   1284096,0 

[jira] [Commented] (SPARK-5754) Spark AM not launching on Windows

2015-08-02 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14651140#comment-14651140
 ] 

Apache Spark commented on SPARK-5754:
-

User 'cbvoxel' has created a pull request for this issue:
https://github.com/apache/spark/pull/7872

 Spark AM not launching on Windows
 -

 Key: SPARK-5754
 URL: https://issues.apache.org/jira/browse/SPARK-5754
 Project: Spark
  Issue Type: Bug
  Components: Windows, YARN
Affects Versions: 1.1.1, 1.2.0
 Environment: Windows Server 2012, Hadoop 2.4.1.
Reporter: Inigo

 I'm trying to run Spark Pi on a YARN cluster running on Windows, and the AM 
 container fails to start. The problem seems to be in the generation of the 
 YARN command, which adds single quotes (') around some of the Java 
 options. In particular, the part of the code that adds those is the 
 escapeForShell function in YarnSparkHadoopUtil. Apparently, Windows does not 
 accept the quotes around these options. Here is an example of the command that 
 the container tries to execute:
 @call %JAVA_HOME%/bin/java -server -Xmx512m -Djava.io.tmpdir=%PWD%/tmp 
 '-Dspark.yarn.secondary.jars=' 
 '-Dspark.app.name=org.apache.spark.examples.SparkPi' 
 '-Dspark.master=yarn-cluster' org.apache.spark.deploy.yarn.ApplicationMaster 
 --class 'org.apache.spark.examples.SparkPi' --jar  
 'file:/D:/data/spark-1.1.1-bin-hadoop2.4/bin/../lib/spark-examples-1.1.1-hadoop2.4.0.jar'
   --executor-memory 1024 --executor-cores 1 --num-executors 2
 Once I transform it into:
 @call %JAVA_HOME%/bin/java -server -Xmx512m -Djava.io.tmpdir=%PWD%/tmp 
 -Dspark.yarn.secondary.jars= 
 -Dspark.app.name=org.apache.spark.examples.SparkPi 
 -Dspark.master=yarn-cluster org.apache.spark.deploy.yarn.ApplicationMaster 
 --class 'org.apache.spark.examples.SparkPi' --jar  
 'file:/D:/data/spark-1.1.1-bin-hadoop2.4/bin/../lib/spark-examples-1.1.1-hadoop2.4.0.jar'
   --executor-memory 1024 --executor-cores 1 --num-executors 2
 Everything seems to start.
 How should I deal with this? Should I create a separate function like 
 escapeForShell for Windows and call it whenever Windows is detected? Or should 
 I add a sanity check on the YARN side?
 I checked a little, and there seem to be people who are able to run Spark on 
 YARN on Windows, so it might be something else. I didn't find anything 
 related on JIRA either.
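To make the reporter's suggestion concrete, here is a minimal, hypothetical Python sketch of an OS-aware escape step (illustrative names only; Spark's real escapeForShell is Scala code in YarnSparkHadoopUtil). POSIX shells strip the single quotes before the JVM sees the argument, while cmd.exe passes them through, which is what breaks the -D options above:

{code:python}
# Hypothetical sketch, not Spark's actual implementation.

def escape_for_shell_posix(arg):
    # POSIX convention: wrap in single quotes, escaping embedded ones.
    return "'" + arg.replace("'", "'\\''") + "'"

def escape_for_shell_windows(arg):
    # cmd.exe does not understand single quotes; double-quote only if needed.
    if any(c in arg for c in ' \t"'):
        return '"' + arg.replace('"', '\\"') + '"'
    return arg

def escape_for_shell(arg, is_windows):
    return escape_for_shell_windows(arg) if is_windows else escape_for_shell_posix(arg)

print(escape_for_shell("-Dspark.master=yarn-cluster", is_windows=True))   # unchanged
print(escape_for_shell("-Dspark.master=yarn-cluster", is_windows=False))  # single-quoted
{code}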



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-9527) PrefixSpan.run should return a PrefixSpanModel instead of an RDD and it should be Java-friendly

2015-08-02 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng updated SPARK-9527:
-
Shepherd: Feynman Liang

 PrefixSpan.run should return a PrefixSpanModel instead of an RDD and it 
 should be Java-friendly
 ---

 Key: SPARK-9527
 URL: https://issues.apache.org/jira/browse/SPARK-9527
 Project: Spark
  Issue Type: Improvement
  Components: MLlib
Affects Versions: 1.5.0
Reporter: Xiangrui Meng
Assignee: Xiangrui Meng
Priority: Critical

 With a model wrapping the result RDD, it would be more flexible to add 
 features in the future. And it should be Java-friendly.
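For illustration only, a tiny Python sketch of the wrapping idea (all names hypothetical; the real change is in Scala MLlib and lands in the PR linked below). Returning a model object instead of the bare RDD leaves room to add save/load or Java-friendly accessors later without breaking run()'s signature:

{code:python}
class PrefixSpanModel(object):
    """Wraps the mining result so future features (save/load, Java-friendly
    accessors) can be added without changing run()'s signature."""
    def __init__(self, freq_sequences):
        self.freqSequences = freq_sequences  # e.g. (sequence, frequency) pairs

class PrefixSpan(object):
    def __init__(self, min_support=0.1):
        self.minSupport = min_support

    def run(self, sequences):
        freq_sequences = []  # mining logic omitted; the point is the return type
        return PrefixSpanModel(freq_sequences)

model = PrefixSpan(min_support=0.2).run([["a", "b"], ["a", "c"]])
print(model.freqSequences)
{code}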



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-9527) PrefixSpan.run should return a PrefixSpanModel instead of an RDD and it should be Java-friendly

2015-08-02 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng updated SPARK-9527:
-
Summary: PrefixSpan.run should return a PrefixSpanModel instead of an RDD 
and it should be Java-friendly  (was: PrefixSpan.run should return a 
PrefixSpanModel instead of an RDD)

 PrefixSpan.run should return a PrefixSpanModel instead of an RDD and it 
 should be Java-friendly
 ---

 Key: SPARK-9527
 URL: https://issues.apache.org/jira/browse/SPARK-9527
 Project: Spark
  Issue Type: Improvement
  Components: MLlib
Affects Versions: 1.5.0
Reporter: Xiangrui Meng
Assignee: Xiangrui Meng
Priority: Critical

 With a model wrapping the result RDD, it would be more flexible to add 
 features in the future.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-9527) PrefixSpan.run should return a PrefixSpanModel instead of an RDD and it should be Java-friendly

2015-08-02 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng updated SPARK-9527:
-
Description: With a model wrapping the result RDD, it would be more 
flexible to add features in the future. And it should be Java-friendly.  (was: 
With a model wrapping the result RDD, it would be more flexible to add features 
in the future.)

 PrefixSpan.run should return a PrefixSpanModel instead of an RDD and it 
 should be Java-friendly
 ---

 Key: SPARK-9527
 URL: https://issues.apache.org/jira/browse/SPARK-9527
 Project: Spark
  Issue Type: Improvement
  Components: MLlib
Affects Versions: 1.5.0
Reporter: Xiangrui Meng
Assignee: Xiangrui Meng
Priority: Critical

 With a model wrapping the result RDD, it would be more flexible to add 
 features in the future. And it should be Java-friendly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-9539) Repeated sc.close() in PySpark causes JVM memory leak

2015-08-02 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14651064#comment-14651064
 ] 

Sean Owen commented on SPARK-9539:
--

Why do you think this is a memory leak? That exception does not even indicate 
an out-of-memory condition.
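For context, the jstat trace quoted below never shows a full collection (FGC stays at 0), so old-generation and permanent-generation occupancy are expected to drift upward until one is triggered; growth alone is not evidence of a leak. One way to probe this from the reporter's loop, as a sketch (sc._jvm is a private PySpark handle, not a public API):

{code:python}
# Sketch: force a full collection between iterations and re-check jstat.
# If occupancy drops back to a baseline each time, the growth was
# collectable garbage rather than a leak.
from pyspark import SparkConf, SparkContext

for i in range(5):
    sc = SparkContext(conf=SparkConf().setAppName("gc-probe"))
    sc._jvm.System.gc()  # invokes java.lang.System.gc() on the driver JVM
    sc.stop()
{code}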

 Repeated sc.close() in PySpark causes JVM memory leak
 -

 Key: SPARK-9539
 URL: https://issues.apache.org/jira/browse/SPARK-9539
 Project: Spark
  Issue Type: Bug
  Components: PySpark
Affects Versions: 1.4.1
Reporter: Andrey Zimovnov
Priority: Minor
 Attachments: Screenshot at авг. 02 19-10-53.png


 Example code in Python:
 {code:python}
 for i in range(20):
   print i
   conf = SparkConf().setAppName("test")
   sc = SparkContext(conf=conf)
   hivec = HiveContext(sc)
   hivec.sql("select id from details_info limit 1").show()
   sc.stop()
   del hivec
   del sc
 {code}
 Jstat output:
 {noformat}
  S0C      S1C      S0U     S1U     EC         EU        OC         OU    PC       PU      YGC  YGCT   FGC  FGCT   GCT
 (~30 samples follow, wrapped and truncated in transit; the OU column is cut off.
 Across the run YGC rises 4 -> 5 -> 6 with YGCT 0,434 -> 0,591 -> 0,773, FGC stays
 at 0 with FGCT 0,000, OC holds at 3145728,0, and PU climbs steadily from 153613,2
 to 206543,4.)
 {noformat}

[jira] [Created] (SPARK-9540) Optimize PrefixSpan implementation

2015-08-02 Thread Xiangrui Meng (JIRA)
Xiangrui Meng created SPARK-9540:


 Summary: Optimize PrefixSpan implementation
 Key: SPARK-9540
 URL: https://issues.apache.org/jira/browse/SPARK-9540
 Project: Spark
  Issue Type: Improvement
  Components: MLlib
Affects Versions: 1.5.0
Reporter: Xiangrui Meng
Assignee: Xiangrui Meng
Priority: Critical


The current `PrefixSpan` implementation contains some major issues:

1. We should expand the prefix by one item at a time instead of by one itemset.
2. Some set operations should be changed to array operations, which should be 
more efficient (see the sketch after this list).
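A hedged illustration of point 2 in Python (not the Spark patch itself, which is Scala): with items kept in sorted arrays, membership becomes a binary search and intersection a linear merge, avoiding per-element hashing.

{code:python}
from bisect import bisect_left

def contains(sorted_items, x):
    # Binary search in a sorted array instead of hashing into a set.
    i = bisect_left(sorted_items, x)
    return i < len(sorted_items) and sorted_items[i] == x

def intersect(a, b):
    # Linear merge of two sorted arrays, O(len(a) + len(b)).
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i]); i += 1; j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out

print(contains([1, 3, 5, 9], 5))           # True
print(intersect([1, 3, 5, 9], [3, 4, 5]))  # [3, 5]
{code}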



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-9539) Repeated sc.close() in PySpark causes JVM memory leak

2015-08-02 Thread Andrey Zimovnov (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrey Zimovnov updated SPARK-9539:
---
Attachment: Screenshot at авг. 02 19-10-53.png

jstat visualization

 Repeated sc.close() in PySpark causes JVM memory leak
 -

 Key: SPARK-9539
 URL: https://issues.apache.org/jira/browse/SPARK-9539
 Project: Spark
  Issue Type: Bug
  Components: PySpark
Affects Versions: 1.4.1
Reporter: Andrey Zimovnov
Priority: Minor
 Attachments: Screenshot at авг. 02 19-10-53.png


 Example code in Python:
 {code:python}
 for i in range(20):
   print i
   conf = SparkConf().setAppName("test")
   sc = SparkContext(conf=conf)
   hivec = HiveContext(sc)
   hivec.sql("select id from details_info limit 1").show()
   sc.stop()
   del hivec
   del sc
 {code}
 Jstat output:
 {noformat}
 (jstat output identical to the sample quoted in the first SPARK-9539 comment above; wrapped and truncated in transit)
 {noformat}

[jira] [Commented] (SPARK-9539) Repeated sc.close() in PySpark causes JVM memory leak

2015-08-02 Thread Andrey Zimovnov (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14651078#comment-14651078
 ] 

Andrey Zimovnov commented on SPARK-9539:


OK, I'll work on this later and reopen if necessary. Thanks!

 Repeated sc.close() in PySpark causes JVM memory leak
 -

 Key: SPARK-9539
 URL: https://issues.apache.org/jira/browse/SPARK-9539
 Project: Spark
  Issue Type: Bug
  Components: PySpark
Affects Versions: 1.4.1
Reporter: Andrey Zimovnov
Priority: Minor
 Attachments: Screenshot at авг. 02 19-10-53.png


 Example code in Python:
 {code:python}
 for i in range(20):
   print i
   conf = SparkConf().setAppName("test")
   sc = SparkContext(conf=conf)
   hivec = HiveContext(sc)
   hivec.sql("select id from details_info limit 1").show()
   sc.stop()
   del hivec
   del sc
 {code}
 Jstat output:
 {noformat}
 (jstat output identical to the sample quoted in the first SPARK-9539 comment above; wrapped and truncated in transit)
 {noformat}

[jira] [Created] (SPARK-9541) DateTimeUtils cleanup

2015-08-02 Thread Yijie Shen (JIRA)
Yijie Shen created SPARK-9541:
-

 Summary: DateTimeUtils cleanup
 Key: SPARK-9541
 URL: https://issues.apache.org/jira/browse/SPARK-9541
 Project: Spark
  Issue Type: Sub-task
Reporter: Yijie Shen






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-9539) Repeated sc.close() in PySpark causes JVM memory leak

2015-08-02 Thread Andrey Zimovnov (JIRA)
Andrey Zimovnov created SPARK-9539:
--

 Summary: Repeated sc.close() in PySpark causes JVM memory leak
 Key: SPARK-9539
 URL: https://issues.apache.org/jira/browse/SPARK-9539
 Project: Spark
  Issue Type: Bug
  Components: PySpark
Affects Versions: 1.4.1
Reporter: Andrey Zimovnov
Priority: Minor


Example code in Python:
for i in range(20):
    print i
    conf = SparkConf().setAppName("test")
    sc = SparkContext(conf=conf)
    hivec = HiveContext(sc)
    hivec.sql("select id from details_info limit 1").show()
    sc.stop()
    del hivec
    del sc

Jstat output:
(jstat output identical to the sample quoted in the comments above; wrapped and truncated in transit)

[jira] [Updated] (SPARK-8445) MLlib 1.5 Roadmap

2015-08-02 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng updated SPARK-8445:
-
Description: 
We expect to see many MLlib contributors for the 1.5 release. To scale out the 
development, we created this master list for MLlib features we plan to have in 
Spark 1.5. Please view this list as a wish list rather than a concrete plan, 
because we don't have an accurate estimate of available resources. Due to 
limited review bandwidth, features appearing on this list will get higher 
priority during code review. But feel free to suggest new items to the list in 
comments. We are experimenting with this process. Your feedback would be 
greatly appreciated.

h1. Instructions

h2. For contributors:

* Please read 
https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark 
carefully. Code style, documentation, and unit tests are important.
* If you are a first-time Spark contributor, please always start with a 
[starter 
task|https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20component%20in%20(ML%2C%20MLlib)%20AND%20labels%20%3D%20starter%20AND%20%22Target%20Version%2Fs%22%20%3D%201.5.0]
 rather than a medium/big feature. Based on our experience, mixing the 
development process with a big feature usually causes long delays in code review.
* Never work silently. Let everyone know on the corresponding JIRA page when 
you start working on a feature. This is to avoid duplicate work. For small 
features, you don't need to wait to get the JIRA assigned.
* For medium/big features or features with dependencies, please get assigned 
first before coding and keep the ETA updated on the JIRA. If there is no 
activity on the JIRA page for a certain amount of time, the JIRA should be 
released to other contributors.
* Do not claim multiple (more than 3) JIRAs at the same time. Try to finish 
them one after another.
* Please review others' PRs (https://spark-prs.appspot.com/#mllib). Code review 
greatly helps improve others' code as well as yours.

h2. For committers:

* Try to break down big features into small and specific JIRA tasks and link 
them properly.
* Add the "starter" label to starter tasks.
* Put a rough estimate for medium/big features and track the progress.
* If you start reviewing a PR, please add yourself to the Shepherd field on 
JIRA.
* If the code looks good to you, please comment "LGTM". For non-trivial PRs, 
please ping a maintainer to make a final pass.
* After merging a PR, create and link JIRAs for Python, example code, and 
documentation if necessary.

h1. Roadmap (WIP)

This is NOT [a complete list of MLlib JIRAs for 
1.5|https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20component%20in%20(ML%2C%20MLlib)%20AND%20%22Target%20Version%2Fs%22%20%3D%201.5.0%20ORDER%20BY%20priority%20DESC].
 We only include umbrella JIRAs and high-level tasks.

h2. Algorithms and performance

* LDA improvements (SPARK-5572)
* Log-linear model for survival analysis (SPARK-8518) - 1.6
* Improve GLM's scalability on number of features (SPARK-8520)
* Tree and ensembles: Move + cleanup code (SPARK-7131), provide class 
probabilities (SPARK-3727), feature importance (SPARK-5133)
* Improve GMM scalability and stability (SPARK-5016)
* Frequent pattern mining improvements (SPARK-6487)
* R-like stats for ML models (SPARK-7674)
* Generalize classification threshold to multiclass (SPARK-8069)
* A/B testing (SPARK-3147)

h2. Pipeline API

* more feature transformers (SPARK-8521)
* k-means (SPARK-7879)
* naive Bayes (SPARK-8600)
* TrainValidationSplit for tuning (SPARK-8484)
* Isotonic regression (SPARK-8671)

h2. Model persistence

* more PMML export (SPARK-8545)
* model save/load (SPARK-4587)
* pipeline persistence (SPARK-6725)

h2. Python API for ML

* List of issues identified during Spark 1.4 QA: (SPARK-7536)
* Python API for streaming ML algorithms (SPARK-3258)
* Add missing model methods (SPARK-8633)

h2. SparkR API for ML

* MLlib + SparkR integration for 1.5 (RFormula + glm) (SPARK-6805)
* model.matrix for DataFrames (SPARK-6823)

h2. Documentation

* [Search for documentation improvements | 
https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20component%20in%20(Documentation)%20AND%20component%20in%20(ML%2C%20MLlib)]

  was:
We expect to see many MLlib contributors for the 1.5 release. To scale out the 
development, we created this master list for MLlib features we plan to have in 
Spark 1.5. Please view this list as a wish list rather than a concrete plan, 
because we don't have an accurate estimate of available resources. Due to 
limited review bandwidth, features appearing on this list will get higher 
priority during code review. But feel free to 

[jira] [Commented] (SPARK-9541) DateTimeUtils cleanup

2015-08-02 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14651082#comment-14651082
 ] 

Apache Spark commented on SPARK-9541:
-

User 'yjshen' has created a pull request for this issue:
https://github.com/apache/spark/pull/7870

 DateTimeUtils cleanup
 -

 Key: SPARK-9541
 URL: https://issues.apache.org/jira/browse/SPARK-9541
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Reporter: Yijie Shen





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-9541) DateTimeUtils cleanup

2015-08-02 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-9541:
---

Assignee: (was: Apache Spark)

 DateTimeUtils cleanup
 -

 Key: SPARK-9541
 URL: https://issues.apache.org/jira/browse/SPARK-9541
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Reporter: Yijie Shen





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-9541) DateTimeUtils cleanup

2015-08-02 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-9541:
---

Assignee: Apache Spark

 DateTimeUtils cleanup
 -

 Key: SPARK-9541
 URL: https://issues.apache.org/jira/browse/SPARK-9541
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Reporter: Yijie Shen
Assignee: Apache Spark





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-9140) Replace TimeTracker by Stopwatch

2015-08-02 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-9140:
---

Assignee: Apache Spark

 Replace TimeTracker by Stopwatch
 

 Key: SPARK-9140
 URL: https://issues.apache.org/jira/browse/SPARK-9140
 Project: Spark
  Issue Type: Sub-task
  Components: ML, MLlib
Affects Versions: 1.5.0
Reporter: Xiangrui Meng
Assignee: Apache Spark
Priority: Minor

 We can replace TimeTracker in tree implementations with Stopwatch. The initial 
 PR could use local stopwatches only.
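A rough Python illustration of the "local stopwatches only" first step (hypothetical; Spark's Stopwatch utility and the tree code it instruments are Scala). The watch simply accumulates elapsed time across start/stop cycles in one process, with no distributed accumulators involved:

{code:python}
import time

class LocalStopwatch(object):
    def __init__(self, name):
        self.name = name
        self._running = False
        self._start = 0.0
        self.elapsed = 0.0  # seconds accumulated across start/stop cycles

    def start(self):
        assert not self._running, "stopwatch already running"
        self._running = True
        self._start = time.time()

    def stop(self):
        assert self._running, "stopwatch not running"
        self.elapsed += time.time() - self._start
        self._running = False

sw = LocalStopwatch("findBestSplits")
sw.start()
time.sleep(0.01)  # stand-in for one round of tree work
sw.stop()
print("%s: %.3f s" % (sw.name, sw.elapsed))
{code}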



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-9140) Replace TimeTracker by Stopwatch

2015-08-02 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-9140:
---

Assignee: (was: Apache Spark)

 Replace TimeTracker by Stopwatch
 

 Key: SPARK-9140
 URL: https://issues.apache.org/jira/browse/SPARK-9140
 Project: Spark
  Issue Type: Sub-task
  Components: ML, MLlib
Affects Versions: 1.5.0
Reporter: Xiangrui Meng
Priority: Minor

 We can replace TimeTracker in tree implementations with Stopwatch. The initial 
 PR could use local stopwatches only.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-9140) Replace TimeTracker by Stopwatch

2015-08-02 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14651083#comment-14651083
 ] 

Apache Spark commented on SPARK-9140:
-

User 'hhbyyh' has created a pull request for this issue:
https://github.com/apache/spark/pull/7871

 Replace TimeTracker by Stopwatch
 

 Key: SPARK-9140
 URL: https://issues.apache.org/jira/browse/SPARK-9140
 Project: Spark
  Issue Type: Sub-task
  Components: ML, MLlib
Affects Versions: 1.5.0
Reporter: Xiangrui Meng
Priority: Minor

 We can replace TimeTracker in tree implementations with Stopwatch. The initial 
 PR could use local stopwatches only.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-9539) Repeated sc.close() in PySpark causes JVM memory leak

2015-08-02 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-9539.
--
Resolution: Not A Problem

 Repeated sc.close() in PySpark causes JVM memory leak
 -

 Key: SPARK-9539
 URL: https://issues.apache.org/jira/browse/SPARK-9539
 Project: Spark
  Issue Type: Bug
  Components: PySpark
Affects Versions: 1.4.1
Reporter: Andrey Zimovnov
Priority: Minor
 Attachments: Screenshot at авг. 02 19-10-53.png


 Example code in Python:
 {code:python}
 for i in range(20):
   print i
   conf = SparkConf().setAppName("test")
   sc = SparkContext(conf=conf)
   hivec = HiveContext(sc)
   hivec.sql("select id from details_info limit 1").show()
   sc.stop()
   del hivec
   del sc
 {code}
 Jstat output:
 {noformat}
 (jstat output identical to the sample quoted in the first SPARK-9539 comment above; wrapped and truncated in transit)
 {noformat}

[jira] [Commented] (SPARK-9542) create unsafe version of map type

2015-08-02 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14651100#comment-14651100
 ] 

Apache Spark commented on SPARK-9542:
-

User 'cloud-fan' has created a pull request for this issue:
https://github.com/apache/spark/pull/7752

 create unsafe version of map type
 -

 Key: SPARK-9542
 URL: https://issues.apache.org/jira/browse/SPARK-9542
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Reporter: Wenchen Fan





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-9542) create unsafe version of map type

2015-08-02 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-9542:
---

Assignee: Apache Spark

 create unsafe version of map type
 -

 Key: SPARK-9542
 URL: https://issues.apache.org/jira/browse/SPARK-9542
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Reporter: Wenchen Fan
Assignee: Apache Spark





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-9536) NaiveBayesModel support probability prediction for PySpark.ml

2015-08-02 Thread Joseph K. Bradley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph K. Bradley updated SPARK-9536:
-
Assignee: Yanbo Liang

 NaiveBayesModel support probability prediction for PySpark.ml
 -

 Key: SPARK-9536
 URL: https://issues.apache.org/jira/browse/SPARK-9536
 Project: Spark
  Issue Type: Improvement
  Components: ML, PySpark
Reporter: Yanbo Liang
Assignee: Yanbo Liang
Priority: Minor

 NaiveBayesModel support probability prediction for PySpark.ml



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-9538) LogisticRegression support raw and probability prediction for PySpark.ml

2015-08-02 Thread Joseph K. Bradley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph K. Bradley updated SPARK-9538:
-
Assignee: Yanbo Liang

 LogisticRegression support raw and probability prediction for PySpark.ml
 

 Key: SPARK-9538
 URL: https://issues.apache.org/jira/browse/SPARK-9538
 Project: Spark
  Issue Type: Improvement
  Components: ML, PySpark
Reporter: Yanbo Liang
Assignee: Yanbo Liang
Priority: Minor

 LogisticRegression support raw and probability prediction for PySpark.ml
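For a sense of what this would look like once exposed, a sketch against the anticipated 1.5-era PySpark API (the rawPrediction and probability column names mirror the Scala LogisticRegressionModel output; surfacing them in Python is exactly what this ticket requests):

{code:python}
from pyspark import SparkContext
from pyspark.sql import SQLContext
from pyspark.mllib.linalg import Vectors
from pyspark.ml.classification import LogisticRegression

sc = SparkContext(appName="lr-probability-demo")
sqlContext = SQLContext(sc)
training = sqlContext.createDataFrame([
    (1.0, Vectors.dense(0.0, 1.1)),
    (0.0, Vectors.dense(2.0, 1.0)),
    (1.0, Vectors.dense(0.1, 1.2)),
    (0.0, Vectors.dense(2.2, 0.9)),
], ["label", "features"])

model = LogisticRegression(maxIter=10, regParam=0.01).fit(training)
# After SPARK-9538, the transformed DataFrame should expose these columns
# in Python as it already does on the Scala side.
model.transform(training).select(
    "label", "prediction", "rawPrediction", "probability").show()
sc.stop()
{code}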



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-8874) Add missing methods in Word2Vec ML

2015-08-02 Thread Joseph K. Bradley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph K. Bradley updated SPARK-8874:
-
Component/s: (was: PySpark)

 Add missing methods in Word2Vec ML
 --

 Key: SPARK-8874
 URL: https://issues.apache.org/jira/browse/SPARK-8874
 Project: Spark
  Issue Type: New Feature
  Components: ML
Reporter: Manoj Kumar
Assignee: Manoj Kumar

 Add getVectors and findSynonyms.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-9537) DecisionTreeClassifierModel support probability prediction for PySpark.ml

2015-08-02 Thread Joseph K. Bradley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph K. Bradley updated SPARK-9537:
-
Assignee: Yanbo Liang
Target Version/s: 1.5.0

 DecisionTreeClassifierModel support probability prediction for PySpark.ml
 -

 Key: SPARK-9537
 URL: https://issues.apache.org/jira/browse/SPARK-9537
 Project: Spark
  Issue Type: Improvement
  Components: ML, PySpark
Reporter: Yanbo Liang
Assignee: Yanbo Liang
Priority: Minor

 DecisionTreeClassifierModel support probability prediction for PySpark.ml



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-9538) LogisticRegression support raw and probability prediction for PySpark.ml

2015-08-02 Thread Joseph K. Bradley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph K. Bradley updated SPARK-9538:
-
Target Version/s: 1.5.0

 LogisticRegression support raw and probability prediction for PySpark.ml
 

 Key: SPARK-9538
 URL: https://issues.apache.org/jira/browse/SPARK-9538
 Project: Spark
  Issue Type: Improvement
  Components: ML, PySpark
Reporter: Yanbo Liang
Assignee: Yanbo Liang
Priority: Minor

 LogisticRegression support raw and probability prediction for PySpark.ml



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-9536) NaiveBayesModel support probability prediction for PySpark.ml

2015-08-02 Thread Joseph K. Bradley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph K. Bradley updated SPARK-9536:
-
Target Version/s: 1.5.0

 NaiveBayesModel support probability prediction for PySpark.ml
 -

 Key: SPARK-9536
 URL: https://issues.apache.org/jira/browse/SPARK-9536
 Project: Spark
  Issue Type: Improvement
  Components: ML, PySpark
Reporter: Yanbo Liang
Priority: Minor

 NaiveBayesModel support probability prediction for PySpark.ml



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-9527) PrefixSpan.run should return a PrefixSpanModel instead of an RDD and it should be Java-friendly

2015-08-02 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng resolved SPARK-9527.
--
   Resolution: Fixed
Fix Version/s: 1.5.0

Issue resolved by pull request 7869
[https://github.com/apache/spark/pull/7869]

 PrefixSpan.run should return a PrefixSpanModel instead of an RDD and it 
 should be Java-friendly
 ---

 Key: SPARK-9527
 URL: https://issues.apache.org/jira/browse/SPARK-9527
 Project: Spark
  Issue Type: Improvement
  Components: MLlib
Affects Versions: 1.5.0
Reporter: Xiangrui Meng
Assignee: Xiangrui Meng
Priority: Critical
 Fix For: 1.5.0


 With a model wrapping the result RDD, it would be more flexible to add 
 features in the future. And it should be Java-friendly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


