[jira] [Assigned] (SPARK-8505) Add settings to kick `lint-r` from `./dev/run-test.py`
[ https://issues.apache.org/jira/browse/SPARK-8505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-8505:
-----------------------------------
    Assignee: Apache Spark

Add settings to kick `lint-r` from `./dev/run-test.py`
------------------------------------------------------
    Key: SPARK-8505
    URL: https://issues.apache.org/jira/browse/SPARK-8505
    Project: Spark
    Issue Type: Sub-task
    Components: SparkR
    Reporter: Yu Ishikawa
    Assignee: Apache Spark

Add some settings to kick the `lint-r` script from `./dev/run-test.py`.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-8505) Add settings to kick `lint-r` from `./dev/run-test.py`
[ https://issues.apache.org/jira/browse/SPARK-8505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14651360#comment-14651360 ]

Apache Spark commented on SPARK-8505:
-------------------------------------
User 'yu-iskw' has created a pull request for this issue:
https://github.com/apache/spark/pull/7883

Add settings to kick `lint-r` from `./dev/run-test.py`
------------------------------------------------------
    Key: SPARK-8505
    URL: https://issues.apache.org/jira/browse/SPARK-8505
    Project: Spark
    Issue Type: Sub-task
    Components: SparkR
    Reporter: Yu Ishikawa

Add some settings to kick the `lint-r` script from `./dev/run-test.py`.
[jira] [Resolved] (SPARK-2205) Unnecessary exchange operators in a join on multiple tables with the same join key.
[ https://issues.apache.org/jira/browse/SPARK-2205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yin Huai resolved SPARK-2205.
-----------------------------
    Resolution: Fixed
    Fix Version/s: 1.5.0

Issue resolved by pull request 7773
[https://github.com/apache/spark/pull/7773]

Unnecessary exchange operators in a join on multiple tables with the same join key.
-----------------------------------------------------------------------------------
    Key: SPARK-2205
    URL: https://issues.apache.org/jira/browse/SPARK-2205
    Project: Spark
    Issue Type: Bug
    Components: SQL
    Reporter: Yin Huai
    Assignee: Yin Huai
    Priority: Critical
    Fix For: 1.5.0

{code}
hql("select * from src x join src y on (x.key=y.key) join src z on (y.key=z.key)")
SchemaRDD[1] at RDD at SchemaRDD.scala:100
== Query Plan ==
Project [key#4:0,value#5:1,key#6:2,value#7:3,key#8:4,value#9:5]
 HashJoin [key#6], [key#8], BuildRight
  Exchange (HashPartitioning [key#6], 200)
   HashJoin [key#4], [key#6], BuildRight
    Exchange (HashPartitioning [key#4], 200)
     HiveTableScan [key#4,value#5], (MetastoreRelation default, src, Some(x)), None
    Exchange (HashPartitioning [key#6], 200)
     HiveTableScan [key#6,value#7], (MetastoreRelation default, src, Some(y)), None
  Exchange (HashPartitioning [key#8], 200)
   HiveTableScan [key#8,value#9], (MetastoreRelation default, src, Some(z)), None
{code}

However, this is fine:
{code}
hql("select * from src x join src y on (x.key=y.key) join src z on (x.key=z.key)")
res5: org.apache.spark.sql.SchemaRDD = SchemaRDD[5] at RDD at SchemaRDD.scala:100
== Query Plan ==
Project [key#26:0,value#27:1,key#28:2,value#29:3,key#30:4,value#31:5]
 HashJoin [key#26], [key#30], BuildRight
  HashJoin [key#26], [key#28], BuildRight
   Exchange (HashPartitioning [key#26], 200)
    HiveTableScan [key#26,value#27], (MetastoreRelation default, src, Some(x)), None
   Exchange (HashPartitioning [key#28], 200)
    HiveTableScan [key#28,value#29], (MetastoreRelation default, src, Some(y)), None
  Exchange (HashPartitioning [key#30], 200)
   HiveTableScan [key#30,value#31], (MetastoreRelation default, src, Some(z)), None
{code}
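The difference between the two plans comes down to hash co-partitioning: once the first join's output is hash-partitioned on the join key, a second join on that same key should not need another Exchange. The idea can be sketched in plain Python (a toy illustration, not Spark code; all names here are made up for the sketch):

```python
# Toy illustration of hash co-partitioning (not Spark code): once rows are
# partitioned by hash(key) % num_partitions, a later join on the *same* key
# can reuse that layout, so no extra Exchange (re-shuffle) is needed.

NUM_PARTITIONS = 200

def partition_of(key, num_partitions=NUM_PARTITIONS):
    """Deterministic partition assignment, in the spirit of HashPartitioning."""
    return hash(key) % num_partitions

def needs_exchange(current_partition_key, required_join_key):
    """An Exchange is only required when the existing partitioning
    does not already match the join key."""
    return current_partition_key != required_join_key

# For `x JOIN y ON x.key = y.key JOIN z ON x.key = z.key`: after the first
# join the data is partitioned on x.key, and the second join also needs
# x.key, so no re-shuffle is necessary.
assert not needs_exchange("x.key", "x.key")

# With `... JOIN z ON y.key = z.key`, a naive planner does not recognize
# that x.key and y.key are equal after the first join, so it inserts the
# unnecessary Exchange seen in the first plan above.
assert needs_exchange("x.key", "y.key")
```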
[jira] [Commented] (SPARK-7685) Handle high imbalanced data and apply weights to different samples in Logistic Regression
[ https://issues.apache.org/jira/browse/SPARK-7685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14651413#comment-14651413 ]

Apache Spark commented on SPARK-7685:
-------------------------------------
User 'dbtsai' has created a pull request for this issue:
https://github.com/apache/spark/pull/7884

Handle high imbalanced data and apply weights to different samples in Logistic Regression
-----------------------------------------------------------------------------------------
    Key: SPARK-7685
    URL: https://issues.apache.org/jira/browse/SPARK-7685
    Project: Spark
    Issue Type: New Feature
    Components: ML
    Reporter: DB Tsai
    Assignee: DB Tsai
    Priority: Critical

In a fraud detection dataset, almost all of the samples are negative while only a couple of them are positive. This kind of highly imbalanced data biases the model toward the negative class, resulting in poor performance. scikit-learn provides a correction that allows users to over-/undersample the samples of each class according to given weights; in "auto" mode, it selects weights inversely proportional to the class frequencies in the training set. This can be done more efficiently by multiplying the weights into the loss and gradient instead of actually over-/undersampling the training dataset, which is very expensive.
http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html

On the other hand, some training data may be more important than the rest: for example, training samples from tenured users may matter more than training samples from new users. We should be able to add a {{weight: Double}} field to LabeledPoint so that samples can be weighted differently in the learning algorithm.
[jira] [Assigned] (SPARK-7685) Handle high imbalanced data and apply weights to different samples in Logistic Regression
[ https://issues.apache.org/jira/browse/SPARK-7685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-7685:
-----------------------------------
    Assignee: Apache Spark (was: DB Tsai)

Handle high imbalanced data and apply weights to different samples in Logistic Regression
-----------------------------------------------------------------------------------------
    Key: SPARK-7685
    URL: https://issues.apache.org/jira/browse/SPARK-7685
    Project: Spark
    Issue Type: New Feature
    Components: ML
    Reporter: DB Tsai
    Assignee: Apache Spark
    Priority: Critical

In a fraud detection dataset, almost all of the samples are negative while only a couple of them are positive. This kind of highly imbalanced data biases the model toward the negative class, resulting in poor performance. scikit-learn provides a correction that allows users to over-/undersample the samples of each class according to given weights; in "auto" mode, it selects weights inversely proportional to the class frequencies in the training set. This can be done more efficiently by multiplying the weights into the loss and gradient instead of actually over-/undersampling the training dataset, which is very expensive.
http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html

On the other hand, some training data may be more important than the rest: for example, training samples from tenured users may matter more than training samples from new users. We should be able to add a {{weight: Double}} field to LabeledPoint so that samples can be weighted differently in the learning algorithm.
[jira] [Assigned] (SPARK-7685) Handle high imbalanced data and apply weights to different samples in Logistic Regression
[ https://issues.apache.org/jira/browse/SPARK-7685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-7685:
-----------------------------------
    Assignee: DB Tsai (was: Apache Spark)

Handle high imbalanced data and apply weights to different samples in Logistic Regression
-----------------------------------------------------------------------------------------
    Key: SPARK-7685
    URL: https://issues.apache.org/jira/browse/SPARK-7685
    Project: Spark
    Issue Type: New Feature
    Components: ML
    Reporter: DB Tsai
    Assignee: DB Tsai
    Priority: Critical

In a fraud detection dataset, almost all of the samples are negative while only a couple of them are positive. This kind of highly imbalanced data biases the model toward the negative class, resulting in poor performance. scikit-learn provides a correction that allows users to over-/undersample the samples of each class according to given weights; in "auto" mode, it selects weights inversely proportional to the class frequencies in the training set. This can be done more efficiently by multiplying the weights into the loss and gradient instead of actually over-/undersampling the training dataset, which is very expensive.
http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html

On the other hand, some training data may be more important than the rest: for example, training samples from tenured users may matter more than training samples from new users. We should be able to add a {{weight: Double}} field to LabeledPoint so that samples can be weighted differently in the learning algorithm.
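The core proposal, folding per-sample weights into the loss and gradient instead of physically over-/undersampling, can be sketched in plain Python (a simplified illustration, not the actual Spark ML implementation):

```python
import math

def weighted_logistic_loss(coefficients, data):
    """Log-loss where each sample carries a weight; a weight of k is
    mathematically equivalent to duplicating the sample k times."""
    total = 0.0
    for features, label, weight in data:
        margin = sum(c * x for c, x in zip(coefficients, features))
        y = 1.0 if label == 1 else -1.0
        # log(1 + exp(-y * margin)), scaled by the sample weight
        total += weight * math.log1p(math.exp(-y * margin))
    return total

coef = [0.5, -0.25]
sample = ([1.0, 2.0], 1, 1.0)

# Weighting a sample by 2 gives the same loss as including it twice,
# which is why reweighting can replace expensive oversampling.
loss_weighted = weighted_logistic_loss(coef, [([1.0, 2.0], 1, 2.0)])
loss_duplicated = weighted_logistic_loss(coef, [sample, sample])
assert abs(loss_weighted - loss_duplicated) < 1e-12
```

The same identity holds for the gradient, so a weighted optimizer touches each distinct sample only once.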
[jira] [Assigned] (SPARK-8505) Add settings to kick `lint-r` from `./dev/run-test.py`
[ https://issues.apache.org/jira/browse/SPARK-8505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-8505:
-----------------------------------
    Assignee: (was: Apache Spark)

Add settings to kick `lint-r` from `./dev/run-test.py`
------------------------------------------------------
    Key: SPARK-8505
    URL: https://issues.apache.org/jira/browse/SPARK-8505
    Project: Spark
    Issue Type: Sub-task
    Components: SparkR
    Reporter: Yu Ishikawa

Add some settings to kick the `lint-r` script from `./dev/run-test.py`.
[jira] [Commented] (SPARK-9319) Add support for setting column names, types
[ https://issues.apache.org/jira/browse/SPARK-9319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14651367#comment-14651367 ]

Hossein Falaki commented on SPARK-9319:
---------------------------------------
Yes. I will submit a PR.

Add support for setting column names, types
-------------------------------------------
    Key: SPARK-9319
    URL: https://issues.apache.org/jira/browse/SPARK-9319
    Project: Spark
    Issue Type: Sub-task
    Components: SparkR
    Reporter: Shivaram Venkataraman

This will help us support functions of the form
{code}
colnames(data) <- c("Date", "Arrival_Delay")
coltypes(data) <- c("numeric", "logical", "character")
{code}
[jira] [Created] (SPARK-9550) Configuration renaming, defaults changes, and deprecation for 1.5.0 (master ticket)
Josh Rosen created SPARK-9550:
------------------------------
    Summary: Configuration renaming, defaults changes, and deprecation for 1.5.0 (master ticket)
    Key: SPARK-9550
    URL: https://issues.apache.org/jira/browse/SPARK-9550
    Project: Spark
    Issue Type: Task
    Components: Spark Core, SQL
    Affects Versions: 1.5.0
    Reporter: Josh Rosen
    Priority: Blocker

This ticket tracks configurations which need to be renamed, deprecated, or have their defaults changed for Spark 1.5.0. Note that subtasks / comments here do not necessarily reflect changes that must be performed; rather, tasks should be added here to make sure that the relevant configurations are at least checked before we cut releases. This ticket will also help us track configuration changes which must make it into the release notes.

*Configuration renaming*
- Consider renaming {{spark.shuffle.memoryFraction}} to {{spark.execution.memoryFraction}} ([discussion|https://github.com/apache/spark/pull/7770#discussion-diff-36019144]).
- Rename all public-facing uses of {{unsafe}} to something less scary, such as {{tungsten}}.

*Defaults changes*
- Codegen is now enabled by default.
- Tungsten is now enabled by default.

*Deprecation*
- Local execution has been removed.
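A deprecation shim for renamed configs could look like the following sketch (plain Python for illustration only; the rename shown is the one proposed in the ticket discussion and is not a final 1.5.0 setting):

```python
import warnings

# Hypothetical old-name -> new-name table. The entry below is the rename
# *proposed* in the ticket, not a confirmed Spark 1.5.0 configuration.
RENAMED_CONFIGS = {
    "spark.shuffle.memoryFraction": "spark.execution.memoryFraction",
}

def resolve_config(name):
    """Translate a deprecated config key to its replacement, with a warning,
    so old job configurations keep working across the release."""
    if name in RENAMED_CONFIGS:
        new_name = RENAMED_CONFIGS[name]
        warnings.warn(
            f"{name} is deprecated; use {new_name} instead",
            DeprecationWarning,
        )
        return new_name
    return name  # unrenamed keys pass through untouched

assert resolve_config("spark.shuffle.memoryFraction") == "spark.execution.memoryFraction"
assert resolve_config("spark.sql.codegen") == "spark.sql.codegen"
```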
[jira] [Commented] (SPARK-8939) YARN EC2 default setting fails with IllegalArgumentException
[ https://issues.apache.org/jira/browse/SPARK-8939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14651436#comment-14651436 ]

Shivaram Venkataraman commented on SPARK-8939:
----------------------------------------------
[~andrewor14] I ran into this again today -- do you know where we should make a fix for this? Is it in the Spark source code, or can we just change a config option in the EC2 scripts?

YARN EC2 default setting fails with IllegalArgumentException
------------------------------------------------------------
    Key: SPARK-8939
    URL: https://issues.apache.org/jira/browse/SPARK-8939
    Project: Spark
    Issue Type: Bug
    Components: EC2
    Affects Versions: 1.5.0
    Reporter: Andrew Or

I just set up a cluster from scratch using the spark-ec2 script. Then I ran
{code}
bin/spark-shell --master yarn
{code}
which failed with
{code}
15/07/09 03:44:29 ERROR SparkContext: Error initializing SparkContext.
java.lang.IllegalArgumentException: Unknown/unsupported param List(--num-executors, , --executor-memory, 6154m, --executor-memory, 6154m, --executor-cores, 2, --name, Spark shell)
{code}
This goes away if I provide `--num-executors`, but we should fix the default.
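The "Unknown/unsupported param" message shows an empty string passed as the value of `--num-executors`. One plausible shape of a fix, sketched in plain Python (a hypothetical illustration, not the actual Spark launcher code), is to skip flags whose values are unset when building the argument list:

```python
def build_args(options):
    """Build a CLI argument list, skipping flags whose value is unset.
    Passing '--num-executors' followed by an empty value is what produces
    the 'Unknown/unsupported param' failure in the report above."""
    args = []
    for flag, value in options:
        if value is None or value == "":
            continue  # omit unset options so the parser falls back to defaults
        args.extend([flag, str(value)])
    return args

opts = [
    ("--num-executors", ""),          # unset in the EC2 default config
    ("--executor-memory", "6154m"),
    ("--executor-cores", "2"),
]
assert build_args(opts) == ["--executor-memory", "6154m", "--executor-cores", "2"]
```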
[jira] [Commented] (SPARK-9208) Audit DataFrame expression API for 1.5 release
[ https://issues.apache.org/jira/browse/SPARK-9208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14650626#comment-14650626 ]

Apache Spark commented on SPARK-9208:
-------------------------------------
User 'rxin' has created a pull request for this issue:
https://github.com/apache/spark/pull/7861

Audit DataFrame expression API for 1.5 release
----------------------------------------------
    Key: SPARK-9208
    URL: https://issues.apache.org/jira/browse/SPARK-9208
    Project: Spark
    Issue Type: Sub-task
    Components: SQL
    Reporter: Reynold Xin
    Assignee: Reynold Xin
    Priority: Blocker

This ticket makes sure I go through all new APIs added and audit them before the 1.5.0 release.
[jira] [Resolved] (SPARK-9498) Some statistical information missed when the driver is out of the cluster
[ https://issues.apache.org/jira/browse/SPARK-9498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen resolved SPARK-9498.
------------------------------
    Resolution: Not A Problem

Some statistical information missed when the driver is out of the cluster
-------------------------------------------------------------------------
    Key: SPARK-9498
    URL: https://issues.apache.org/jira/browse/SPARK-9498
    Project: Spark
    Issue Type: Improvement
    Components: Web UI
    Affects Versions: 1.3.1, 1.4.0
    Reporter: Liang Lee

When an application is submitted and the driver is outside the Spark cluster, some statistical information is sometimes missing. When the driver is inside the cluster, the stage detail page displays:
{code}
Details for Stage 7
Total task time across all tasks: 37 min
Input Size / Records: 55.8 GB / 60488
Shuffle write: 26.6 GB / 585242962
{code}
But when the driver is outside the cluster, this information sometimes appears and sometimes does not, e.g.:
{code}
Details for Stage 7
Total task time across all tasks: 37 min
{code}
That is, the Input Size and Shuffle data are not displayed. I checked the code and found that when the input size is zero, it is not displayed. The input size is reported by each executor and collected by the driver; the problem is that the data that should be reported by the executors is missing, but I don't know why. Could anyone help solve this problem?
[jira] [Assigned] (SPARK-9535) Modify document for codegen
[ https://issues.apache.org/jira/browse/SPARK-9535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-9535:
-----------------------------------
    Assignee: Apache Spark

Modify document for codegen
---------------------------
    Key: SPARK-9535
    URL: https://issues.apache.org/jira/browse/SPARK-9535
    Project: Spark
    Issue Type: Improvement
    Components: Documentation, SQL
    Affects Versions: 1.5.0
    Reporter: Kousuke Saruta
    Assignee: Apache Spark
    Priority: Minor

SPARK-7184 enabled codegen by default, so let's update the corresponding documentation.
[jira] [Commented] (SPARK-9535) Modify document for codegen
[ https://issues.apache.org/jira/browse/SPARK-9535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14650638#comment-14650638 ]

Apache Spark commented on SPARK-9535:
-------------------------------------
User 'sarutak' has created a pull request for this issue:
https://github.com/apache/spark/pull/7863

Modify document for codegen
---------------------------
    Key: SPARK-9535
    URL: https://issues.apache.org/jira/browse/SPARK-9535
    Project: Spark
    Issue Type: Improvement
    Components: Documentation, SQL
    Affects Versions: 1.5.0
    Reporter: Kousuke Saruta
    Priority: Minor

SPARK-7184 enabled codegen by default, so let's update the corresponding documentation.
[jira] [Assigned] (SPARK-9535) Modify document for codegen
[ https://issues.apache.org/jira/browse/SPARK-9535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-9535:
-----------------------------------
    Assignee: (was: Apache Spark)

Modify document for codegen
---------------------------
    Key: SPARK-9535
    URL: https://issues.apache.org/jira/browse/SPARK-9535
    Project: Spark
    Issue Type: Improvement
    Components: Documentation, SQL
    Affects Versions: 1.5.0
    Reporter: Kousuke Saruta
    Priority: Minor

SPARK-7184 enabled codegen by default, so let's update the corresponding documentation.
[jira] [Updated] (SPARK-9537) DecisionTreeClassifierModel support probability prediction for PySpark.ml
[ https://issues.apache.org/jira/browse/SPARK-9537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yanbo Liang updated SPARK-9537:
-------------------------------
    Priority: Minor (was: Major)

DecisionTreeClassifierModel support probability prediction for PySpark.ml
-------------------------------------------------------------------------
    Key: SPARK-9537
    URL: https://issues.apache.org/jira/browse/SPARK-9537
    Project: Spark
    Issue Type: Improvement
    Components: ML, PySpark
    Reporter: Yanbo Liang
    Priority: Minor

DecisionTreeClassifierModel should support probability prediction for PySpark.ml.
[jira] [Created] (SPARK-9537) DecisionTreeClassifierModel support probability prediction for PySpark.ml
Yanbo Liang created SPARK-9537:
-------------------------------
    Summary: DecisionTreeClassifierModel support probability prediction for PySpark.ml
    Key: SPARK-9537
    URL: https://issues.apache.org/jira/browse/SPARK-9537
    Project: Spark
    Issue Type: Improvement
    Components: ML, PySpark
    Reporter: Yanbo Liang

DecisionTreeClassifierModel should support probability prediction for PySpark.ml.
[jira] [Updated] (SPARK-9536) NaiveBayesModel support probability prediction for PySpark.ml
[ https://issues.apache.org/jira/browse/SPARK-9536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yanbo Liang updated SPARK-9536:
-------------------------------
    Priority: Minor (was: Major)

NaiveBayesModel support probability prediction for PySpark.ml
-------------------------------------------------------------
    Key: SPARK-9536
    URL: https://issues.apache.org/jira/browse/SPARK-9536
    Project: Spark
    Issue Type: Improvement
    Components: ML, PySpark
    Reporter: Yanbo Liang
    Priority: Minor

NaiveBayesModel should support probability prediction for PySpark.ml.
[jira] [Updated] (SPARK-9000) Support generic item type in PrefixSpan
[ https://issues.apache.org/jira/browse/SPARK-9000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiangrui Meng updated SPARK-9000:
---------------------------------
    Assignee: Feynman Liang

Support generic item type in PrefixSpan
---------------------------------------
    Key: SPARK-9000
    URL: https://issues.apache.org/jira/browse/SPARK-9000
    Project: Spark
    Issue Type: Improvement
    Components: MLlib
    Affects Versions: 1.5.0
    Reporter: Xiangrui Meng
    Assignee: Feynman Liang
    Priority: Critical
    Fix For: 1.5.0

In SPARK-6487, we only support the Int item type, which requires users to encode other types as integers to use PrefixSpan. We should be able to do this encoding inside PrefixSpan, similar to FPGrowth. This should be done before 1.5 since it changes APIs.
[jira] [Resolved] (SPARK-9000) Support generic item type in PrefixSpan
[ https://issues.apache.org/jira/browse/SPARK-9000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiangrui Meng resolved SPARK-9000.
----------------------------------
    Resolution: Fixed
    Fix Version/s: 1.5.0

Issue resolved by pull request 7837
[https://github.com/apache/spark/pull/7837]

Support generic item type in PrefixSpan
---------------------------------------
    Key: SPARK-9000
    URL: https://issues.apache.org/jira/browse/SPARK-9000
    Project: Spark
    Issue Type: Improvement
    Components: MLlib
    Affects Versions: 1.5.0
    Reporter: Xiangrui Meng
    Priority: Critical
    Fix For: 1.5.0

In SPARK-6487, we only support the Int item type, which requires users to encode other types as integers to use PrefixSpan. We should be able to do this encoding inside PrefixSpan, similar to FPGrowth. This should be done before 1.5 since it changes APIs.
[jira] [Resolved] (SPARK-9370) Support DecimalType in UnsafeRow
[ https://issues.apache.org/jira/browse/SPARK-9370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Davies Liu resolved SPARK-9370.
-------------------------------
    Resolution: Fixed
    Fix Version/s: 1.5.0

Support DecimalType in UnsafeRow
--------------------------------
    Key: SPARK-9370
    URL: https://issues.apache.org/jira/browse/SPARK-9370
    Project: Spark
    Issue Type: Sub-task
    Components: SQL
    Reporter: Reynold Xin
    Assignee: Davies Liu
    Fix For: 1.5.0

We should be able to represent Decimal data using 2 longs (16 bytes), given that we no longer support unlimited precision. Once we figure out how to convert a Decimal into 2 longs, we can add support for it similar to the way we added support for IntervalType (SPARK-9369).
[jira] [Commented] (SPARK-9370) Support DecimalType in UnsafeRow
[ https://issues.apache.org/jira/browse/SPARK-9370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14650621#comment-14650621 ]

Davies Liu commented on SPARK-9370:
-----------------------------------
This is fixed by https://github.com/apache/spark/pull/7758

Support DecimalType in UnsafeRow
--------------------------------
    Key: SPARK-9370
    URL: https://issues.apache.org/jira/browse/SPARK-9370
    Project: Spark
    Issue Type: Sub-task
    Components: SQL
    Reporter: Reynold Xin
    Assignee: Davies Liu
    Fix For: 1.5.0

We should be able to represent Decimal data using 2 longs (16 bytes), given that we no longer support unlimited precision. Once we figure out how to convert a Decimal into 2 longs, we can add support for it similar to the way we added support for IntervalType (SPARK-9369).
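The encoding idea, packing a bounded-precision decimal's unscaled value into two 64-bit longs, can be illustrated with a small Python sketch (a conceptual demo of the split, not Spark's actual UnsafeRow layout):

```python
def encode_decimal(unscaled):
    """Split a 128-bit unscaled decimal value into two 64-bit longs."""
    as_128 = unscaled & ((1 << 128) - 1)   # two's-complement 128-bit view
    high = (as_128 >> 64) & 0xFFFFFFFFFFFFFFFF
    low = as_128 & 0xFFFFFFFFFFFFFFFF
    return high, low

def decode_decimal(high, low):
    """Reassemble the 128-bit value and restore the sign."""
    as_128 = (high << 64) | low
    if as_128 >= (1 << 127):               # negative in two's complement
        as_128 -= 1 << 128
    return as_128

# Round-trips for positive, negative, and large (but bounded) values.
for value in (0, 1, -1, 10**30, -(10**30)):
    assert decode_decimal(*encode_decimal(value)) == value
```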
[jira] [Updated] (SPARK-7497) test_count_by_value_and_window is flaky
[ https://issues.apache.org/jira/browse/SPARK-7497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Davies Liu updated SPARK-7497:
------------------------------
    Assignee: (was: Davies Liu)

test_count_by_value_and_window is flaky
---------------------------------------
    Key: SPARK-7497
    URL: https://issues.apache.org/jira/browse/SPARK-7497
    Project: Spark
    Issue Type: Bug
    Components: PySpark, Streaming
    Affects Versions: 1.4.0
    Reporter: Xiangrui Meng
    Priority: Critical
    Labels: flaky-test

Saw this test failure in https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32268/console
{code}
======================================================================
FAIL: test_count_by_value_and_window (__main__.WindowFunctionTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "pyspark/streaming/tests.py", line 418, in test_count_by_value_and_window
    self._test_func(input, func, expected)
  File "pyspark/streaming/tests.py", line 133, in _test_func
    self.assertEqual(expected, result)
AssertionError: Lists differ: [[1], [2], [3], [4], [5], [6], [6], [6], [6], [6]] != [[1], [2], [3], [4], [5], [6], [6], [6]]

First list contains 2 additional elements.
First extra element 8: [6]

- [[1], [2], [3], [4], [5], [6], [6], [6], [6], [6]]
+ [[1], [2], [3], [4], [5], [6], [6], [6]]
{code}
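For context, the semantics the test exercises, counting distinct values over a sliding window of batches, can be sketched in plain Python (an illustration only, not the PySpark streaming implementation; the function name is made up for the sketch):

```python
def count_distinct_by_window(batches, window_len, slide=1):
    """For each window position, count distinct values seen across the
    last `window_len` batches (cf. countByValueAndWindow semantics)."""
    results = []
    for end in range(1, len(batches) + 1, slide):
        start = max(0, end - window_len)
        window = [v for batch in batches[start:end] for v in batch]
        results.append(len(set(window)))
    return results

# Ten batches, batch i contributing the single value i; a window of six
# batches plateaus at 6 distinct values once it fills, matching the
# expected list in the failure above.
batches = [[i] for i in range(10)]
assert count_distinct_by_window(batches, 6) == [1, 2, 3, 4, 5, 6, 6, 6, 6, 6]
```

The flake is in how many window positions get emitted before the test stops, not in the per-window counts themselves.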
[jira] [Resolved] (SPARK-9441) NoSuchMethodError: Com.typesafe.config.Config.getDuration
[ https://issues.apache.org/jira/browse/SPARK-9441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen resolved SPARK-9441.
------------------------------
    Resolution: Not A Problem

NoSuchMethodError: com.typesafe.config.Config.getDuration
---------------------------------------------------------
    Key: SPARK-9441
    URL: https://issues.apache.org/jira/browse/SPARK-9441
    Project: Spark
    Issue Type: Bug
    Components: Deploy
    Affects Versions: 1.3.1
    Reporter: nirav patel

I recently migrated my Spark-based REST service from 1.0.2 to 1.3.1.
{code}
15/07/29 10:31:12 INFO spark.SparkContext: Running Spark version 1.3.1
15/07/29 10:31:12 INFO spark.SecurityManager: Changing view acls to: npatel
15/07/29 10:31:12 INFO spark.SecurityManager: Changing modify acls to: npatel
15/07/29 10:31:12 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(npatel); users with modify permissions: Set(npatel)
Exception in thread "main" java.lang.NoSuchMethodError: com.typesafe.config.Config.getDuration(Ljava/lang/String;Ljava/util/concurrent/TimeUnit;)J
	at akka.util.Helpers$ConfigOps$.akka$util$Helpers$ConfigOps$$getDuration$extension(Helpers.scala:125)
	at akka.util.Helpers$ConfigOps$.getMillisDuration$extension(Helpers.scala:120)
	at akka.actor.ActorSystem$Settings.<init>(ActorSystem.scala:171)
	at akka.actor.ActorSystemImpl.<init>(ActorSystem.scala:504)
	at akka.actor.ActorSystem$.apply(ActorSystem.scala:141)
	at akka.actor.ActorSystem$.apply(ActorSystem.scala:118)
	at org.apache.spark.util.AkkaUtils$.org$apache$spark$util$AkkaUtils$$doCreateActorSystem(AkkaUtils.scala:122)
	at org.apache.spark.util.AkkaUtils$$anonfun$1.apply(AkkaUtils.scala:55)
	at org.apache.spark.util.AkkaUtils$$anonfun$1.apply(AkkaUtils.scala:54)
	at org.apache.spark.util.Utils$$anonfun$startServiceOnPort$1.apply$mcVI$sp(Utils.scala:1837)
	at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
	at org.apache.spark.util.Utils$.startServiceOnPort(Utils.scala:1828)
	at org.apache.spark.util.AkkaUtils$.createActorSystem(AkkaUtils.scala:57)
	at org.apache.spark.SparkEnv$.create(SparkEnv.scala:223)
	at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:163)
	at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:269)
	at org.apache.spark.SparkContext.<init>(SparkContext.scala:272)
{code}
I read blog posts where people suggest modifying the classpath, putting the right version first, putting the Scala libs first on the classpath, and similar suggestions, which is all ridiculous. I think the typesafe config package included with the spark-core lib is incorrect. I did the following with my Maven build and now it works, but I think someone needs to fix the spark-core package.
{code}
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.10</artifactId>
  <exclusions>
    <exclusion>
      <artifactId>config</artifactId>
      <groupId>com.typesafe</groupId>
    </exclusion>
  </exclusions>
</dependency>
<dependency>
  <groupId>com.typesafe</groupId>
  <artifactId>config</artifactId>
  <version>1.2.1</version>
</dependency>
{code}
[jira] [Updated] (SPARK-8889) showDagViz will cause java.lang.OutOfMemoryError: Java heap space
[ https://issues.apache.org/jira/browse/SPARK-8889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen updated SPARK-8889:
-----------------------------
    Target Version/s: (was: 1.4.2, 1.5.0)
    Priority: Minor (was: Major)
    Fix Version/s: (was: 1.4.2)

showDagViz will cause java.lang.OutOfMemoryError: Java heap space
-----------------------------------------------------------------
    Key: SPARK-8889
    URL: https://issues.apache.org/jira/browse/SPARK-8889
    Project: Spark
    Issue Type: Bug
    Components: Web UI
    Affects Versions: 1.4.0
    Environment: Spark 1.4.0, Hadoop 2.2.0
    Reporter: cen yuhai
    Priority: Minor

{code}
HTTP ERROR 500
Problem accessing /history/app-20150708101140-0018/jobs/job/. Reason: Server Error
Caused by: java.lang.OutOfMemoryError: Java heap space
	at java.util.Arrays.copyOf(Arrays.java:2367)
	at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:130)
	at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:114)
	at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:415)
	at java.lang.StringBuilder.append(StringBuilder.java:132)
	at scala.collection.mutable.StringBuilder.append(StringBuilder.scala:207)
	at org.apache.spark.ui.scope.RDDOperationGraph$$anonfun$org$apache$spark$ui$scope$RDDOperationGraph$$makeDotSubgraph$2.apply(RDDOperationGraph.scala:192)
	at org.apache.spark.ui.scope.RDDOperationGraph$$anonfun$org$apache$spark$ui$scope$RDDOperationGraph$$makeDotSubgraph$2.apply(RDDOperationGraph.scala:191)
	at scala.collection.immutable.Stream.foreach(Stream.scala:547)
	at org.apache.spark.ui.scope.RDDOperationGraph$.org$apache$spark$ui$scope$RDDOperationGraph$$makeDotSubgraph(RDDOperationGraph.scala:191)
	at org.apache.spark.ui.scope.RDDOperationGraph$.makeDotFile(RDDOperationGraph.scala:170)
	at org.apache.spark.ui.UIUtils$$anonfun$showDagViz$1.apply(UIUtils.scala:361)
	at org.apache.spark.ui.UIUtils$$anonfun$showDagViz$1.apply(UIUtils.scala:357)
	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
	at scala.collection.immutable.List.foreach(List.scala:318)
	at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
	at scala.collection.AbstractTraversable.map(Traversable.scala:105)
	at org.apache.spark.ui.UIUtils$.showDagViz(UIUtils.scala:357)
	at org.apache.spark.ui.UIUtils$.showDagVizForJob(UIUtils.scala:335)
	at org.apache.spark.ui.jobs.JobPage.render(JobPage.scala:317)
	at org.apache.spark.ui.WebUI$$anonfun$2.apply(WebUI.scala:79)
	at org.apache.spark.ui.WebUI$$anonfun$2.apply(WebUI.scala:79)
	at org.apache.spark.ui.JettyUtils$$anon$1.doGet(JettyUtils.scala:69)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:735)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:848)
	at org.spark-project.jetty.servlet.ServletHolder.handle(ServletHolder.java:684)
	at org.spark-project.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:501)
	at org.spark-project.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1086)
	at org.spark-project.jetty.servlet.ServletHandler.doScope(ServletHandler.java:428)
	at org.spark-project.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1020)
	at org.spark-project.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
{code}
[jira] [Closed] (SPARK-9099) spark-ec2 does not add important ports to security group
[ https://issues.apache.org/jira/browse/SPARK-9099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brian Sung-jin Hong closed SPARK-9099. -- Resolution: Invalid spark-ec2 does not add important ports to security group Key: SPARK-9099 URL: https://issues.apache.org/jira/browse/SPARK-9099 Project: Spark Issue Type: Bug Components: EC2 Affects Versions: 1.4.0, 1.4.1 Reporter: Brian Sung-jin Hong Priority: Minor The spark-ec2 script fails to add a few important ports to the security group, including: Master 6066: needed to submit jobs from outside the cluster; Slave 4040: needed to view worker state; Slave 8082: needed to view some worker logs. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-9535) Modify document for codegen
[ https://issues.apache.org/jira/browse/SPARK-9535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-9535: - Assignee: KaiXinXIaoLei Modify document for codegen --- Key: SPARK-9535 URL: https://issues.apache.org/jira/browse/SPARK-9535 Project: Spark Issue Type: Improvement Components: Documentation, SQL Affects Versions: 1.5.0 Reporter: Kousuke Saruta Assignee: KaiXinXIaoLei Priority: Minor SPARK-7184 made codegen enabled by default so let's modify the corresponding documents. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-9536) NaiveBayesModel support probability prediction for PySpark.ml
Yanbo Liang created SPARK-9536: -- Summary: NaiveBayesModel support probability prediction for PySpark.ml Key: SPARK-9536 URL: https://issues.apache.org/jira/browse/SPARK-9536 Project: Spark Issue Type: Improvement Components: ML, PySpark Reporter: Yanbo Liang NaiveBayesModel support probability prediction for PySpark.ml -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-9534) Enable javac lint for scalac parity; fix a lot of build warnings, 1.5.0 edition
Sean Owen created SPARK-9534: Summary: Enable javac lint for scalac parity; fix a lot of build warnings, 1.5.0 edition Key: SPARK-9534 URL: https://issues.apache.org/jira/browse/SPARK-9534 Project: Spark Issue Type: Improvement Components: Build Reporter: Sean Owen Assignee: Sean Owen Priority: Minor For parity with the kinds of warnings scalac emits, we should turn on some of javac's lint options. This reports, for example, use of deprecated APIs and unchecked casts, as scalac does. It's also a good time to sweep through the build warnings and fix a batch before the release. A PR is coming that shows and explains the fixes. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
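For context, javac lint flags of this kind are usually passed through the maven-compiler-plugin. The fragment below is a hedged sketch of that mechanism, not the option set actually chosen in the Spark PR:

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-compiler-plugin</artifactId>
  <configuration>
    <compilerArgs>
      <!-- Report deprecated API use and unchecked casts, as scalac does -->
      <arg>-Xlint:deprecation</arg>
      <arg>-Xlint:unchecked</arg>
    </compilerArgs>
  </configuration>
</plugin>
```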
[jira] [Resolved] (SPARK-9149) Add an example of spark.ml KMeans
[ https://issues.apache.org/jira/browse/SPARK-9149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-9149. -- Resolution: Fixed Issue resolved by pull request 7697 [https://github.com/apache/spark/pull/7697] Add an example of spark.ml KMeans - Key: SPARK-9149 URL: https://issues.apache.org/jira/browse/SPARK-9149 Project: Spark Issue Type: Documentation Components: Examples, ML Reporter: Yu Ishikawa Assignee: Yu Ishikawa Fix For: 1.5.0 Create an example of KMeans API for spark.ml. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-9149) Add an example of spark.ml KMeans
[ https://issues.apache.org/jira/browse/SPARK-9149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-9149: - Priority: Minor (was: Major) Add an example of spark.ml KMeans - Key: SPARK-9149 URL: https://issues.apache.org/jira/browse/SPARK-9149 Project: Spark Issue Type: Documentation Components: Examples, ML Reporter: Yu Ishikawa Assignee: Yu Ishikawa Priority: Minor Fix For: 1.5.0 Create an example of KMeans API for spark.ml. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-4454) Race condition in DAGScheduler
[ https://issues.apache.org/jira/browse/SPARK-4454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-4454. -- Resolution: Fixed Given the unlikelihood of a further 1.2.x release, I'm closing this as no longer needing a back port Race condition in DAGScheduler -- Key: SPARK-4454 URL: https://issues.apache.org/jira/browse/SPARK-4454 Project: Spark Issue Type: Bug Components: Scheduler Affects Versions: 1.1.0 Reporter: Rafal Kwasny Assignee: Josh Rosen Priority: Critical Fix For: 1.3.0 It seems to be a race condition in DAGScheduler that manifests on jobs with high concurrency: {noformat} Exception in thread main java.util.NoSuchElementException: key not found: 35 at scala.collection.MapLike$class.default(MapLike.scala:228) at scala.collection.AbstractMap.default(Map.scala:58) at scala.collection.mutable.HashMap.apply(HashMap.scala:64) at org.apache.spark.scheduler.DAGScheduler.getCacheLocs(DAGScheduler.scala:201) at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal(DAGScheduler.scala:1292) at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2$$anonfun$apply$2.apply$mcVI$sp(DAGScheduler.scala:1307) at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2$$anonfun$apply$2.apply(DAGScheduler.scala:1306) at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2$$anonfun$apply$2.apply(DAGScheduler.scala:1306) at scala.collection.immutable.List.foreach(List.scala:318) at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2.apply(DAGScheduler.scala:1306) at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2.apply(DAGScheduler.scala:1304) at 
scala.collection.immutable.List.foreach(List.scala:318) at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal(DAGScheduler.scala:1304) at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2$$anonfun$apply$2.apply$mcVI$sp(DAGScheduler.scala:1307) at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2$$anonfun$apply$2.apply(DAGScheduler.scala:1306) at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2$$anonfun$apply$2.apply(DAGScheduler.scala:1306) at scala.collection.immutable.List.foreach(List.scala:318) at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2.apply(DAGScheduler.scala:1306) at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2.apply(DAGScheduler.scala:1304) at scala.collection.immutable.List.foreach(List.scala:318) at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal(DAGScheduler.scala:1304) at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2$$anonfun$apply$2.apply$mcVI$sp(DAGScheduler.scala:1307) at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2$$anonfun$apply$2.apply(DAGScheduler.scala:1306) at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2$$anonfun$apply$2.apply(DAGScheduler.scala:1306) at scala.collection.immutable.List.foreach(List.scala:318) at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2.apply(DAGScheduler.scala:1306) at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2.apply(DAGScheduler.scala:1304) at scala.collection.immutable.List.foreach(List.scala:318) at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal(DAGScheduler.scala:1304) at org.apache.spark.scheduler.DAGScheduler.getPreferredLocs(DAGScheduler.scala:1275) at org.apache.spark.SparkContext.getPreferredLocs(SparkContext.scala:937) at
[jira] [Updated] (SPARK-4454) Race condition in DAGScheduler
[ https://issues.apache.org/jira/browse/SPARK-4454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-4454: - Labels: (was: backport-needed) Race condition in DAGScheduler -- Key: SPARK-4454 URL: https://issues.apache.org/jira/browse/SPARK-4454 Project: Spark Issue Type: Bug Components: Scheduler Affects Versions: 1.1.0 Reporter: Rafal Kwasny Assignee: Josh Rosen Priority: Critical Fix For: 1.3.0 It seems to be a race condition in DAGScheduler that manifests on jobs with high concurrency: {noformat} Exception in thread main java.util.NoSuchElementException: key not found: 35 at scala.collection.MapLike$class.default(MapLike.scala:228) at scala.collection.AbstractMap.default(Map.scala:58) at scala.collection.mutable.HashMap.apply(HashMap.scala:64) at org.apache.spark.scheduler.DAGScheduler.getCacheLocs(DAGScheduler.scala:201) at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal(DAGScheduler.scala:1292) at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2$$anonfun$apply$2.apply$mcVI$sp(DAGScheduler.scala:1307) at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2$$anonfun$apply$2.apply(DAGScheduler.scala:1306) at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2$$anonfun$apply$2.apply(DAGScheduler.scala:1306) at scala.collection.immutable.List.foreach(List.scala:318) at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2.apply(DAGScheduler.scala:1306) at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2.apply(DAGScheduler.scala:1304) at scala.collection.immutable.List.foreach(List.scala:318) at 
org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal(DAGScheduler.scala:1304) at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2$$anonfun$apply$2.apply$mcVI$sp(DAGScheduler.scala:1307) at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2$$anonfun$apply$2.apply(DAGScheduler.scala:1306) at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2$$anonfun$apply$2.apply(DAGScheduler.scala:1306) at scala.collection.immutable.List.foreach(List.scala:318) at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2.apply(DAGScheduler.scala:1306) at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2.apply(DAGScheduler.scala:1304) at scala.collection.immutable.List.foreach(List.scala:318) at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal(DAGScheduler.scala:1304) at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2$$anonfun$apply$2.apply$mcVI$sp(DAGScheduler.scala:1307) at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2$$anonfun$apply$2.apply(DAGScheduler.scala:1306) at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2$$anonfun$apply$2.apply(DAGScheduler.scala:1306) at scala.collection.immutable.List.foreach(List.scala:318) at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2.apply(DAGScheduler.scala:1306) at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2.apply(DAGScheduler.scala:1304) at scala.collection.immutable.List.foreach(List.scala:318) at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal(DAGScheduler.scala:1304) at org.apache.spark.scheduler.DAGScheduler.getPreferredLocs(DAGScheduler.scala:1275) at org.apache.spark.SparkContext.getPreferredLocs(SparkContext.scala:937) at org.apache.spark.rdd.PartitionCoalescer.currPrefLocs(CoalescedRDD.scala:175)
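The "key not found" failure above is the classic symptom of an unsynchronized check-then-get on a shared mutable map under concurrent job submission. As an illustrative sketch only (not Spark's actual fix; the cache-locations contents are invented), an atomic lookup-or-insert removes the race window:

```java
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch: a cache-locations map shared across concurrent job
// submissions. With a plain HashMap, "if absent then compute, then get" can
// interleave with another thread's update and throw a key-not-found error;
// computeIfAbsent performs the lookup-or-insert as a single atomic step.
public class CacheLocs {
    private final ConcurrentMap<Integer, List<String>> cacheLocs =
        new ConcurrentHashMap<>();

    List<String> getCacheLocs(int rddId) {
        // Atomically compute and insert the entry if it is absent.
        return cacheLocs.computeIfAbsent(rddId, id -> List.of("host-" + id));
    }
}
```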
[jira] [Commented] (SPARK-8119) HeartbeatReceiver should not adjust application executor resources
[ https://issues.apache.org/jira/browse/SPARK-8119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14650657#comment-14650657 ] Sean Owen commented on SPARK-8119: -- I attempted a back-port but this depends on SPARK-7835 and possibly other prior changes, which I'm not so familiar with. HeartbeatReceiver should not adjust application executor resources -- Key: SPARK-8119 URL: https://issues.apache.org/jira/browse/SPARK-8119 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.4.0 Reporter: SaintBacchus Assignee: Andrew Or Priority: Critical Labels: backport-needed Fix For: 1.5.0 Dynamic allocation sets the total executor count to a small number when it wants to kill some executors. But even when dynamic allocation is disabled, Spark will also set the total executor count. This causes the following problem: when an executor fails, no new executor will be brought up by Spark. === EDIT by andrewor14 === The issue is that the AM forgets about the original number of executors it wants after calling sc.killExecutor. Even if dynamic allocation is not enabled, this is still possible because of heartbeat timeouts. I think the problem is that sc.killExecutor is used incorrectly in HeartbeatReceiver. The intention of the method is to permanently adjust the number of executors the application will get. In HeartbeatReceiver, however, this is used as a best-effort mechanism to ensure that the timed-out executor is dead. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-9527) PrefixSpan.run should return a PrefixSpanModel instead of an RDD
[ https://issues.apache.org/jira/browse/SPARK-9527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng reassigned SPARK-9527: Assignee: Xiangrui Meng PrefixSpan.run should return a PrefixSpanModel instead of an RDD Key: SPARK-9527 URL: https://issues.apache.org/jira/browse/SPARK-9527 Project: Spark Issue Type: Improvement Components: MLlib Affects Versions: 1.5.0 Reporter: Xiangrui Meng Assignee: Xiangrui Meng Priority: Critical With a model wrapping the result RDD, it would be more flexible to add features in the future. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-8874) Add missing methods in Word2Vec ML
[ https://issues.apache.org/jira/browse/SPARK-8874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14650619#comment-14650619 ] Manoj Kumar commented on SPARK-8874: Done. Thanks. Add missing methods in Word2Vec ML -- Key: SPARK-8874 URL: https://issues.apache.org/jira/browse/SPARK-8874 Project: Spark Issue Type: New Feature Components: ML, PySpark Reporter: Manoj Kumar Assignee: Manoj Kumar Add getVectors and findSynonyms. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-9529) Improve sort on Decimal
[ https://issues.apache.org/jira/browse/SPARK-9529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin resolved SPARK-9529. Resolution: Fixed Fix Version/s: 1.5.0 Improve sort on Decimal --- Key: SPARK-9529 URL: https://issues.apache.org/jira/browse/SPARK-9529 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Davies Liu Assignee: Davies Liu Priority: Critical Fix For: 1.5.0 Right now it's really slow; it just hangs there in random tests {code} pool-1-thread-1-ScalaTest-running-TungstenSortSuite prio=5 tid=0x7f822bc82800 nid=0x5103 runnable [0x00011d1be000] java.lang.Thread.State: RUNNABLE at java.math.BigInteger.&lt;init&gt;(BigInteger.java:405) at java.math.BigDecimal.bigTenToThe(BigDecimal.java:3380) at java.math.BigDecimal.bigMultiplyPowerTen(BigDecimal.java:3508) at java.math.BigDecimal.setScale(BigDecimal.java:2394) at java.math.BigDecimal.divide(BigDecimal.java:1691) at java.math.BigDecimal.divideToIntegralValue(BigDecimal.java:1734) at java.math.BigDecimal.divideAndRemainder(BigDecimal.java:1891) at java.math.BigDecimal.remainder(BigDecimal.java:1833) at scala.math.BigDecimal.remainder(BigDecimal.scala:281) at scala.math.BigDecimal.isWhole(BigDecimal.scala:215) at scala.math.BigDecimal.hashCode(BigDecimal.scala:180) at org.apache.spark.sql.types.Decimal.hashCode(Decimal.scala:260) at org.apache.spark.sql.catalyst.InternalRow.hashCode(InternalRow.scala:121) at org.apache.spark.RangePartitioner.hashCode(Partitioner.scala:201) at java.lang.Object.toString(Object.java:237) at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1418) at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177) at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547) at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508) at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431) at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177) at 
java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347) at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:44) at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:84) at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:301) at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:294) at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:122) at org.apache.spark.SparkContext.clean(SparkContext.scala:2003) at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1.apply(RDD.scala:683) at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1.apply(RDD.scala:682) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108) at org.apache.spark.rdd.RDD.withScope(RDD.scala:286) at org.apache.spark.rdd.RDD.mapPartitions(RDD.scala:682) at org.apache.spark.sql.execution.Exchange$$anonfun$doExecute$1.apply(Exchange.scala:181) at org.apache.spark.sql.execution.Exchange$$anonfun$doExecute$1.apply(Exchange.scala:148) at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:48) at org.apache.spark.sql.execution.Exchange.doExecute(Exchange.scala:148) at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:113) at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:113) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147) at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:112) at org.apache.spark.sql.execution.Sort$$anonfun$doExecute$1.apply(sort.scala:48) at org.apache.spark.sql.execution.Sort$$anonfun$doExecute$1.apply(sort.scala:48) at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:48) at org.apache.spark.sql.execution.Sort.doExecute(sort.scala:47) at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:113) at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:113) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147) at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:112) at
[jira] [Resolved] (SPARK-8612) Yarn application status is misreported for failed PySpark apps.
[ https://issues.apache.org/jira/browse/SPARK-8612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-8612. -- Resolution: Duplicate I believe so. I think Marcelo is following up on this general issue; there are a few tickets. Yarn application status is misreported for failed PySpark apps. --- Key: SPARK-8612 URL: https://issues.apache.org/jira/browse/SPARK-8612 Project: Spark Issue Type: Bug Components: YARN Affects Versions: 1.3.0, 1.3.1, 1.4.0 Environment: PySpark job run in yarn-client mode on CDH 5.4.2 Reporter: Juliet Hougland Priority: Minor When a PySpark job fails, YARN records and reports its status as successful. Hari Shreedharan pointed out to me that [the ApplicationMaster records app success when System.exit is called. | https://github.com/apache/spark/blob/master/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala#L124] PySpark always [exits by calling os._exit. | https://github.com/apache/spark/blob/master/python/pyspark/daemon.py#L169] Because of this, every PySpark application run on YARN is marked as completed successfully. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-9521) Require Maven 3.3.3+ in the build
[ https://issues.apache.org/jira/browse/SPARK-9521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-9521. -- Resolution: Fixed Fix Version/s: 1.5.0 Issue resolved by pull request 7852 [https://github.com/apache/spark/pull/7852] Require Maven 3.3.3+ in the build - Key: SPARK-9521 URL: https://issues.apache.org/jira/browse/SPARK-9521 Project: Spark Issue Type: Improvement Components: Build Affects Versions: 1.4.1 Reporter: Sean Owen Assignee: Sean Owen Priority: Trivial Fix For: 1.5.0 Patrick recently discovered a build problem that manifested because he was using the Maven 3.2.x installed on his system, and which was resolved by using Maven 3.3.x. Since we have a script that can install Maven 3.3.3 for anyone, it probably makes sense to just enforce use of Maven 3.3.3+ in the build. (Currently it's just 3.0.4+). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
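A minimum Maven version of this kind is typically enforced with the maven-enforcer-plugin's requireMavenVersion rule. The fragment below is a minimal sketch of that mechanism; the actual change Spark made is in pull request 7852:

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-enforcer-plugin</artifactId>
  <executions>
    <execution>
      <id>enforce-versions</id>
      <goals>
        <goal>enforce</goal>
      </goals>
      <configuration>
        <rules>
          <!-- Fail the build when run with Maven older than 3.3.3 -->
          <requireMavenVersion>
            <version>3.3.3</version>
          </requireMavenVersion>
        </rules>
      </configuration>
    </execution>
  </executions>
</plugin>
```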
[jira] [Created] (SPARK-9535) Modify document for codegen
Kousuke Saruta created SPARK-9535: - Summary: Modify document for codegen Key: SPARK-9535 URL: https://issues.apache.org/jira/browse/SPARK-9535 Project: Spark Issue Type: Improvement Components: Documentation, SQL Affects Versions: 1.5.0 Reporter: Kousuke Saruta Priority: Minor SPARK-7184 made codegen enabled by default so let's modify the corresponding documents. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-9523) Receiver for Spark Streaming does not naturally support kryo serializer
[ https://issues.apache.org/jira/browse/SPARK-9523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-9523: - Target Version/s: (was: 1.3.1) Priority: Minor (was: Major) Fix Version/s: (was: 1.4.2) (was: 1.3.2) [~fish748] Please read https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark The fields on this JIRA can't be right... 1.3.1 was released. Fix version doesn't apply to unresolved JIRAs. etc. Receiver for Spark Streaming does not naturally support kryo serializer --- Key: SPARK-9523 URL: https://issues.apache.org/jira/browse/SPARK-9523 Project: Spark Issue Type: Improvement Components: Streaming Affects Versions: 1.3.1 Environment: Windows 7 local mode Reporter: John Chen Priority: Minor Labels: kryo, serialization Original Estimate: 120h Remaining Estimate: 120h In some cases, some attributes of a class are not serializable but you still want to use them after the whole object is deserialized, so you have to customize your serialization code. For example, you can declare those attributes as transient, which makes them ignored during serialization, and then reassign their values during deserialization. Now, if you're using Java serialization, you have to implement Serializable and write that code in the readObject() and writeObject() methods; and if you're using Kryo serialization, you have to implement KryoSerializable and write it in the read() and write() methods. In Spark and Spark Streaming, you can set Kryo as the serializer for a speedup. However, the functions taken by RDD or DStream operations are still serialized by Java serialization, which means you only need to write the custom serialization code in readObject() and writeObject(). But when it comes to Spark Streaming's Receiver, things are different. When you wish to customize an InputDStream, you must extend the Receiver.
However, it turns out that the Receiver will be serialized by Kryo if you set the Kryo serializer in SparkConf, and will fall back to Java serialization if you didn't. So here come the problems: if you want to change the serializer by configuration and make sure the Receiver runs perfectly under both Java and Kryo, you have to write all four methods above. First, that is redundant, since you have to write the serialization/deserialization code almost twice; second, there's nothing in the docs or in the code to tell users to implement the KryoSerializable interface. Since all other function parameters are serialized by Java only, I suggest you also make it so for the Receiver. It may be slower, but since the serialization is only executed once per interval, that is tolerable. More importantly, it will cause less trouble. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
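The Java-serialization half of the duplication described above can be sketched as follows. This is an illustrative class, not Spark's Receiver API; the field names are invented. For Kryo, the author would additionally have to implement KryoSerializable's read()/write() with essentially the same logic a second time:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Minimal sketch of custom Java serialization: a transient field is skipped
// during writeObject() and reassigned during readObject().
public class MyReceiver implements Serializable {
    private String config = "default";
    // Not serializable state; must be rebuilt after deserialization.
    private transient StringBuilder buffer = new StringBuilder();

    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject();       // writes 'config', skips 'buffer'
    }

    private void readObject(ObjectInputStream in)
            throws IOException, ClassNotFoundException {
        in.defaultReadObject();         // restores 'config'
        buffer = new StringBuilder();   // reassign the transient field
    }

    String config() { return config; }
    StringBuilder buffer() { return buffer; }

    // Helper for demonstration: serialize and deserialize an instance.
    static MyReceiver roundTrip(MyReceiver r) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            ObjectOutputStream oos = new ObjectOutputStream(bos);
            oos.writeObject(r);
            oos.flush();
            ObjectInputStream ois = new ObjectInputStream(
                new ByteArrayInputStream(bos.toByteArray()));
            return (MyReceiver) ois.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new RuntimeException(e);
        }
    }
}
```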
[jira] [Updated] (SPARK-9533) Add missing methods in Word2Vec ML (Python API)
[ https://issues.apache.org/jira/browse/SPARK-9533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-9533: - Priority: Minor (was: Major) Component/s: ML Add missing methods in Word2Vec ML (Python API) --- Key: SPARK-9533 URL: https://issues.apache.org/jira/browse/SPARK-9533 Project: Spark Issue Type: Improvement Components: ML Reporter: Manoj Kumar Priority: Minor After 8874 is resolved, we can add python wrappers for the same. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-9504) Flaky test: o.a.s.streaming.StreamingContextSuite.stop gracefully
[ https://issues.apache.org/jira/browse/SPARK-9504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-9504: - Assignee: Shixiong Zhu Flaky test: o.a.s.streaming.StreamingContextSuite.stop gracefully - Key: SPARK-9504 URL: https://issues.apache.org/jira/browse/SPARK-9504 Project: Spark Issue Type: Sub-task Components: Streaming Reporter: Shixiong Zhu Assignee: Shixiong Zhu Labels: flaky-test Fix For: 1.5.0 Failure build: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/39149/ {code} [info] - stop gracefully *** FAILED *** (3 seconds, 522 milliseconds) [info] 0 was not greater than 0 (StreamingContextSuite.scala:277) [info] org.scalatest.exceptions.TestFailedException: [info] at org.scalatest.Assertions$class.newAssertionFailedException(Assertions.scala:500) [info] at org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1555) [info] at org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:466) [info] at org.apache.spark.streaming.StreamingContextSuite$$anonfun$21$$anonfun$apply$mcV$sp$3.apply$mcVI$sp(StreamingContextSuite.scala:277) [info] at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141) [info] at org.apache.spark.streaming.StreamingContextSuite$$anonfun$21.apply$mcV$sp(StreamingContextSuite.scala:261) [info] at org.apache.spark.streaming.StreamingContextSuite$$anonfun$21.apply(StreamingContextSuite.scala:257) [info] at org.apache.spark.streaming.StreamingContextSuite$$anonfun$21.apply(StreamingContextSuite.scala:257) [info] at org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22) [info] at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85) [info] at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) [info] at org.scalatest.Transformer.apply(Transformer.scala:22) [info] at org.scalatest.Transformer.apply(Transformer.scala:20) [info] at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166) [info] at 
org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:42) [info] at org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163) [info] at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175) [info] at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175) [info] at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306) [info] at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175) [info] at org.apache.spark.streaming.StreamingContextSuite.org$scalatest$BeforeAndAfter$$super$runTest(StreamingContextSuite.scala:42) [info] at org.scalatest.BeforeAndAfter$class.runTest(BeforeAndAfter.scala:200) [info] at org.apache.spark.streaming.StreamingContextSuite.runTest(StreamingContextSuite.scala:42) [info] at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208) [info] at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208) [info] at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413) [info] at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401) [info] at scala.collection.immutable.List.foreach(List.scala:318) [info] at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401) [info] at org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396) [info] at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483) [info] at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:208) [info] at org.scalatest.FunSuite.runTests(FunSuite.scala:1555) [info] at org.scalatest.Suite$class.run(Suite.scala:1424) [info] at org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1555) [info] at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212) [info] at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212) [info] at org.scalatest.SuperEngine.runImpl(Engine.scala:545) [info] at 
org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:212) [info] at org.apache.spark.streaming.StreamingContextSuite.org$scalatest$BeforeAndAfter$$super$run(StreamingContextSuite.scala:42) [info] at org.scalatest.BeforeAndAfter$class.run(BeforeAndAfter.scala:241) [info] at org.apache.spark.streaming.StreamingContextSuite.run(StreamingContextSuite.scala:42) [info] at org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:462) [info] at org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:671) [info] at sbt.ForkMain$Run$2.call(ForkMain.java:294) [info] at
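The failing assertion at StreamingContextSuite.scala:277 ("0 was not greater than 0") checks that at least one record was processed before the graceful stop completed. The contract behind a graceful stop is: signal shutdown, then drain buffered work before exiting. A minimal, self-contained Python sketch of that contract (hypothetical queue/worker names, not Spark's streaming code):

```python
import queue
import threading

def run_worker(work_queue, processed, stop_event):
    # Keep draining until we are BOTH asked to stop AND the queue is empty;
    # this is the "graceful" part: buffered items are not dropped.
    while not (stop_event.is_set() and work_queue.empty()):
        try:
            processed.append(work_queue.get(timeout=0.05))
        except queue.Empty:
            continue

def graceful_stop(worker, stop_event):
    """Signal shutdown, then wait for the worker to finish draining."""
    stop_event.set()
    worker.join()

work_queue = queue.Queue()
for i in range(10):
    work_queue.put(i)
processed = []
stop_event = threading.Event()
worker = threading.Thread(target=run_worker,
                          args=(work_queue, processed, stop_event))
worker.start()
graceful_stop(worker, stop_event)
```

After `graceful_stop` returns, all ten queued items have been processed; a test like the one above would then assert `processed` is non-empty.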
[jira] [Commented] (SPARK-9375) The total number of executor(s) requested by the driver may be negative
[ https://issues.apache.org/jira/browse/SPARK-9375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14650650#comment-14650650 ] Sean Owen commented on SPARK-9375: -- [~sandyr] has a question for you on the PR; this may have indeed been resolved by other changes. Can you clarify what version you are running? and explain why you think it's different? The total number of executor(s) requested by the driver may be negative - Key: SPARK-9375 URL: https://issues.apache.org/jira/browse/SPARK-9375 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.4.1 Reporter: KaiXinXIaoLei Attachments: The total number of executor(s) is negative in AM log.png I set spark.dynamicAllocation.enabled = true”. I run a big job. I find a problem in ApplicationMaster log: the total number of executor(s) requested by the driver is negative. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
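The report shows the driver's requested-executor total going negative under dynamic allocation. A common cause for this class of bug is decrementing a target count without clamping it at a floor. A toy Python model of the fix (hypothetical class, not Spark's actual allocation code):

```python
class ExecutorTarget:
    """Toy model of a dynamic-allocation target count (illustrative only)."""

    def __init__(self, initial=0, floor=0):
        self.floor = floor
        self.target = max(initial, floor)

    def scale_up(self, n):
        self.target += n
        return self.target

    def scale_down(self, n):
        # Clamp at the floor so repeated remove/idle-timeout events can
        # never drive the requested total below zero.
        self.target = max(self.target - n, self.floor)
        return self.target

t = ExecutorTarget(initial=4)
t.scale_down(10)   # without the clamp this would leave target at -6
```

The clamp in `scale_down` is the whole point: every code path that lowers the target goes through one place that enforces the floor.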
[jira] [Resolved] (SPARK-8981) Set applicationId and appName in log4j MDC
[ https://issues.apache.org/jira/browse/SPARK-8981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-8981. -- Resolution: Won't Fix We can reopen this if there is a PR to clarify how this would work Set applicationId and appName in log4j MDC -- Key: SPARK-8981 URL: https://issues.apache.org/jira/browse/SPARK-8981 Project: Spark Issue Type: New Feature Components: Spark Core Reporter: Paweł Kopiczko Priority: Minor It would be nice to have, because it's good to have logs in one file when using log agents (like logentires) in standalone mode. Also allows configuring rolling file appender without a mess when multiple applications are running. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-7563) OutputCommitCoordinator.stop() should only be executed in driver
[ https://issues.apache.org/jira/browse/SPARK-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14650652#comment-14650652 ] Apache Spark commented on SPARK-7563: - User 'srowen' has created a pull request for this issue: https://github.com/apache/spark/pull/7865 OutputCommitCoordinator.stop() should only be executed in driver Key: SPARK-7563 URL: https://issues.apache.org/jira/browse/SPARK-7563 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.3.1 Environment: Red Hat Enterprise Linux Server release 7.0 (Maipo) Spark 1.3.1 Release Reporter: Hailong Wen Priority: Critical Labels: backport-needed Fix For: 1.4.0 I am from IBM Platform Symphony team and we are integrating Spark 1.3.1 with EGO (a resource management product). In EGO we uses fine-grained dynamic allocation policy, and each Executor will exit after its tasks are all done. When testing *spark-shell*, we find that when executor of first job exit, it will stop OutputCommitCoordinator, which result in all future jobs failing. 
Details are as follows: We got the following error in executor when submitting job in *spark-shell* the second time (the first job submission is successful): {noformat} 15/05/11 04:02:31 INFO spark.util.AkkaUtils: Connecting to OutputCommitCoordinator: akka.tcp://sparkDriver@whlspark01:50452/user/OutputCommitCoordinator Exception in thread main akka.actor.ActorNotFound: Actor not found for: ActorSelection[Anchor(akka.tcp://sparkDriver@whlspark01:50452/), Path(/user/OutputCommitCoordinator)] at akka.actor.ActorSelection$$anonfun$resolveOne$1.apply(ActorSelection.scala:65) at akka.actor.ActorSelection$$anonfun$resolveOne$1.apply(ActorSelection.scala:63) at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32) at akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.processBatch$1(BatchingExecutor.scala:67) at akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:82) at akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59) at akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59) at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72) at akka.dispatch.BatchingExecutor$Batch.run(BatchingExecutor.scala:58) at akka.dispatch.ExecutionContexts$sameThreadExecutionContext$.unbatchedExecute(Future.scala:74) at akka.dispatch.BatchingExecutor$class.execute(BatchingExecutor.scala:110) at akka.dispatch.ExecutionContexts$sameThreadExecutionContext$.execute(Future.scala:73) at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40) at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:248) at akka.pattern.PromiseActorRef.$bang(AskSupport.scala:267) at akka.remote.DefaultMessageDispatcher.dispatch(Endpoint.scala:89) at akka.remote.EndpointReader$$anonfun$receive$2.applyOrElse(Endpoint.scala:937) at akka.actor.Actor$class.aroundReceive(Actor.scala:465) at akka.remote.EndpointActor.aroundReceive(Endpoint.scala:415) at 
akka.actor.ActorCell.receiveMessage(ActorCell.scala:516) at akka.actor.ActorCell.invoke(ActorCell.scala:487) at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238) at akka.dispatch.Mailbox.run(Mailbox.scala:220) at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393) at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) {noformat} And in driver side, we see a log message telling that the OutputCommitCoordinator is stopped after the first submission: {noformat} 15/05/11 04:01:23 INFO spark.scheduler.OutputCommitCoordinator$OutputCommitCoordinatorActor: OutputCommitCoordinator stopped! {noformat} We examine the code of OutputCommitCoordinator, and find that executor will reuse the ref of driver's OutputCommitCoordinatorActor. So when an executor exits, it will eventually call SparkEnv.stop(): {noformat} private[spark] def stop() { isStopped = true pythonWorkers.foreach { case(key, worker) = worker.stop() } Option(httpFileServer).foreach(_.stop()) mapOutputTracker.stop() shuffleManager.stop() broadcastManager.stop()
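The direction named in the issue title is to run `OutputCommitCoordinator.stop()` only on the driver. The general pattern is to guard teardown of shared, driver-owned services by role, so an exiting executor cannot take the coordinator down for everyone. A hypothetical Python sketch of that pattern (not Spark's SparkEnv):

```python
class Coordinator:
    """Stand-in for a shared, driver-owned service."""
    def __init__(self):
        self.stopped = False

    def stop(self):
        self.stopped = True

class Env:
    """Toy SparkEnv-like holder; `is_driver` decides who may stop shared services."""
    def __init__(self, coordinator, is_driver):
        self.coordinator = coordinator  # executors hold a ref to the driver's coordinator
        self.is_driver = is_driver

    def stop(self):
        # Only the driver tears down the shared coordinator; an executor
        # exiting must not stop it out from under later jobs.
        if self.is_driver:
            self.coordinator.stop()

shared = Coordinator()
driver = Env(shared, is_driver=True)
executor = Env(shared, is_driver=False)
executor.stop()                   # executor exit: coordinator stays up
still_up = not shared.stopped
driver.stop()                     # driver shutdown: coordinator actually stops
```

In the bug as reported, the executor-side `stop()` took the `if self.is_driver` branch unconditionally, which is why the second `spark-shell` job could no longer find the coordinator actor.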
[jira] [Created] (SPARK-9533) Add missing methods in Word2Vec ML (Python API)
Manoj Kumar created SPARK-9533: -- Summary: Add missing methods in Word2Vec ML (Python API) Key: SPARK-9533 URL: https://issues.apache.org/jira/browse/SPARK-9533 Project: Spark Issue Type: Improvement Reporter: Manoj Kumar After 8874 is resolved, we can add python wrappers for the same. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-9534) Enable javac lint for scalac parity; fix a lot of build warnings, 1.5.0 edition
[ https://issues.apache.org/jira/browse/SPARK-9534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-9534: --- Assignee: Sean Owen (was: Apache Spark) Enable javac lint for scalac parity; fix a lot of build warnings, 1.5.0 edition --- Key: SPARK-9534 URL: https://issues.apache.org/jira/browse/SPARK-9534 Project: Spark Issue Type: Improvement Components: Build Reporter: Sean Owen Assignee: Sean Owen Priority: Minor For parity with the kinds of warnings scalac emits, we should turn on some of javac's lint options. This reports, for example use of deprecated APIs and unchecked casts as scalac does. And it's a good time to sweep through build warnings and fix a bunch before the release. PR coming which shows and explains the fixes -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-9534) Enable javac lint for scalac parity; fix a lot of build warnings, 1.5.0 edition
[ https://issues.apache.org/jira/browse/SPARK-9534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14650631#comment-14650631 ] Apache Spark commented on SPARK-9534: - User 'srowen' has created a pull request for this issue: https://github.com/apache/spark/pull/7862 Enable javac lint for scalac parity; fix a lot of build warnings, 1.5.0 edition --- Key: SPARK-9534 URL: https://issues.apache.org/jira/browse/SPARK-9534 Project: Spark Issue Type: Improvement Components: Build Reporter: Sean Owen Assignee: Sean Owen Priority: Minor For parity with the kinds of warnings scalac emits, we should turn on some of javac's lint options. This reports, for example use of deprecated APIs and unchecked casts as scalac does. And it's a good time to sweep through build warnings and fix a bunch before the release. PR coming which shows and explains the fixes -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-9534) Enable javac lint for scalac parity; fix a lot of build warnings, 1.5.0 edition
[ https://issues.apache.org/jira/browse/SPARK-9534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-9534: --- Assignee: Apache Spark (was: Sean Owen) Enable javac lint for scalac parity; fix a lot of build warnings, 1.5.0 edition --- Key: SPARK-9534 URL: https://issues.apache.org/jira/browse/SPARK-9534 Project: Spark Issue Type: Improvement Components: Build Reporter: Sean Owen Assignee: Apache Spark Priority: Minor For parity with the kinds of warnings scalac emits, we should turn on some of javac's lint options. This reports, for example use of deprecated APIs and unchecked casts as scalac does. And it's a good time to sweep through build warnings and fix a bunch before the release. PR coming which shows and explains the fixes -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-9209) Using executor allocation, an executor is removed but it exists in ExecutorsPage of the web ui

[ https://issues.apache.org/jira/browse/SPARK-9209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-9209: - Target Version/s: (was: 1.5.0) Priority: Minor (was: Major) Fix Version/s: (was: 1.5.0) Using executor allocation, a executor is removed but it exists in ExecutorsPage of the web ui -- Key: SPARK-9209 URL: https://issues.apache.org/jira/browse/SPARK-9209 Project: Spark Issue Type: Bug Components: Web UI Affects Versions: 1.4.1 Reporter: KaiXinXIaoLei Priority: Minor Attachments: A Executor exists in web.png, executor is removed.png I set spark.dynamicAllocation.enabled = true”, and run a big job. In driver, a executor is asked to remove, and it's remove successfully, and the process of this executor is not exist. But it exists in ExecutorsPage of the web ui. The log in driver : 2015-07-17 11:48:14,543 | INFO | [sparkDriver-akka.actor.default-dispatcher-3] | Removing block manager BlockManagerId(264, 172.1.1.8, 23811) 2015-07-17 11:48:14,543 | INFO | [dag-scheduler-event-loop] | Removed 264 successfully in removeExecutor 2015-07-17 11:48:21,226 | INFO | [sparkDriver-akka.actor.default-dispatcher-3] | Registering block manager 172.1.1.8:23811 with 10.4 GB RAM, BlockManagerId(264, 172.1.1.8, 23811) 2015-07-17 11:48:21,228 | INFO | [sparkDriver-akka.actor.default-dispatcher-3] | Added broadcast_781_piece0 in memory on 172.1.1.8:23811 (size: 38.6 KB, free: 10.4 GB) 2015-07-17 11:48:35,277 | ERROR | [sparkDriver-akka.actor.default-dispatcher-16] | Lost executor 264 on datasight-195: remote Rpc client disassociated 2015-07-17 11:48:35,277 | WARN | [sparkDriver-akka.actor.default-dispatcher-4] | Association with remote system [akka.tcp://sparkExecutor@datasight-195:23929] has failed, address is now gated for [5000] ms. Reason is: [Disassociated]. 
2015-07-17 11:48:35,277 | INFO | [sparkDriver-akka.actor.default-dispatcher-16] | Re-queueing tasks for 264 from TaskSet 415.0 2015-07-17 11:48:35,804 | INFO | [SparkListenerBus] | Existing executor 264 has been removed (new total is 10) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
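The log shows executor 264 being removed, its block manager re-registering afterwards, and then the executor being lost again; a UI registry that applies a late registration after a removal ends up showing a ghost entry. One way to model the defensive fix is to tombstone removed IDs (hypothetical Python sketch, not the actual ExecutorsPage code):

```python
class ExecutorRegistry:
    """Toy executors-page model: ignore registrations from already-removed IDs."""

    def __init__(self):
        self.active = {}
        self.removed = set()  # tombstones for executors already dropped

    def register(self, exec_id, host):
        if exec_id in self.removed:
            return False      # late re-registration after removal: drop it
        self.active[exec_id] = host
        return True

    def remove(self, exec_id):
        self.active.pop(exec_id, None)
        self.removed.add(exec_id)

reg = ExecutorRegistry()
reg.register("264", "172.1.1.8")
reg.remove("264")
accepted = reg.register("264", "172.1.1.8")  # the race seen in the log
```

With the tombstone in place, the out-of-order registration is rejected and the page never lists a removed executor as alive.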
[jira] [Updated] (SPARK-9056) Rename configuration `spark.streaming.minRememberDuration` to `spark.streaming.fileStream.minRememberDuration`
[ https://issues.apache.org/jira/browse/SPARK-9056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-9056: - Assignee: Sameer Abhyankar Rename configuration `spark.streaming.minRememberDuration` to `spark.streaming.fileStream.minRememberDuration` -- Key: SPARK-9056 URL: https://issues.apache.org/jira/browse/SPARK-9056 Project: Spark Issue Type: Sub-task Components: Streaming Affects Versions: 1.4.1 Reporter: Tathagata Das Assignee: Sameer Abhyankar Priority: Trivial Labels: starter Fix For: 1.5.0 spark.streaming.minRememberDuration is confusing as it is not immediately evident what this configuration is about. Best to rename it to spark.streaming.fileStream.minRememberDuration. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
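A configuration rename like `spark.streaming.minRememberDuration` to `spark.streaming.fileStream.minRememberDuration` typically ships with a fallback that still honors the deprecated key, so existing deployments keep working. A minimal sketch of that lookup order (hypothetical helper, not Spark's actual config machinery):

```python
import warnings

def get_conf(conf, new_key, old_key, default):
    """Read new_key first, falling back to the deprecated old_key with a warning."""
    if new_key in conf:
        return conf[new_key]
    if old_key in conf:
        warnings.warn(f"{old_key} is deprecated; use {new_key}", DeprecationWarning)
        return conf[old_key]
    return default

# A user config still using the old key keeps working:
conf = {"spark.streaming.minRememberDuration": "120s"}
value = get_conf(conf,
                 "spark.streaming.fileStream.minRememberDuration",
                 "spark.streaming.minRememberDuration",
                 "60s")
```

The new key always wins when both are set, which is the usual convention for deprecated-key fallbacks.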
[jira] [Created] (SPARK-9538) LogisticRegression support raw and probability prediction for PySpark.ml
Yanbo Liang created SPARK-9538: -- Summary: LogisticRegression support raw and probability prediction for PySpark.ml Key: SPARK-9538 URL: https://issues.apache.org/jira/browse/SPARK-9538 Project: Spark Issue Type: Improvement Components: ML, PySpark Reporter: Yanbo Liang Priority: Minor LogisticRegression support raw and probability prediction for PySpark.ml -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
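For binary logistic regression, the "raw" prediction is the margin w·x + b and the probability is its logistic transform (Spark ML exposes the raw prediction as a per-class vector; the scalar margin below is a simplification). A small self-contained sketch of the relationship, with hypothetical function names:

```python
import math

def raw_prediction(weights, intercept, features):
    """Margin w.x + b for the positive class (scalar simplification)."""
    return sum(w * x for w, x in zip(weights, features)) + intercept

def probability(margin):
    """Logistic transform of the margin: P(positive class)."""
    return 1.0 / (1.0 + math.exp(-margin))

margin = raw_prediction([0.5, -0.25], 0.1, [2.0, 4.0])  # 1.0 - 1.0 + 0.1 = 0.1
p = probability(margin)
```

Exposing both values in the Python API mirrors what the Scala side already returns: callers who only need a hard label can ignore them, while ranking or thresholding use cases read the probability directly.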
[jira] [Commented] (SPARK-9536) NaiveBayesModel support probability prediction for PySpark.ml
[ https://issues.apache.org/jira/browse/SPARK-9536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14650694#comment-14650694 ] Apache Spark commented on SPARK-9536: - User 'yanboliang' has created a pull request for this issue: https://github.com/apache/spark/pull/7866 NaiveBayesModel support probability prediction for PySpark.ml - Key: SPARK-9536 URL: https://issues.apache.org/jira/browse/SPARK-9536 Project: Spark Issue Type: Improvement Components: ML, PySpark Reporter: Yanbo Liang Priority: Minor NaiveBayesModel support probability prediction for PySpark.ml -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-8064) Upgrade Hive to 1.2
[ https://issues.apache.org/jira/browse/SPARK-8064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14650725#comment-14650725 ] Apache Spark commented on SPARK-8064: - User 'liancheng' has created a pull request for this issue: https://github.com/apache/spark/pull/7867 Upgrade Hive to 1.2 --- Key: SPARK-8064 URL: https://issues.apache.org/jira/browse/SPARK-8064 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin Assignee: Steve Loughran Priority: Blocker -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-7230) Make RDD API private in SparkR for Spark 1.4
[ https://issues.apache.org/jira/browse/SPARK-7230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14651023#comment-14651023 ] Michael Smith commented on SPARK-7230: -- I support Antonio's request to bring back this functionality in version 1.5 so that plyrmr can continue to be used with the Spark backend as before. Make RDD API private in SparkR for Spark 1.4 Key: SPARK-7230 URL: https://issues.apache.org/jira/browse/SPARK-7230 Project: Spark Issue Type: Sub-task Components: SparkR Affects Versions: 1.4.0 Reporter: Shivaram Venkataraman Assignee: Shivaram Venkataraman Priority: Critical Fix For: 1.4.0 This ticket proposes making the RDD API in SparkR private for the 1.4 release. The motivation for doing so are discussed in a larger design document aimed at a more top-down design of the SparkR APIs. A first cut that discusses motivation and proposed changes can be found at http://goo.gl/GLHKZI The main points in that document that relate to this ticket are: - The RDD API requires knowledge of the distributed system and is pretty low level. This is not very suitable for a number of R users who are used to more high-level packages that work out of the box. - The RDD implementation in SparkR is not fully robust right now: we are missing features like spilling for aggregation, handling partitions which don't fit in memory etc. There are further limitations like lack of hashCode for non-native types etc. which might affect user experience. The only change we will make for now is to not export the RDD functions as public methods in the SparkR package and I will create another ticket for discussing more details public API for 1.5 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-9254) sbt-launch-lib.bash should use `curl --location` to support HTTP/HTTPS redirection
[ https://issues.apache.org/jira/browse/SPARK-9254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-9254: - Fix Version/s: 1.3.2 sbt-launch-lib.bash should use `curl --location` to support HTTP/HTTPS redirection -- Key: SPARK-9254 URL: https://issues.apache.org/jira/browse/SPARK-9254 Project: Spark Issue Type: Bug Components: Build Affects Versions: 1.5.0 Reporter: Cheng Lian Assignee: Cheng Lian Fix For: 1.3.2, 1.4.2, 1.5.0 The {{curl}} call in the script should use {{--location}} to support HTTP/HTTPS redirection, since target file(s) can be hosted on CDN nodes. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-9538) LogisticRegression support raw and probability prediction for PySpark.ml
[ https://issues.apache.org/jira/browse/SPARK-9538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14650696#comment-14650696 ] Apache Spark commented on SPARK-9538: - User 'yanboliang' has created a pull request for this issue: https://github.com/apache/spark/pull/7866 LogisticRegression support raw and probability prediction for PySpark.ml Key: SPARK-9538 URL: https://issues.apache.org/jira/browse/SPARK-9538 Project: Spark Issue Type: Improvement Components: ML, PySpark Reporter: Yanbo Liang Priority: Minor LogisticRegression support raw and probability prediction for PySpark.ml -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-9537) DecisionTreeClassifierModel support probability prediction for PySpark.ml
[ https://issues.apache.org/jira/browse/SPARK-9537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-9537: --- Assignee: (was: Apache Spark) DecisionTreeClassifierModel support probability prediction for PySpark.ml - Key: SPARK-9537 URL: https://issues.apache.org/jira/browse/SPARK-9537 Project: Spark Issue Type: Improvement Components: ML, PySpark Reporter: Yanbo Liang Priority: Minor DecisionTreeClassifierModel support probability prediction for PySpark.ml -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-9537) DecisionTreeClassifierModel support probability prediction for PySpark.ml
[ https://issues.apache.org/jira/browse/SPARK-9537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14650695#comment-14650695 ] Apache Spark commented on SPARK-9537: - User 'yanboliang' has created a pull request for this issue: https://github.com/apache/spark/pull/7866 DecisionTreeClassifierModel support probability prediction for PySpark.ml - Key: SPARK-9537 URL: https://issues.apache.org/jira/browse/SPARK-9537 Project: Spark Issue Type: Improvement Components: ML, PySpark Reporter: Yanbo Liang Priority: Minor DecisionTreeClassifierModel support probability prediction for PySpark.ml -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-9538) LogisticRegression support raw and probability prediction for PySpark.ml
[ https://issues.apache.org/jira/browse/SPARK-9538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-9538: --- Assignee: Apache Spark LogisticRegression support raw and probability prediction for PySpark.ml Key: SPARK-9538 URL: https://issues.apache.org/jira/browse/SPARK-9538 Project: Spark Issue Type: Improvement Components: ML, PySpark Reporter: Yanbo Liang Assignee: Apache Spark Priority: Minor LogisticRegression support raw and probability prediction for PySpark.ml -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-9538) LogisticRegression support raw and probability prediction for PySpark.ml
[ https://issues.apache.org/jira/browse/SPARK-9538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-9538: --- Assignee: (was: Apache Spark) LogisticRegression support raw and probability prediction for PySpark.ml Key: SPARK-9538 URL: https://issues.apache.org/jira/browse/SPARK-9538 Project: Spark Issue Type: Improvement Components: ML, PySpark Reporter: Yanbo Liang Priority: Minor LogisticRegression support raw and probability prediction for PySpark.ml -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-9537) DecisionTreeClassifierModel support probability prediction for PySpark.ml
[ https://issues.apache.org/jira/browse/SPARK-9537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-9537: --- Assignee: Apache Spark DecisionTreeClassifierModel support probability prediction for PySpark.ml - Key: SPARK-9537 URL: https://issues.apache.org/jira/browse/SPARK-9537 Project: Spark Issue Type: Improvement Components: ML, PySpark Reporter: Yanbo Liang Assignee: Apache Spark Priority: Minor DecisionTreeClassifierModel support probability prediction for PySpark.ml -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-9536) NaiveBayesModel support probability prediction for PySpark.ml
[ https://issues.apache.org/jira/browse/SPARK-9536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-9536: --- Assignee: (was: Apache Spark) NaiveBayesModel support probability prediction for PySpark.ml - Key: SPARK-9536 URL: https://issues.apache.org/jira/browse/SPARK-9536 Project: Spark Issue Type: Improvement Components: ML, PySpark Reporter: Yanbo Liang Priority: Minor NaiveBayesModel support probability prediction for PySpark.ml -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-9536) NaiveBayesModel support probability prediction for PySpark.ml
[ https://issues.apache.org/jira/browse/SPARK-9536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-9536: --- Assignee: Apache Spark NaiveBayesModel support probability prediction for PySpark.ml - Key: SPARK-9536 URL: https://issues.apache.org/jira/browse/SPARK-9536 Project: Spark Issue Type: Improvement Components: ML, PySpark Reporter: Yanbo Liang Assignee: Apache Spark Priority: Minor NaiveBayesModel support probability prediction for PySpark.ml -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-9527) PrefixSpan.run should return a PrefixSpanModel instead of an RDD
[ https://issues.apache.org/jira/browse/SPARK-9527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-9527: --- Assignee: Xiangrui Meng (was: Apache Spark) PrefixSpan.run should return a PrefixSpanModel instead of an RDD Key: SPARK-9527 URL: https://issues.apache.org/jira/browse/SPARK-9527 Project: Spark Issue Type: Improvement Components: MLlib Affects Versions: 1.5.0 Reporter: Xiangrui Meng Assignee: Xiangrui Meng Priority: Critical With a model wrapping the result RDD, it would be more flexible to add features in the future. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
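Returning a model object instead of a bare RDD is about forward compatibility: new methods can be added to the wrapper later without changing `run()`'s return type again. The wrapping pattern in a hypothetical Python sketch (illustrative stand-in, not the MLlib implementation):

```python
class PrefixSpanModel:
    """Thin wrapper over the raw result; convenience methods can be
    added here later without breaking run()'s signature."""

    def __init__(self, freq_sequences):
        self.freq_sequences = freq_sequences

    def top(self, n):
        # Example of a method the wrapper makes room for.
        return sorted(self.freq_sequences, key=lambda kv: -kv[1])[:n]

def run(sequences, min_support=2):
    # Stand-in for the mining step: count single-item "patterns" only.
    counts = {}
    for seq in sequences:
        for item in set(seq):
            counts[item] = counts.get(item, 0) + 1
    freq = [(item, c) for item, c in counts.items() if c >= min_support]
    return PrefixSpanModel(freq)   # a model, not a bare collection

model = run([["a", "b"], ["a", "c"], ["a", "b"]])
```

Callers who only wanted the raw result still reach it via `model.freq_sequences`, so the wrapper costs almost nothing.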
[jira] [Commented] (SPARK-9539) Repeated sc.close() in PySpark causes JVM memory leak
[ https://issues.apache.org/jira/browse/SPARK-9539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14651067#comment-14651067 ] Andrey Zimovnov commented on SPARK-9539: Hi, Owen! I'm not sure what Permanent in java heap means, but it grows with time. I really have such a use case, when I need to recreate spark context a lot. The only workaround for now is to try to increase MaxPermSize, I guess. Repeated sc.close() in PySpark causes JVM memory leak - Key: SPARK-9539 URL: https://issues.apache.org/jira/browse/SPARK-9539 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 1.4.1 Reporter: Andrey Zimovnov Priority: Minor Attachments: Screenshot at авг. 02 19-10-53.png Example code in Python: {code:python} for i in range(20): print i conf = SparkConf().setAppName(test) sc = SparkContext(conf=conf) hivec = HiveContext(sc) hivec.sql(select id from details_info limit 1).show() sc.stop() del hivec del sc {code} Jstat output: {noformat} S0CS1CS0US1U EC EUOC OU PC PUYGC YGCTFGCFGCT GCT 196608,0 196608,0 97566,2 0,0 1179648,0 542150,0 3145728,0120,0 154112,0 153613,2 40,434 0 0,0000,434 196608,0 196608,0 97566,2 0,0 1179648,0 679041,7 3145728,0120,0 164352,0 164183,3 40,434 0 0,0000,434 196608,0 196608,0 97566,2 0,0 1179648,0 907928,4 3145728,0120,0 164352,0 164200,3 40,434 0 0,0000,434 196608,0 196608,0 97566,2 0,0 1179648,0 912132,7 3145728,0120,0 164352,0 164200,5 40,434 0 0,0000,434 196608,0 196608,0 97566,2 0,0 1179648,0 913741,5 3145728,0120,0 164352,0 164200,8 40,434 0 0,0000,434 196608,0 196608,0 97566,2 0,0 1179648,0 929458,6 3145728,0120,0 164352,0 164206,0 40,434 0 0,0000,434 196608,0 196608,0 97566,2 0,0 1179648,0 1003138,1 3145728,0120,0 168960,0 168646,0 40,434 0 0,0000,434 131584,0 196608,0 0,0 109725,6 1179648,0 0,03145728,0128,0 175104,0 174802,1 50,591 0 0,0000,591 131584,0 196608,0 0,0 109725,6 1179648,0 152654,9 3145728,0128,0 175104,0 174803,3 50,591 0 0,0000,591 131584,0 196608,0 0,0 109725,6 1179648,0 
158586,1 3145728,0128,0 175104,0 174803,3 50,591 0 0,0000,591 131584,0 196608,0 0,0 109725,6 1179648,0 160659,8 3145728,0128,0 175104,0 174805,7 50,591 0 0,0000,591 131584,0 196608,0 0,0 109725,6 1179648,0 181935,2 3145728,0128,0 175104,0 174819,7 50,591 0 0,0000,591 131584,0 196608,0 0,0 109725,6 1179648,0 283389,1 3145728,0128,0 185856,0 185371,0 50,591 0 0,0000,591 131584,0 196608,0 0,0 109725,6 1179648,0 342596,4 3145728,0128,0 185856,0 185379,3 50,591 0 0,0000,591 131584,0 196608,0 0,0 109725,6 1179648,0 547634,7 3145728,0128,0 185856,0 185385,8 50,591 0 0,0000,591 131584,0 196608,0 0,0 109725,6 1179648,0 555930,9 3145728,0128,0 185856,0 185385,8 50,591 0 0,0000,591 131584,0 196608,0 0,0 109725,6 1179648,0 557888,6 3145728,0128,0 185856,0 185386,0 50,591 0 0,0000,591 131584,0 196608,0 0,0 109725,6 1179648,0 573907,5 3145728,0128,0 185856,0 185397,5 50,591 0 0,0000,591 131584,0 196608,0 0,0 109725,6 1179648,0 637955,0 3145728,0128,0 189952,0 189533,1 50,591 0 0,0000,591 131584,0 196608,0 0,0 109725,6 1179648,0 895866,1 3145728,0128,0 196096,0 195968,5 50,591 0 0,0000,591 131584,0 196608,0 0,0 109725,6 1179648,0 948046,5 3145728,0128,0 196096,0 195969,4 50,591 0 0,0000,591 131584,0 196608,0 0,0 109725,6 1179648,0 952427,2 3145728,0128,0 196096,0 195969,4 50,591 0 0,0000,591 131584,0 196608,0 0,0 109725,6 1179648,0 957977,5 3145728,0128,0 196096,0 195973,4 50,591 0 0,0000,591 131584,0 196608,0 0,0 109725,6 1179648,0 977811,1 3145728,0128,0 196096,0 195977,7 50,591 0 0,0000,591 131584,0 196608,0 0,0 109725,6 1179648,0 1118722,0 3145728,0128,0 206848,0 206539,0 50,591 0 0,0000,591 131584,0 144384,0 118692,5 0,0 1284096,0 183470,8 3145728,0136,0 206848,0 206543,4 60,773 0 0,0000,773 131584,0 144384,0 118692,5 0,0 1284096,0 189718,5 3145728,0136,0
[jira] [Commented] (SPARK-9539) Repeated sc.close() in PySpark causes JVM memory leak
[ https://issues.apache.org/jira/browse/SPARK-9539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14651073#comment-14651073 ] Sean Owen commented on SPARK-9539: -- This just shows that Spark is using memory. It's normal to use some of the permanent generation. Your jstat dump shows normal growth and GC of the heap; it does not show any out-of-memory condition. It may simply be that you need to increase the memory you allocate, especially the permanent generation (you should probably read up on this). Unless you can point to an actual memory leak from a heap dump, I'd like to close this. Repeated sc.close() in PySpark causes JVM memory leak - Key: SPARK-9539 URL: https://issues.apache.org/jira/browse/SPARK-9539 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 1.4.1 Reporter: Andrey Zimovnov Priority: Minor Attachments: Screenshot at авг. 02 19-10-53.png Example code in Python:
{code:python}
from pyspark import SparkConf, SparkContext
from pyspark.sql import HiveContext

for i in range(20):
    print i
    conf = SparkConf().setAppName("test")
    sc = SparkContext(conf=conf)
    hivec = HiveContext(sc)
    hivec.sql("select id from details_info limit 1").show()
    sc.stop()
    del hivec
    del sc
{code}
Jstat output:
{noformat}
S0C S1C S0U S1U EC EU OC OU PC PU YGC YGCT FGC FGCT GCT
196608,0 196608,0 97566,2 0,0 1179648,0 542150,0 3145728,0 120,0 154112,0 153613,2 4 0,434 0 0,000 0,434
196608,0 196608,0 97566,2 0,0 1179648,0 679041,7 3145728,0 120,0 164352,0 164183,3 4 0,434 0 0,000 0,434
196608,0 196608,0 97566,2 0,0 1179648,0 907928,4 3145728,0 120,0 164352,0 164200,3 4 0,434 0 0,000 0,434
196608,0 196608,0 97566,2 0,0 1179648,0 912132,7 3145728,0 120,0 164352,0 164200,5 4 0,434 0 0,000 0,434
196608,0 196608,0 97566,2 0,0 1179648,0 913741,5 3145728,0 120,0 164352,0 164200,8 4 0,434 0 0,000 0,434
196608,0 196608,0 97566,2 0,0 1179648,0 929458,6 3145728,0 120,0 164352,0 164206,0 4 0,434 0 0,000 0,434
196608,0 196608,0 97566,2 0,0 1179648,0 1003138,1 3145728,0 120,0 168960,0 168646,0 4 0,434 0 0,000 0,434
131584,0 196608,0 0,0 109725,6 1179648,0 0,0 3145728,0 128,0 175104,0 174802,1 5 0,591 0 0,000 0,591
131584,0 196608,0 0,0 109725,6 1179648,0 152654,9 3145728,0 128,0 175104,0 174803,3 5 0,591 0 0,000 0,591
131584,0 196608,0 0,0 109725,6 1179648,0 158586,1 3145728,0 128,0 175104,0 174803,3 5 0,591 0 0,000 0,591
131584,0 196608,0 0,0 109725,6 1179648,0 160659,8 3145728,0 128,0 175104,0 174805,7 5 0,591 0 0,000 0,591
131584,0 196608,0 0,0 109725,6 1179648,0 181935,2 3145728,0 128,0 175104,0 174819,7 5 0,591 0 0,000 0,591
131584,0 196608,0 0,0 109725,6 1179648,0 283389,1 3145728,0 128,0 185856,0 185371,0 5 0,591 0 0,000 0,591
131584,0 196608,0 0,0 109725,6 1179648,0 342596,4 3145728,0 128,0 185856,0 185379,3 5 0,591 0 0,000 0,591
131584,0 196608,0 0,0 109725,6 1179648,0 547634,7 3145728,0 128,0 185856,0 185385,8 5 0,591 0 0,000 0,591
131584,0 196608,0 0,0 109725,6 1179648,0 555930,9 3145728,0 128,0 185856,0 185385,8 5 0,591 0 0,000 0,591
131584,0 196608,0 0,0 109725,6 1179648,0 557888,6 3145728,0 128,0 185856,0 185386,0 5 0,591 0 0,000 0,591
131584,0 196608,0 0,0 109725,6 1179648,0 573907,5 3145728,0 128,0 185856,0 185397,5 5 0,591 0 0,000 0,591
131584,0 196608,0 0,0 109725,6 1179648,0 637955,0 3145728,0 128,0 189952,0 189533,1 5 0,591 0 0,000 0,591
131584,0 196608,0 0,0 109725,6 1179648,0 895866,1 3145728,0 128,0 196096,0 195968,5 5 0,591 0 0,000 0,591
131584,0 196608,0 0,0 109725,6 1179648,0 948046,5 3145728,0 128,0 196096,0 195969,4 5 0,591 0 0,000 0,591
131584,0 196608,0 0,0 109725,6 1179648,0 952427,2 3145728,0 128,0 196096,0 195969,4 5 0,591 0 0,000 0,591
131584,0 196608,0 0,0 109725,6 1179648,0 957977,5 3145728,0 128,0 196096,0 195973,4 5 0,591 0 0,000 0,591
131584,0 196608,0 0,0 109725,6 1179648,0 977811,1 3145728,0 128,0 196096,0 195977,7 5 0,591 0 0,000 0,591
131584,0 196608,0 0,0 109725,6 1179648,0 1118722,0 3145728,0 128,0 206848,0 206539,0 5 0,591 0 0,000 0,591
131584,0 144384,0
{noformat}
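The dump above is easier to reason about once the columns are labeled. As a rough illustration (a hypothetical helper, not part of Spark), the following parses one `jstat -gc` sample into named fields; note that PU sits very close to PC while the old generation (OU) is nearly empty, which matches the comment's reading of "permanent generation pressure, not a heap leak":

```python
# Hypothetical helper: split one `jstat -gc` sample into labeled fields so the
# old-gen (OU) and perm-gen (PU) usage can be tracked across samples.
JSTAT_GC_COLUMNS = [
    "S0C", "S1C", "S0U", "S1U", "EC", "EU", "OC", "OU",
    "PC", "PU", "YGC", "YGCT", "FGC", "FGCT", "GCT",
]

def parse_jstat_line(line):
    # jstat under a European locale prints decimal commas (e.g. "542150,0").
    values = [float(tok.replace(",", ".")) for tok in line.split()]
    return dict(zip(JSTAT_GC_COLUMNS, values))

sample = ("196608,0 196608,0 97566,2 0,0 1179648,0 542150,0 3145728,0 120,0 "
          "154112,0 153613,2 4 0,434 0 0,000 0,434")
fields = parse_jstat_line(sample)
# Perm gen nearly full (PU close to PC) while old gen is almost empty:
print(fields["PU"] / fields["PC"])  # ~0.997
print(fields["OU"])
```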
[jira] [Created] (SPARK-9542) create unsafe version of map type
Wenchen Fan created SPARK-9542: -- Summary: create unsafe version of map type Key: SPARK-9542 URL: https://issues.apache.org/jira/browse/SPARK-9542 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Wenchen Fan -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-9542) create unsafe version of map type
[ https://issues.apache.org/jira/browse/SPARK-9542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-9542: --- Assignee: (was: Apache Spark) create unsafe version of map type - Key: SPARK-9542 URL: https://issues.apache.org/jira/browse/SPARK-9542 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Wenchen Fan -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
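For context, "unsafe" types in Spark SQL store data in contiguous binary form instead of as JVM objects. The sketch below is only a loose, hypothetical illustration of that flat-encoding idea in Python (it is not Spark's actual Tungsten map layout): keys and values are packed into two contiguous regions behind an entry count.

```python
import struct

# Loose illustration of the idea behind an "unsafe" map type: instead of a
# pointer-heavy dict, keys and values live in two contiguous binary regions,
# prefixed by the entry count. NOT Spark's actual Tungsten layout, just a
# sketch of flat encoding for an int -> int map.

def encode_int_map(d):
    items = sorted(d.items())
    buf = struct.pack("<i", len(items))                        # entry count
    buf += b"".join(struct.pack("<q", k) for k, _ in items)    # key region
    buf += b"".join(struct.pack("<q", v) for _, v in items)    # value region
    return buf

def decode_int_map(buf):
    (n,) = struct.unpack_from("<i", buf, 0)
    keys = struct.unpack_from("<%dq" % n, buf, 4)
    values = struct.unpack_from("<%dq" % n, buf, 4 + 8 * n)
    return dict(zip(keys, values))

encoded = encode_int_map({1: 10, 2: 20})
print(decode_int_map(encoded))  # {1: 10, 2: 20}
```

Because the whole map is one byte buffer, it can be copied, hashed, or shipped over the network without touching each entry, which is the motivation for the unsafe representations.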
[jira] [Commented] (SPARK-9527) PrefixSpan.run should return a PrefixSpanModel instead of an RDD
[ https://issues.apache.org/jira/browse/SPARK-9527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14651062#comment-14651062 ] Apache Spark commented on SPARK-9527: - User 'mengxr' has created a pull request for this issue: https://github.com/apache/spark/pull/7869 PrefixSpan.run should return a PrefixSpanModel instead of an RDD Key: SPARK-9527 URL: https://issues.apache.org/jira/browse/SPARK-9527 Project: Spark Issue Type: Improvement Components: MLlib Affects Versions: 1.5.0 Reporter: Xiangrui Meng Assignee: Xiangrui Meng Priority: Critical With a model wrapping the result RDD, it would be more flexible to add features in the future. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-9527) PrefixSpan.run should return a PrefixSpanModel instead of an RDD
[ https://issues.apache.org/jira/browse/SPARK-9527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-9527: --- Assignee: Apache Spark (was: Xiangrui Meng) PrefixSpan.run should return a PrefixSpanModel instead of an RDD Key: SPARK-9527 URL: https://issues.apache.org/jira/browse/SPARK-9527 Project: Spark Issue Type: Improvement Components: MLlib Affects Versions: 1.5.0 Reporter: Xiangrui Meng Assignee: Apache Spark Priority: Critical With a model wrapping the result RDD, it would be more flexible to add features in the future. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-9539) Repeated sc.close() in PySpark causes JVM memory leak
[ https://issues.apache.org/jira/browse/SPARK-9539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrey Zimovnov updated SPARK-9539: --- Description: Example code in Python and jstat output (duplicate of the issue description quoted in full above; omitted here)
[jira] [Commented] (SPARK-5754) Spark AM not launching on Windows
[ https://issues.apache.org/jira/browse/SPARK-5754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14651140#comment-14651140 ] Apache Spark commented on SPARK-5754: - User 'cbvoxel' has created a pull request for this issue: https://github.com/apache/spark/pull/7872 Spark AM not launching on Windows - Key: SPARK-5754 URL: https://issues.apache.org/jira/browse/SPARK-5754 Project: Spark Issue Type: Bug Components: Windows, YARN Affects Versions: 1.1.1, 1.2.0 Environment: Windows Server 2012, Hadoop 2.4.1. Reporter: Inigo I'm trying to run Spark Pi on a YARN cluster running on Windows, and the AM container fails to start. The problem seems to be in the generation of the YARN command, which adds single quotes (') around some of the Java options. In particular, the part of the code that adds those quotes is the escapeForShell function in YarnSparkHadoopUtil. Apparently, Windows does not accept the quotes around these options. Here is an example of the command that the container tries to execute: @call %JAVA_HOME%/bin/java -server -Xmx512m -Djava.io.tmpdir=%PWD%/tmp '-Dspark.yarn.secondary.jars=' '-Dspark.app.name=org.apache.spark.examples.SparkPi' '-Dspark.master=yarn-cluster' org.apache.spark.deploy.yarn.ApplicationMaster --class 'org.apache.spark.examples.SparkPi' --jar 'file:/D:/data/spark-1.1.1-bin-hadoop2.4/bin/../lib/spark-examples-1.1.1-hadoop2.4.0.jar' --executor-memory 1024 --executor-cores 1 --num-executors 2 Once I transform it into: @call %JAVA_HOME%/bin/java -server -Xmx512m -Djava.io.tmpdir=%PWD%/tmp -Dspark.yarn.secondary.jars= -Dspark.app.name=org.apache.spark.examples.SparkPi -Dspark.master=yarn-cluster org.apache.spark.deploy.yarn.ApplicationMaster --class 'org.apache.spark.examples.SparkPi' --jar 'file:/D:/data/spark-1.1.1-bin-hadoop2.4/bin/../lib/spark-examples-1.1.1-hadoop2.4.0.jar' --executor-memory 1024 --executor-cores 1 --num-executors 2 everything seems to start. How should I deal with this?
Should I create a separate escapeForShell-like function for Windows and call it whenever I detect that we are running on Windows? Or should I add some sanity check on the YARN side? I checked a little, and there seem to be people who are able to run Spark on YARN on Windows, so it might be something else. I didn't find anything related in Jira either. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
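To make the quoting problem concrete, here is a hypothetical Python sketch of the direction the reporter suggests: an escape function that branches on the target shell. The Unix branch mimics what escapeForShell roughly does (single-quote wrapping); the Windows branch is an assumption about what cmd.exe would accept, not Spark's actual fix.

```python
# Unix shells strip the single quotes added by escapeForShell before the JVM
# sees the argument, but cmd.exe passes them through verbatim, so the JVM
# receives '-Dspark.master=yarn-cluster' (with quotes), which is not a valid
# -D flag. A Windows branch would quote with double quotes, and only when
# quoting is actually needed.

def escape_for_shell(arg, windows=False):
    if windows:
        # cmd.exe-style: wrap in double quotes only when needed; "" escapes ".
        if any(c in arg for c in ' "'):
            return '"%s"' % arg.replace('"', '""')
        return arg
    # Unix-style: wrap in single quotes, escaping embedded single quotes.
    return "'" + arg.replace("'", "'\\''") + "'"

print(escape_for_shell("-Dspark.master=yarn-cluster"))                # '-Dspark.master=yarn-cluster'
print(escape_for_shell("-Dspark.master=yarn-cluster", windows=True))  # -Dspark.master=yarn-cluster
```

On Windows, a plain -D option needs no quoting at all, which is exactly the transformation the reporter applied by hand to get the container to start.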
[jira] [Updated] (SPARK-9527) PrefixSpan.run should return a PrefixSpanModel instead of an RDD and it should be Java-friendly
[ https://issues.apache.org/jira/browse/SPARK-9527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-9527: - Shepherd: Feynman Liang PrefixSpan.run should return a PrefixSpanModel instead of an RDD and it should be Java-friendly --- Key: SPARK-9527 URL: https://issues.apache.org/jira/browse/SPARK-9527 Project: Spark Issue Type: Improvement Components: MLlib Affects Versions: 1.5.0 Reporter: Xiangrui Meng Assignee: Xiangrui Meng Priority: Critical With a model wrapping the result RDD, it would be more flexible to add features in the future. And it should be Java-friendly. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-9527) PrefixSpan.run should return a PrefixSpanModel instead of an RDD and it should be Java-friendly
[ https://issues.apache.org/jira/browse/SPARK-9527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-9527: - Summary: PrefixSpan.run should return a PrefixSpanModel instead of an RDD and it should be Java-friendly (was: PrefixSpan.run should return a PrefixSpanModel instead of an RDD) PrefixSpan.run should return a PrefixSpanModel instead of an RDD and it should be Java-friendly --- Key: SPARK-9527 URL: https://issues.apache.org/jira/browse/SPARK-9527 Project: Spark Issue Type: Improvement Components: MLlib Affects Versions: 1.5.0 Reporter: Xiangrui Meng Assignee: Xiangrui Meng Priority: Critical With a model wrapping the result RDD, it would be more flexible to add features in the future. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-9527) PrefixSpan.run should return a PrefixSpanModel instead of an RDD and it should be Java-friendly
[ https://issues.apache.org/jira/browse/SPARK-9527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-9527: - Description: With a model wrapping the result RDD, it would be more flexible to add features in the future. And it should be Java-friendly. (was: With a model wrapping the result RDD, it would be more flexible to add features in the future.) PrefixSpan.run should return a PrefixSpanModel instead of an RDD and it should be Java-friendly --- Key: SPARK-9527 URL: https://issues.apache.org/jira/browse/SPARK-9527 Project: Spark Issue Type: Improvement Components: MLlib Affects Versions: 1.5.0 Reporter: Xiangrui Meng Assignee: Xiangrui Meng Priority: Critical With a model wrapping the result RDD, it would be more flexible to add features in the future. And it should be Java-friendly. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
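As a loose illustration of why the model wrapper helps (hypothetical Python, not the actual MLlib API): once run() returns a model object instead of the raw result collection, accessors, summaries, and save/load can be added later without changing run()'s return type again, and a Java-friendly surface can live on the model. The toy run() below only counts single-item prefixes.

```python
# Hypothetical sketch of the change proposed in SPARK-9527: wrap the result
# in a model object rather than returning the bare RDD/collection.

class FreqSequence(object):
    def __init__(self, sequence, freq):
        self.sequence = sequence
        self.freq = freq

class PrefixSpanModel(object):
    def __init__(self, freq_sequences):
        self.freq_sequences = freq_sequences

    def top(self, k):
        # An accessor that would be awkward if run() returned a bare list.
        return sorted(self.freq_sequences, key=lambda s: -s.freq)[:k]

def run(sequences, min_support):
    # Stand-in for the real algorithm: count single-item prefixes only.
    counts = {}
    for seq in sequences:
        for item in set(seq):
            counts[item] = counts.get(item, 0) + 1
    freq = [FreqSequence([item], c) for item, c in counts.items()
            if c >= min_support * len(sequences)]
    return PrefixSpanModel(freq)

model = run([[1, 2], [1, 3], [1, 2, 3]], min_support=0.6)
print([s.sequence for s in model.top(1)])  # [[1]]
```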
[jira] [Commented] (SPARK-9539) Repeated sc.close() in PySpark causes JVM memory leak
[ https://issues.apache.org/jira/browse/SPARK-9539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14651064#comment-14651064 ] Sean Owen commented on SPARK-9539: -- Why do you think this is a memory leak? That exception does not even indicate an out-of-memory condition. Repeated sc.close() in PySpark causes JVM memory leak - Key: SPARK-9539 URL: https://issues.apache.org/jira/browse/SPARK-9539 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 1.4.1 Reporter: Andrey Zimovnov Priority: Minor Attachments: Screenshot at авг. 02 19-10-53.png (duplicate example code and jstat output omitted)
[jira] [Created] (SPARK-9540) Optimize PrefixSpan implementation
Xiangrui Meng created SPARK-9540: Summary: Optimize PrefixSpan implementation Key: SPARK-9540 URL: https://issues.apache.org/jira/browse/SPARK-9540 Project: Spark Issue Type: Improvement Components: MLlib Affects Versions: 1.5.0 Reporter: Xiangrui Meng Assignee: Xiangrui Meng Priority: Critical The current `PrefixSpan` implementation has two major issues: 1. We should expand the prefix by one item at a time instead of by one itemset. 2. Some set operations should be changed to array operations, which should be more efficient. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
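Issue 1 above can be illustrated with a small, self-contained sketch (plain Python over single-item events, not Spark's distributed implementation): each step projects the database by the current prefix and then extends the prefix by exactly one frequent item, rather than enumerating whole itemsets at once.

```python
# Growing a PrefixSpan prefix one item at a time: project the database by the
# current prefix, count the items that can extend it, and keep the frequent
# extensions. (Hypothetical single-item-event sketch, not Spark's code.)

def project(database, prefix):
    # Keep the postfix of each sequence after the first match of `prefix`.
    projected = []
    for seq in database:
        i, j = 0, 0
        while i < len(seq) and j < len(prefix):
            if seq[i] == prefix[j]:
                j += 1
            i += 1
        if j == len(prefix):
            projected.append(seq[i:])
    return projected

def expand_by_one_item(database, prefix, min_count):
    counts = {}
    for postfix in project(database, prefix):
        for item in set(postfix):
            counts[item] = counts.get(item, 0) + 1
    return sorted(prefix + [item] for item, c in counts.items() if c >= min_count)

db = [[1, 2, 3], [1, 3], [1, 2, 2, 3]]
print(expand_by_one_item(db, [1], min_count=2))  # [[1, 2], [1, 3]]
```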
[jira] [Updated] (SPARK-9539) Repeated sc.close() in PySpark causes JVM memory leak
[ https://issues.apache.org/jira/browse/SPARK-9539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrey Zimovnov updated SPARK-9539: --- Attachment: Screenshot at авг. 02 19-10-53.png (jstat visualization) Repeated sc.close() in PySpark causes JVM memory leak - Key: SPARK-9539 URL: https://issues.apache.org/jira/browse/SPARK-9539 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 1.4.1 Reporter: Andrey Zimovnov Priority: Minor Attachments: Screenshot at авг. 02 19-10-53.png (duplicate example code and jstat output omitted)
[jira] [Commented] (SPARK-9539) Repeated sc.close() in PySpark causes JVM memory leak
[ https://issues.apache.org/jira/browse/SPARK-9539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14651078#comment-14651078 ] Andrey Zimovnov commented on SPARK-9539: OK, I'll work on this later and reopen if necessary. Thanks! Repeated sc.close() in PySpark causes JVM memory leak - Key: SPARK-9539 URL: https://issues.apache.org/jira/browse/SPARK-9539 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 1.4.1 Reporter: Andrey Zimovnov Priority: Minor Attachments: Screenshot at авг. 02 19-10-53.png (duplicate example code and jstat output omitted)
[jira] [Created] (SPARK-9541) DateTimeUtils cleanup
Yijie Shen created SPARK-9541: - Summary: DateTimeUtils cleanup Key: SPARK-9541 URL: https://issues.apache.org/jira/browse/SPARK-9541 Project: Spark Issue Type: Sub-task Reporter: Yijie Shen -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-9539) Repeated sc.close() in PySpark causes JVM memory leak
Andrey Zimovnov created SPARK-9539: -- Summary: Repeated sc.close() in PySpark causes JVM memory leak Key: SPARK-9539 URL: https://issues.apache.org/jira/browse/SPARK-9539 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 1.4.1 Reporter: Andrey Zimovnov Priority: Minor (example code and jstat output as quoted in full above; duplicate omitted)
[jira] [Updated] (SPARK-8445) MLlib 1.5 Roadmap
[ https://issues.apache.org/jira/browse/SPARK-8445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-8445: - Description: We expect to see many MLlib contributors for the 1.5 release. To scale out the development, we created this master list for MLlib features we plan to have in Spark 1.5. Please view this list as a wish list rather than a concrete plan, because we don't have an accurate estimate of available resources. Due to limited review bandwidth, features appearing on this list will get higher priority during code review. But feel free to suggest new items to the list in comments. We are experimenting with this process. Your feedback would be greatly appreciated. h1. Instructions h2. For contributors: * Please read https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark carefully. Code style, documentation, and unit tests are important. * If you are a first-time Spark contributor, please always start with a [starter task|https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20component%20in%20(ML%2C%20MLlib)%20AND%20labels%20%3D%20starter%20AND%20%22Target%20Version%2Fs%22%20%3D%201.5.0] rather than a medium/big feature. Based on our experience, mixing the development process with a big feature usually causes long delays in code review. * Never work silently. Let everyone know on the corresponding JIRA page when you start working on some features. This is to avoid duplicate work. For small features, you don't need to wait to get the JIRA assigned. * For medium/big features or features with dependencies, please get assigned first before coding and keep the ETA updated on the JIRA. If there is no activity on the JIRA page for a certain amount of time, the JIRA should be released for other contributors. * Do not claim multiple (>3) JIRAs at the same time. Try to finish them one after another.
* Please review others' PRs (https://spark-prs.appspot.com/#mllib). Code review greatly helps improve others' code as well as yours. h2. For committers: * Try to break down big features into small and specific JIRA tasks and link them properly. * Add starter label to starter tasks. * Put a rough estimate for medium/big features and track the progress. * If you start reviewing a PR, please add yourself to the Shepherd field on JIRA. * If the code looks good to you, please comment LGTM. For non-trivial PRs, please ping a maintainer to make a final pass. * After merging a PR, create and link JIRAs for Python, example code, and documentation if necessary. h1. Roadmap (WIP) This is NOT [a complete list of MLlib JIRAs for 1.5|https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20component%20in%20(ML%2C%20MLlib)%20AND%20%22Target%20Version%2Fs%22%20%3D%201.5.0%20ORDER%20BY%20priority%20DESC]. We only include umbrella JIRAs and high-level tasks. h2. Algorithms and performance * LDA improvements (SPARK-5572) * Log-linear model for survival analysis (SPARK-8518) - 1.6 * Improve GLM's scalability on number of features (SPARK-8520) * Tree and ensembles: Move + cleanup code (SPARK-7131), provide class probabilities (SPARK-3727), feature importance (SPARK-5133) * Improve GMM scalability and stability (SPARK-5016) * Frequent pattern mining improvements (SPARK-6487) * R-like stats for ML models (SPARK-7674) * Generalize classification threshold to multiclass (SPARK-8069) * A/B testing (SPARK-3147) h2. Pipeline API * more feature transformers (SPARK-8521) * k-means (SPARK-7879) * naive Bayes (SPARK-8600) * TrainValidationSplit for tuning (SPARK-8484) * Isotonic regression (SPARK-8671) h2. Model persistence * more PMML export (SPARK-8545) * model save/load (SPARK-4587) * pipeline persistence (SPARK-6725) h2. 
Python API for ML * List of issues identified during Spark 1.4 QA: (SPARK-7536) * Python API for streaming ML algorithms (SPARK-3258) * Add missing model methods (SPARK-8633) h2. SparkR API for ML * MLlib + SparkR integration for 1.5 (RFormula + glm) (SPARK-6805) * model.matrix for DataFrames (SPARK-6823) h2. Documentation * [Search for documentation improvements | https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20component%20in%20(Documentation)%20AND%20component%20in%20(ML%2C%20MLlib)] was: We expect to see many MLlib contributors for the 1.5 release. To scale out the development, we created this master list for MLlib features we plan to have in Spark 1.5. Please view this list as a wish list rather than a concrete plan, because we don't have an accurate estimate of available resources. Due to limited review bandwidth, features appearing on this list will get higher priority during code review. But feel free to
[jira] [Commented] (SPARK-9541) DateTimeUtils cleanup
[ https://issues.apache.org/jira/browse/SPARK-9541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14651082#comment-14651082 ] Apache Spark commented on SPARK-9541: - User 'yjshen' has created a pull request for this issue: https://github.com/apache/spark/pull/7870 DateTimeUtils cleanup - Key: SPARK-9541 URL: https://issues.apache.org/jira/browse/SPARK-9541 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Yijie Shen -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-9541) DateTimeUtils cleanup
[ https://issues.apache.org/jira/browse/SPARK-9541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-9541: --- Assignee: (was: Apache Spark) DateTimeUtils cleanup - Key: SPARK-9541 URL: https://issues.apache.org/jira/browse/SPARK-9541 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Yijie Shen -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-9541) DateTimeUtils cleanup
[ https://issues.apache.org/jira/browse/SPARK-9541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-9541: --- Assignee: Apache Spark DateTimeUtils cleanup - Key: SPARK-9541 URL: https://issues.apache.org/jira/browse/SPARK-9541 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Yijie Shen Assignee: Apache Spark -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-9140) Replace TimeTracker by Stopwatch
[ https://issues.apache.org/jira/browse/SPARK-9140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-9140: --- Assignee: Apache Spark Replace TimeTracker by Stopwatch Key: SPARK-9140 URL: https://issues.apache.org/jira/browse/SPARK-9140 Project: Spark Issue Type: Sub-task Components: ML, MLlib Affects Versions: 1.5.0 Reporter: Xiangrui Meng Assignee: Apache Spark Priority: Minor We can replace TimeTracker in tree implementations with Stopwatch. The initial PR could use local stopwatches only. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-9140) Replace TimeTracker by Stopwatch
[ https://issues.apache.org/jira/browse/SPARK-9140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-9140: --- Assignee: (was: Apache Spark) Replace TimeTracker by Stopwatch Key: SPARK-9140 URL: https://issues.apache.org/jira/browse/SPARK-9140 Project: Spark Issue Type: Sub-task Components: ML, MLlib Affects Versions: 1.5.0 Reporter: Xiangrui Meng Priority: Minor We can replace TimeTracker in tree implementations with Stopwatch. The initial PR could use local stopwatches only. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-9140) Replace TimeTracker by Stopwatch
[ https://issues.apache.org/jira/browse/SPARK-9140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14651083#comment-14651083 ] Apache Spark commented on SPARK-9140: - User 'hhbyyh' has created a pull request for this issue: https://github.com/apache/spark/pull/7871 Replace TimeTracker by Stopwatch Key: SPARK-9140 URL: https://issues.apache.org/jira/browse/SPARK-9140 Project: Spark Issue Type: Sub-task Components: ML, MLlib Affects Versions: 1.5.0 Reporter: Xiangrui Meng Priority: Minor We can replace TimeTracker in tree implementations with Stopwatch. The initial PR could use local stopwatches only. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
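The Stopwatch referred to above is an MLlib-internal Scala utility; purely to illustrate the "local stopwatch" idea the ticket mentions, here is a minimal Python sketch (class and method names are ours, not Spark's actual API):

```python
import time


class Stopwatch:
    """Accumulates elapsed wall-clock time across start/stop cycles.

    Illustrative sketch only; MLlib's Stopwatch is a Scala class with
    local and accumulator-backed (distributed) variants.
    """

    def __init__(self, name):
        self.name = name
        self._running = False
        self._start = 0.0
        self._elapsed = 0.0  # total seconds across all start/stop cycles

    def start(self):
        if self._running:
            raise RuntimeError("Stopwatch %s is already running" % self.name)
        self._running = True
        self._start = time.monotonic()

    def stop(self):
        if not self._running:
            raise RuntimeError("Stopwatch %s is not running" % self.name)
        self._elapsed += time.monotonic() - self._start
        self._running = False

    def elapsed(self):
        return self._elapsed


# Hypothetical usage, timing one phase of tree training:
sw = Stopwatch("findSplits")
sw.start()
time.sleep(0.01)  # stand-in for real work
sw.stop()
```

A timer like this can be restarted to accumulate time across multiple calls, which is the behavior TimeTracker provided in the tree code.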
[jira] [Resolved] (SPARK-9539) Repeated sc.close() in PySpark causes JVM memory leak
[ https://issues.apache.org/jira/browse/SPARK-9539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-9539. -- Resolution: Not A Problem Repeated sc.close() in PySpark causes JVM memory leak - Key: SPARK-9539 URL: https://issues.apache.org/jira/browse/SPARK-9539 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 1.4.1 Reporter: Andrey Zimovnov Priority: Minor Attachments: Screenshot at авг. 02 19-10-53.png
Example code in Python:
{code:python}
for i in range(20):
    print i
    conf = SparkConf().setAppName("test")
    sc = SparkContext(conf=conf)
    hivec = HiveContext(sc)
    hivec.sql("select id from details_info limit 1").show()
    sc.stop()
    del hivec
    del sc
{code}
Jstat output:
{noformat}
S0C      S1C      S0U      S1U      EC        EU        OC        OU    PC       PU       YGC YGCT  FGC FGCT  GCT
196608,0 196608,0 97566,2  0,0      1179648,0 542150,0  3145728,0 120,0 154112,0 153613,2 4   0,434 0   0,000 0,434
196608,0 196608,0 97566,2  0,0      1179648,0 679041,7  3145728,0 120,0 164352,0 164183,3 4   0,434 0   0,000 0,434
196608,0 196608,0 97566,2  0,0      1179648,0 907928,4  3145728,0 120,0 164352,0 164200,3 4   0,434 0   0,000 0,434
196608,0 196608,0 97566,2  0,0      1179648,0 912132,7  3145728,0 120,0 164352,0 164200,5 4   0,434 0   0,000 0,434
196608,0 196608,0 97566,2  0,0      1179648,0 913741,5  3145728,0 120,0 164352,0 164200,8 4   0,434 0   0,000 0,434
196608,0 196608,0 97566,2  0,0      1179648,0 929458,6  3145728,0 120,0 164352,0 164206,0 4   0,434 0   0,000 0,434
196608,0 196608,0 97566,2  0,0      1179648,0 1003138,1 3145728,0 120,0 168960,0 168646,0 4   0,434 0   0,000 0,434
131584,0 196608,0 0,0      109725,6 1179648,0 0,0       3145728,0 128,0 175104,0 174802,1 5   0,591 0   0,000 0,591
131584,0 196608,0 0,0      109725,6 1179648,0 152654,9  3145728,0 128,0 175104,0 174803,3 5   0,591 0   0,000 0,591
131584,0 196608,0 0,0      109725,6 1179648,0 158586,1  3145728,0 128,0 175104,0 174803,3 5   0,591 0   0,000 0,591
131584,0 196608,0 0,0      109725,6 1179648,0 160659,8  3145728,0 128,0 175104,0 174805,7 5   0,591 0   0,000 0,591
131584,0 196608,0 0,0      109725,6 1179648,0 181935,2  3145728,0 128,0 175104,0 174819,7 5   0,591 0   0,000 0,591
131584,0 196608,0 0,0      109725,6 1179648,0 283389,1  3145728,0 128,0 185856,0 185371,0 5   0,591 0   0,000 0,591
131584,0 196608,0 0,0      109725,6 1179648,0 342596,4  3145728,0 128,0 185856,0 185379,3 5   0,591 0   0,000 0,591
131584,0 196608,0 0,0      109725,6 1179648,0 547634,7  3145728,0 128,0 185856,0 185385,8 5   0,591 0   0,000 0,591
131584,0 196608,0 0,0      109725,6 1179648,0 555930,9  3145728,0 128,0 185856,0 185385,8 5   0,591 0   0,000 0,591
131584,0 196608,0 0,0      109725,6 1179648,0 557888,6  3145728,0 128,0 185856,0 185386,0 5   0,591 0   0,000 0,591
131584,0 196608,0 0,0      109725,6 1179648,0 573907,5  3145728,0 128,0 185856,0 185397,5 5   0,591 0   0,000 0,591
131584,0 196608,0 0,0      109725,6 1179648,0 637955,0  3145728,0 128,0 189952,0 189533,1 5   0,591 0   0,000 0,591
131584,0 196608,0 0,0      109725,6 1179648,0 895866,1  3145728,0 128,0 196096,0 195968,5 5   0,591 0   0,000 0,591
131584,0 196608,0 0,0      109725,6 1179648,0 948046,5  3145728,0 128,0 196096,0 195969,4 5   0,591 0   0,000 0,591
131584,0 196608,0 0,0      109725,6 1179648,0 952427,2  3145728,0 128,0 196096,0 195969,4 5   0,591 0   0,000 0,591
131584,0 196608,0 0,0      109725,6 1179648,0 957977,5  3145728,0 128,0 196096,0 195973,4 5   0,591 0   0,000 0,591
131584,0 196608,0 0,0      109725,6 1179648,0 977811,1  3145728,0 128,0 196096,0 195977,7 5   0,591 0   0,000 0,591
131584,0 196608,0 0,0      109725,6 1179648,0 1118722,0 3145728,0 128,0 206848,0 206539,0 5   0,591 0   0,000 0,591
131584,0 144384,0 118692,5 0,0      1284096,0 183470,8  3145728,0 136,0 206848,0 206543,4 6   0,773 0   0,000 0,773
131584,0 144384,0 118692,5 0,0      1284096,0 189718,5  3145728,0 136,0 206848,0 206543,4 6   0,773 0   0,000 0,773
131584,0 144384,0 118692,5 0,0      1284096,0 192165,0  3145728,0 136,0 206848,0 206543,4 6   0,773 0   0,000 0,773
131584,0 144384,0 118692,5 0,0      1284096,0 199848,4  3145728,0
[jira] [Commented] (SPARK-9542) create unsafe version of map type
[ https://issues.apache.org/jira/browse/SPARK-9542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14651100#comment-14651100 ] Apache Spark commented on SPARK-9542: - User 'cloud-fan' has created a pull request for this issue: https://github.com/apache/spark/pull/7752 create unsafe version of map type - Key: SPARK-9542 URL: https://issues.apache.org/jira/browse/SPARK-9542 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Wenchen Fan -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-9542) create unsafe version of map type
[ https://issues.apache.org/jira/browse/SPARK-9542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-9542: --- Assignee: Apache Spark create unsafe version of map type - Key: SPARK-9542 URL: https://issues.apache.org/jira/browse/SPARK-9542 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Wenchen Fan Assignee: Apache Spark -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
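The ticket gives no detail, but in Spark SQL an "unsafe" type generally means a flat binary layout rather than boxed JVM objects. As a toy illustration of that layout idea only (this is not Spark's actual UnsafeMapData wire format), a map of int64 keys to int64 values can be packed as a length prefix followed by a contiguous key array and value array:

```python
import struct


def encode_int_map(d):
    """Pack an int->int map as [n][k1..kn][v1..vn], little-endian int64s.

    Toy flat layout for illustration; Spark's UnsafeMapData defines its
    own binary format in Java.
    """
    keys = list(d.keys())
    vals = [d[k] for k in keys]
    fmt = "<q%dq%dq" % (len(keys), len(vals))
    return struct.pack(fmt, len(keys), *(keys + vals))


def decode_int_map(buf):
    """Inverse of encode_int_map: read n, then n keys, then n values."""
    (n,) = struct.unpack_from("<q", buf, 0)
    keys = struct.unpack_from("<%dq" % n, buf, 8)
    vals = struct.unpack_from("<%dq" % n, buf, 8 + 8 * n)
    return dict(zip(keys, vals))
```

The appeal of a flat layout like this is that lookups and copies operate on one contiguous byte buffer, avoiding per-entry object allocation.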
[jira] [Updated] (SPARK-9536) NaiveBayesModel support probability prediction for PySpark.ml
[ https://issues.apache.org/jira/browse/SPARK-9536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley updated SPARK-9536: - Assignee: Yanbo Liang NaiveBayesModel support probability prediction for PySpark.ml - Key: SPARK-9536 URL: https://issues.apache.org/jira/browse/SPARK-9536 Project: Spark Issue Type: Improvement Components: ML, PySpark Reporter: Yanbo Liang Assignee: Yanbo Liang Priority: Minor NaiveBayesModel support probability prediction for PySpark.ml -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-9538) LogisticRegression support raw and probability prediction for PySpark.ml
[ https://issues.apache.org/jira/browse/SPARK-9538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley updated SPARK-9538: - Assignee: Yanbo Liang LogisticRegression support raw and probability prediction for PySpark.ml Key: SPARK-9538 URL: https://issues.apache.org/jira/browse/SPARK-9538 Project: Spark Issue Type: Improvement Components: ML, PySpark Reporter: Yanbo Liang Assignee: Yanbo Liang Priority: Minor LogisticRegression support raw and probability prediction for PySpark.ml -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-8874) Add missing methods in Word2Vec ML
[ https://issues.apache.org/jira/browse/SPARK-8874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley updated SPARK-8874: - Component/s: (was: PySpark) Add missing methods in Word2Vec ML -- Key: SPARK-8874 URL: https://issues.apache.org/jira/browse/SPARK-8874 Project: Spark Issue Type: New Feature Components: ML Reporter: Manoj Kumar Assignee: Manoj Kumar Add getVectors and findSynonyms. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
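At its core, findSynonyms is a cosine-similarity nearest-neighbor lookup over the word vectors that getVectors exposes. A pure-Python sketch over a made-up toy vocabulary (the function names, vectors, and return shape here are illustrative, not Spark's Word2VecModel API):

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)


def find_synonyms(vectors, word, num):
    """Return the `num` words whose vectors are closest to `word`,
    as (word, similarity) pairs sorted by descending similarity."""
    target = vectors[word]
    scored = [(w, cosine(target, v)) for w, v in vectors.items() if w != word]
    scored.sort(key=lambda t: t[1], reverse=True)
    return scored[:num]


# Toy vocabulary (invented vectors, for illustration only):
vecs = {
    "king":  [0.90, 0.80, 0.10],
    "queen": [0.85, 0.82, 0.12],
    "apple": [0.10, 0.20, 0.95],
}
```

With vectors like these, "queen" would rank above "apple" as a synonym of "king", since its direction is far closer.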
[jira] [Updated] (SPARK-9537) DecisionTreeClassifierModel support probability prediction for PySpark.ml
[ https://issues.apache.org/jira/browse/SPARK-9537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley updated SPARK-9537: - Assignee: Yanbo Liang Target Version/s: 1.5.0 DecisionTreeClassifierModel support probability prediction for PySpark.ml - Key: SPARK-9537 URL: https://issues.apache.org/jira/browse/SPARK-9537 Project: Spark Issue Type: Improvement Components: ML, PySpark Reporter: Yanbo Liang Assignee: Yanbo Liang Priority: Minor DecisionTreeClassifierModel support probability prediction for PySpark.ml -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-9538) LogisticRegression support raw and probability prediction for PySpark.ml
[ https://issues.apache.org/jira/browse/SPARK-9538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley updated SPARK-9538: - Target Version/s: 1.5.0 LogisticRegression support raw and probability prediction for PySpark.ml Key: SPARK-9538 URL: https://issues.apache.org/jira/browse/SPARK-9538 Project: Spark Issue Type: Improvement Components: ML, PySpark Reporter: Yanbo Liang Assignee: Yanbo Liang Priority: Minor LogisticRegression support raw and probability prediction for PySpark.ml -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-9536) NaiveBayesModel support probability prediction for PySpark.ml
[ https://issues.apache.org/jira/browse/SPARK-9536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley updated SPARK-9536: - Target Version/s: 1.5.0 NaiveBayesModel support probability prediction for PySpark.ml - Key: SPARK-9536 URL: https://issues.apache.org/jira/browse/SPARK-9536 Project: Spark Issue Type: Improvement Components: ML, PySpark Reporter: Yanbo Liang Priority: Minor NaiveBayesModel support probability prediction for PySpark.ml -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
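Probability prediction for a multinomial naive Bayes model amounts to normalizing the per-class log joint (log prior plus feature-weighted log likelihoods) with a numerically stable softmax. A sketch of that computation in plain Python (a model of the math only; Spark's NaiveBayesModel exposes this through its own API):

```python
import math


def predict_probabilities(log_prior, log_likelihood, features):
    """Posterior class probabilities for a multinomial naive Bayes model.

    log_prior[c]         : log P(class = c)
    log_likelihood[c][j] : log P(feature j | class = c)
    features[j]          : count of feature j in the example

    Illustrative names; not Spark's actual method signature.
    """
    log_joint = [
        lp + sum(f * ll[j] for j, f in enumerate(features))
        for lp, ll in zip(log_prior, log_likelihood)
    ]
    m = max(log_joint)  # subtract the max: log-sum-exp trick for stability
    exps = [math.exp(x - m) for x in log_joint]
    z = sum(exps)
    return [e / z for e in exps]


# Hypothetical two-class model; numbers invented for illustration:
log_prior = [math.log(0.6), math.log(0.4)]
log_lik = [[math.log(0.8), math.log(0.2)],
           [math.log(0.3), math.log(0.7)]]
probs = predict_probabilities(log_prior, log_lik, [2, 1])
```

An example with mostly feature-0 counts leans toward class 0 here, since class 0 assigns feature 0 the higher likelihood.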
[jira] [Resolved] (SPARK-9527) PrefixSpan.run should return a PrefixSpanModel instead of an RDD and it should be Java-friendly
[ https://issues.apache.org/jira/browse/SPARK-9527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng resolved SPARK-9527. -- Resolution: Fixed Fix Version/s: 1.5.0 Issue resolved by pull request 7869 [https://github.com/apache/spark/pull/7869] PrefixSpan.run should return a PrefixSpanModel instead of an RDD and it should be Java-friendly --- Key: SPARK-9527 URL: https://issues.apache.org/jira/browse/SPARK-9527 Project: Spark Issue Type: Improvement Components: MLlib Affects Versions: 1.5.0 Reporter: Xiangrui Meng Assignee: Xiangrui Meng Priority: Critical Fix For: 1.5.0 Wrapping the result RDD in a model makes it more flexible to add features in the future, and the API should be Java-friendly. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
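The design choice described above, returning a model object instead of a raw RDD so fields and methods can be added later without breaking callers, can be sketched in plain Python. All names are illustrative and the mining step is a trivial stand-in (single-item prefixes only), not the real PrefixSpan algorithm:

```python
class FreqSequence:
    """One frequent sequence (a list of itemsets) with its count."""

    def __init__(self, sequence, freq):
        self.sequence = sequence
        self.freq = freq


class PrefixSpanModel:
    """Wraps the result collection. Because callers receive this object
    rather than the raw collection, new fields and methods can be added
    later without changing run()'s return type."""

    def __init__(self, freq_sequences):
        self.freqSequences = freq_sequences


def run(sequences, min_support=0.5):
    """Stand-in for the real algorithm: count only single-item prefixes
    and keep those present in at least min_support of the sequences."""
    n = len(sequences)
    counts = {}
    for seq in sequences:
        for item in set(x for itemset in seq for x in itemset):
            counts[item] = counts.get(item, 0) + 1
    results = [FreqSequence([[item]], c)
               for item, c in counts.items() if c / float(n) >= min_support]
    return PrefixSpanModel(results)
```

From a Java caller's perspective, a concrete model class is also friendlier than a parameterized RDD of Scala tuples, which is the second half of the ticket's motivation.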