[jira] [Updated] (SPARK-9480) Create a map abstract class MapData and a default implementation backed by 2 ArrayData
[ https://issues.apache.org/jira/browse/SPARK-9480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-9480: --- Parent Issue: SPARK-9413 (was: SPARK-9389) > Create a map abstract class MapData and a default implementation backed by 2 > ArrayData > --- > > Key: SPARK-9480 > URL: https://issues.apache.org/jira/browse/SPARK-9480 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Wenchen Fan >Assignee: Wenchen Fan > Fix For: 1.5.0 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-9480) Create a map abstract class MapData and a default implementation backed by 2 ArrayData
[ https://issues.apache.org/jira/browse/SPARK-9480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin resolved SPARK-9480. Resolution: Fixed Assignee: Wenchen Fan Fix Version/s: 1.5.0 > Create a map abstract class MapData and a default implementation backed by 2 > ArrayData > --- > > Key: SPARK-9480 > URL: https://issues.apache.org/jira/browse/SPARK-9480 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Wenchen Fan >Assignee: Wenchen Fan > Fix For: 1.5.0 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
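Editor's note: a minimal sketch of the shape the summary describes, assuming hypothetical method names; the actual interface is whatever the pull request for SPARK-9480 defines, and ArrayData here is a stub for Spark SQL's existing internal array abstraction.
{code}
// Stand-in for Spark SQL's internal array abstraction; only the one method
// this sketch needs is stubbed here.
trait ArrayData { def numElements(): Int }

// Hedged sketch of the proposed abstract map interface.
abstract class MapData extends Serializable {
  def numElements(): Int
  def keyArray(): ArrayData     // all keys
  def valueArray(): ArrayData   // values aligned with keyArray by position
}

// "A default implementation backed by 2 ArrayData", as the summary puts it.
class ArrayBasedMapData(keys: ArrayData, values: ArrayData) extends MapData {
  require(keys.numElements() == values.numElements(),
    "key and value arrays must have the same number of elements")
  override def numElements(): Int = keys.numElements()
  override def keyArray(): ArrayData = keys
  override def valueArray(): ArrayData = values
}
{code}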
[jira] [Commented] (SPARK-8887) Explicitly define which data types can be used as dynamic partition columns
[ https://issues.apache.org/jira/browse/SPARK-8887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14650196#comment-14650196 ] Reynold Xin commented on SPARK-8887: [~lian cheng] can we put this in 1.5? > Explicitly define which data types can be used as dynamic partition columns > --- > > Key: SPARK-8887 > URL: https://issues.apache.org/jira/browse/SPARK-8887 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 1.4.0 >Reporter: Cheng Lian > > {{InsertIntoHadoopFsRelation}} implements Hive compatible dynamic > partitioning insertion, which uses {{String.valueOf}} to write encode > partition column values into dynamic partition directories. This actually > limits the data types that can be used in partition column. For example, > string representation of {{StructType}} values is not well defined. However, > this limitation is not explicitly enforced. > There are several things we can improve: > # Enforce dynamic column data type requirements by adding analysis rules and > throws {{AnalysisException}} when violation occurs. > # Abstract away string representation of various data types, so that we don't > need to convert internal representation types (e.g. {{UTF8String}}) to > external types (e.g. {{String}}). A set of Hive compatible implementations > should be provided to ensure compatibility with Hive. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
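Editor's note: to illustrate the first improvement, a hedged sketch of the kind of type check an analysis rule could perform; the whitelist is an assumption, and the real rule would raise AnalysisException from inside the analyzer rather than the plain exception used to keep the sketch self-contained.
{code}
import org.apache.spark.sql.types._

// Illustrative whitelist only; the set of types the real rule allows may differ.
val allowedPartitionTypes: Set[DataType] =
  Set(StringType, IntegerType, LongType, ShortType, ByteType,
      FloatType, DoubleType, BooleanType, DateType)

def checkPartitionColumn(name: String, dataType: DataType): Unit = {
  if (!allowedPartitionTypes.contains(dataType)) {
    // The real check would surface an AnalysisException at analysis time.
    throw new IllegalArgumentException(
      s"Cannot use ${dataType.simpleString} column '$name' as a dynamic partition column")
  }
}
{code}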
[jira] [Created] (SPARK-9520) UnsafeFixedWidthAggregationMap should support in-place sorting of its own records
Reynold Xin created SPARK-9520: -- Summary: UnsafeFixedWidthAggregationMap should support in-place sorting of its own records Key: SPARK-9520 URL: https://issues.apache.org/jira/browse/SPARK-9520 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin Assignee: Reynold Xin In order to support sort-based external aggregation fallback, UnsafeFixedWidthAggregationMap needs to support sorting all of its records in-place. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-9520) UnsafeFixedWidthAggregationMap should support in-place sorting of its own records
[ https://issues.apache.org/jira/browse/SPARK-9520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14650201#comment-14650201 ] Apache Spark commented on SPARK-9520: - User 'rxin' has created a pull request for this issue: https://github.com/apache/spark/pull/7849 > UnsafeFixedWidthAggregationMap should support in-place sorting of its own > records > - > > Key: SPARK-9520 > URL: https://issues.apache.org/jira/browse/SPARK-9520 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Reynold Xin >Assignee: Reynold Xin > > In order to support sort-based external aggregation fallback, > UnsafeFixedWidthAggregationMap needs to support sorting all of its records > in-place. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-9520) UnsafeFixedWidthAggregationMap should support in-place sorting of its own records
[ https://issues.apache.org/jira/browse/SPARK-9520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-9520: --- Assignee: Reynold Xin (was: Apache Spark) > UnsafeFixedWidthAggregationMap should support in-place sorting of its own > records > - > > Key: SPARK-9520 > URL: https://issues.apache.org/jira/browse/SPARK-9520 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Reynold Xin >Assignee: Reynold Xin > > In order to support sort-based external aggregation fallback, > UnsafeFixedWidthAggregationMap needs to support sorting all of its records > in-place. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-8269) string function: initcap
[ https://issues.apache.org/jira/browse/SPARK-8269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14650200#comment-14650200 ] Apache Spark commented on SPARK-8269: - User 'davies' has created a pull request for this issue: https://github.com/apache/spark/pull/7850 > string function: initcap > > > Key: SPARK-8269 > URL: https://issues.apache.org/jira/browse/SPARK-8269 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Reynold Xin >Assignee: Cheng Hao > > initcap(string A): string > Returns string, with the first letter of each word in uppercase, all other > letters in lowercase. Words are delimited by whitespace. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
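Editor's note: a plain-Scala sketch of the documented semantics only; the actual function is implemented as a Catalyst expression.
{code}
// Semantics sketch: first letter of each word upper-cased, the rest lower-cased.
// Note this version collapses runs of whitespace into a single space.
def initcap(s: String): String =
  s.split("\\s+").filter(_.nonEmpty).map { w =>
    w.head.toUpper.toString + w.tail.toLowerCase
  }.mkString(" ")

initcap("spARK sql FUNctions")  // "Spark Sql Functions"
{code}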
[jira] [Assigned] (SPARK-9520) UnsafeFixedWidthAggregationMap should support in-place sorting of its own records
[ https://issues.apache.org/jira/browse/SPARK-9520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-9520: --- Assignee: Apache Spark (was: Reynold Xin) > UnsafeFixedWidthAggregationMap should support in-place sorting of its own > records > - > > Key: SPARK-9520 > URL: https://issues.apache.org/jira/browse/SPARK-9520 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Reynold Xin >Assignee: Apache Spark > > In order to support sort-based external aggregation fallback, > UnsafeFixedWidthAggregationMap needs to support sorting all of its records > in-place. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-8232) complex function: sort_array
[ https://issues.apache.org/jira/browse/SPARK-8232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14650206#comment-14650206 ] Apache Spark commented on SPARK-8232: - User 'davies' has created a pull request for this issue: https://github.com/apache/spark/pull/7851 > complex function: sort_array > > > Key: SPARK-8232 > URL: https://issues.apache.org/jira/browse/SPARK-8232 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Reynold Xin >Assignee: Cheng Hao > Fix For: 1.5.0 > > > sort_array(Array) > Sorts the input array in ascending order according to the natural ordering of > the array elements and returns it -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
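Editor's note: a semantics-only sketch of the described behaviour; the Catalyst expression itself operates on Spark's internal array format.
{code}
// Ascending sort by the natural ordering of the elements.
val ints    = Array(3, 1, 2).sorted         // Array(1, 2, 3)
val strings = Array("pear", "apple").sorted // Array(apple, pear)
{code}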
[jira] [Resolved] (SPARK-7446) Inverse transform for StringIndexer
[ https://issues.apache.org/jira/browse/SPARK-7446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley resolved SPARK-7446. -- Resolution: Fixed Fix Version/s: 1.5.0 Issue resolved by pull request 6339 [https://github.com/apache/spark/pull/6339] > Inverse transform for StringIndexer > --- > > Key: SPARK-7446 > URL: https://issues.apache.org/jira/browse/SPARK-7446 > Project: Spark > Issue Type: Improvement > Components: ML >Affects Versions: 1.4.0 >Reporter: Xiangrui Meng >Assignee: holdenk >Priority: Minor > Fix For: 1.5.0 > > > It is useful to convert the encoded indices back to their string > representation for result inspection. We can add a parameter to > StringIndexer/StringIndexModel for this. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
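Editor's note: a hedged sketch of the round trip this feature enables, assuming a DataFrame df with a string column "category"; whether the inverse transform is exposed as a separate transformer or a model parameter is decided by the pull request, so the inverse below is applied by hand from the fitted labels.
{code}
import org.apache.spark.ml.feature.StringIndexer

val indexer = new StringIndexer()
  .setInputCol("category")
  .setOutputCol("categoryIndex")
val model = indexer.fit(df)
val indexed = model.transform(df)

// Inverse direction: map an encoded index back to its original string,
// assuming the fitted model keeps its label array.
val labels: Array[String] = model.labels
val decode: Double => String = i => labels(i.toInt)
{code}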
[jira] [Updated] (SPARK-8873) Support cleaning up shuffle files when using shuffle service in Mesos
[ https://issues.apache.org/jira/browse/SPARK-8873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-8873: - Priority: Blocker (was: Critical) > Support cleaning up shuffle files when using shuffle service in Mesos > - > > Key: SPARK-8873 > URL: https://issues.apache.org/jira/browse/SPARK-8873 > Project: Spark > Issue Type: Improvement > Components: Mesos >Affects Versions: 1.2.0 >Reporter: Timothy Chen >Assignee: Timothy Chen >Priority: Blocker > Labels: mesos > > With dynamic allocation enabled with Mesos, drivers can launch with shuffle > data cached in the external shuffle service. > However, there is no reliable way to let the shuffle service clean up the > shuffle data when the driver exits, since it may crash before it notifies the > shuffle service and shuffle data will be cached forever. > We need to implement a reliable way to detect driver termination and clean up > shuffle data accordingly. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-9521) Require Maven 3.3.3+ in the build
Sean Owen created SPARK-9521: Summary: Require Maven 3.3.3+ in the build Key: SPARK-9521 URL: https://issues.apache.org/jira/browse/SPARK-9521 Project: Spark Issue Type: Improvement Components: Build Affects Versions: 1.4.1 Reporter: Sean Owen Assignee: Sean Owen Priority: Trivial Patrick recently discovered a build problem that manifested because he was using the Maven 3.2.x installed on his system, and which was resolved by using Maven 3.3.x. Since we have a script that can install Maven 3.3.3 for anyone, it probably makes sense to just enforce use of Maven 3.3.3+ in the build. (Currently it's just 3.0.4+). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-9521) Require Maven 3.3.3+ in the build
[ https://issues.apache.org/jira/browse/SPARK-9521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14650210#comment-14650210 ] Apache Spark commented on SPARK-9521: - User 'srowen' has created a pull request for this issue: https://github.com/apache/spark/pull/7852 > Require Maven 3.3.3+ in the build > - > > Key: SPARK-9521 > URL: https://issues.apache.org/jira/browse/SPARK-9521 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 1.4.1 >Reporter: Sean Owen >Assignee: Sean Owen >Priority: Trivial > > Patrick recently discovered a build problem that manifested because he was > using the Maven 3.2.x installed on his system, and which was resolved by > using Maven 3.3.x. Since we have a script that can install Maven 3.3.3 for > anyone, it probably makes sense to just enforce use of Maven 3.3.3+ in the > build. (Currently it's just 3.0.4+). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-9521) Require Maven 3.3.3+ in the build
[ https://issues.apache.org/jira/browse/SPARK-9521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-9521: --- Assignee: Apache Spark (was: Sean Owen) > Require Maven 3.3.3+ in the build > - > > Key: SPARK-9521 > URL: https://issues.apache.org/jira/browse/SPARK-9521 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 1.4.1 >Reporter: Sean Owen >Assignee: Apache Spark >Priority: Trivial > > Patrick recently discovered a build problem that manifested because he was > using the Maven 3.2.x installed on his system, and which was resolved by > using Maven 3.3.x. Since we have a script that can install Maven 3.3.3 for > anyone, it probably makes sense to just enforce use of Maven 3.3.3+ in the > build. (Currently it's just 3.0.4+). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-9522) SparkSubmit process can not exit if kill application when HiveThriftServer was starting
Weizhong created SPARK-9522: --- Summary: SparkSubmit process can not exit if kill application when HiveThriftServer was starting Key: SPARK-9522 URL: https://issues.apache.org/jira/browse/SPARK-9522 Project: Spark Issue Type: Improvement Reporter: Weizhong Priority: Minor When we start HiveThriftServer, we will start SparkContext first, then start HiveServer2, if we kill application while HiveServer2 is starting then SparkContext will stop successfully, but SparkSubmit process can not exit. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-9521) Require Maven 3.3.3+ in the build
[ https://issues.apache.org/jira/browse/SPARK-9521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-9521: --- Assignee: Sean Owen (was: Apache Spark) > Require Maven 3.3.3+ in the build > - > > Key: SPARK-9521 > URL: https://issues.apache.org/jira/browse/SPARK-9521 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 1.4.1 >Reporter: Sean Owen >Assignee: Sean Owen >Priority: Trivial > > Patrick recently discovered a build problem that manifested because he was > using the Maven 3.2.x installed on his system, and which was resolved by > using Maven 3.3.x. Since we have a script that can install Maven 3.3.3 for > anyone, it probably makes sense to just enforce use of Maven 3.3.3+ in the > build. (Currently it's just 3.0.4+). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-9522) SparkSubmit process can not exit if kill application when HiveThriftServer was starting
[ https://issues.apache.org/jira/browse/SPARK-9522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weizhong updated SPARK-9522: Component/s: SQL > SparkSubmit process can not exit if kill application when HiveThriftServer > was starting > --- > > Key: SPARK-9522 > URL: https://issues.apache.org/jira/browse/SPARK-9522 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Weizhong >Priority: Minor > > When we start HiveThriftServer, we will start SparkContext first, then start > HiveServer2, if we kill application while HiveServer2 is starting then > SparkContext will stop successfully, but SparkSubmit process can not exit. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-9522) SparkSubmit process can not exit if kill application when HiveThriftServer was starting
[ https://issues.apache.org/jira/browse/SPARK-9522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-9522: --- Assignee: Apache Spark > SparkSubmit process can not exit if kill application when HiveThriftServer > was starting > --- > > Key: SPARK-9522 > URL: https://issues.apache.org/jira/browse/SPARK-9522 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Weizhong >Assignee: Apache Spark >Priority: Minor > > When we start HiveThriftServer, we will start SparkContext first, then start > HiveServer2, if we kill application while HiveServer2 is starting then > SparkContext will stop successfully, but SparkSubmit process can not exit. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-9522) SparkSubmit process can not exit if kill application when HiveThriftServer was starting
[ https://issues.apache.org/jira/browse/SPARK-9522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14650212#comment-14650212 ] Apache Spark commented on SPARK-9522: - User 'Sephiroth-Lin' has created a pull request for this issue: https://github.com/apache/spark/pull/7853 > SparkSubmit process can not exit if kill application when HiveThriftServer > was starting > --- > > Key: SPARK-9522 > URL: https://issues.apache.org/jira/browse/SPARK-9522 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Weizhong >Priority: Minor > > When we start HiveThriftServer, we will start SparkContext first, then start > HiveServer2, if we kill application while HiveServer2 is starting then > SparkContext will stop successfully, but SparkSubmit process can not exit. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-9522) SparkSubmit process can not exit if kill application when HiveThriftServer was starting
[ https://issues.apache.org/jira/browse/SPARK-9522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-9522: --- Assignee: (was: Apache Spark) > SparkSubmit process can not exit if kill application when HiveThriftServer > was starting > --- > > Key: SPARK-9522 > URL: https://issues.apache.org/jira/browse/SPARK-9522 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Weizhong >Priority: Minor > > When we start HiveThriftServer, we will start SparkContext first, then start > HiveServer2, if we kill application while HiveServer2 is starting then > SparkContext will stop successfully, but SparkSubmit process can not exit. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-8999) Support non-temporal sequence in PrefixSpan
[ https://issues.apache.org/jira/browse/SPARK-8999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-8999: - Assignee: Zhang JiaJin > Support non-temporal sequence in PrefixSpan > --- > > Key: SPARK-8999 > URL: https://issues.apache.org/jira/browse/SPARK-8999 > Project: Spark > Issue Type: Improvement > Components: MLlib >Affects Versions: 1.5.0 >Reporter: Xiangrui Meng >Assignee: Zhang JiaJin >Priority: Critical > Fix For: 1.5.0 > > > In SPARK-6487, we assume that all items are ordered. However, we should > support non-temporal sequences in PrefixSpan. This should be done before 1.5 > because it changes PrefixSpan APIs. > We can use `Array[Array[Int]]` or follow SPMF to use `Array[Int]` and use -1 > to mark itemset boundaries. The latter is more efficient for storage. If we > support generic item type, we can use null. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-8999) Support non-temporal sequence in PrefixSpan
[ https://issues.apache.org/jira/browse/SPARK-8999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng resolved SPARK-8999. -- Resolution: Fixed Fix Version/s: 1.5.0 Issue resolved by pull request 7818 [https://github.com/apache/spark/pull/7818] > Support non-temporal sequence in PrefixSpan > --- > > Key: SPARK-8999 > URL: https://issues.apache.org/jira/browse/SPARK-8999 > Project: Spark > Issue Type: Improvement > Components: MLlib >Affects Versions: 1.5.0 >Reporter: Xiangrui Meng >Priority: Critical > Fix For: 1.5.0 > > > In SPARK-6487, we assume that all items are ordered. However, we should > support non-temporal sequences in PrefixSpan. This should be done before 1.5 > because it changes PrefixSpan APIs. > We can use `Array[Array[Int]]` or follow SPMF to use `Array[Int]` and use -1 > to mark itemset boundaries. The latter is more efficient for storage. If we > support generic item type, we can use null. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
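Editor's note: the two candidate encodings mentioned in the description, sketched for the sequence <{1,2} {3}> (illustration only).
{code}
// Option 1: one nested array per itemset.
val nested: Array[Array[Int]] = Array(Array(1, 2), Array(3))

// Option 2 (SPMF style): a flat array where -1 separates itemsets.
// More compact to store, but reserves -1 as a sentinel value
// (null would play the same role once generic item types are supported).
val flat: Array[Int] = Array(1, 2, -1, 3)
{code}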
[jira] [Commented] (SPARK-8505) Add settings to kick `lint-r` from `./dev/run-test.py`
[ https://issues.apache.org/jira/browse/SPARK-8505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14650233#comment-14650233 ] Yu Ishikawa commented on SPARK-8505: [~srowen] Yes, I acknowledge how it is assigned, but I thought it would be better to show my activity to the other developers. I would be careful next time. Thanks! > Add settings to kick `lint-r` from `./dev/run-test.py` > -- > > Key: SPARK-8505 > URL: https://issues.apache.org/jira/browse/SPARK-8505 > Project: Spark > Issue Type: Sub-task > Components: SparkR >Reporter: Yu Ishikawa > > Add some settings to kick `lint-r` script from `./dev/run-test.py` -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-8169) Add StopWordsRemover as a transformer
[ https://issues.apache.org/jira/browse/SPARK-8169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng resolved SPARK-8169. -- Resolution: Fixed Fix Version/s: 1.5.0 Issue resolved by pull request 6742 [https://github.com/apache/spark/pull/6742] > Add StopWordsRemover as a transformer > - > > Key: SPARK-8169 > URL: https://issues.apache.org/jira/browse/SPARK-8169 > Project: Spark > Issue Type: New Feature > Components: ML >Affects Versions: 1.5.0 >Reporter: Xiangrui Meng >Assignee: yuhao yang > Fix For: 1.5.0 > > > StopWordsRemover takes a string array column and outputs a string array > column with all defined stop words removed. The transformer should also come > with a standard set of stop words as default. > {code} > val stopWords = new StopWordsRemover() > .setInputCol("words") > .setOutputCol("cleanWords") > .setStopWords(Array(...)) // optional > val output = stopWords.transform(df) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-9523) Receiver for Spark Streaming does not naturally support kryo serializer
John Chen created SPARK-9523: Summary: Receiver for Spark Streaming does not naturally support kryo serializer Key: SPARK-9523 URL: https://issues.apache.org/jira/browse/SPARK-9523 Project: Spark Issue Type: Improvement Components: Streaming Affects Versions: 1.3.0 Environment: Windows 7 local mode Reporter: John Chen Fix For: 1.3.2, 1.4.2 In some cases, some attributes in a class is not serializable, which you still want to use after serialization of the whole object, you'll have to customize your serialization codes. For example, you can declare those attributes as transient, which makes them ignored during serialization, and then you can reassign their values during deserialization. Now, if you're using Java serialization, you'll have to implement Serializable, and write those codes in readObject() and writeObejct() methods; And if you're using kryo serialization, you'll have to implement KryoSerializable, and write these codes in read() and write() methods. In Spark and Spark Streaming, you can set kryo as the serializer for speeding up. However, the functions taken by RDD or DStream operations are still serialized by Java serialization, which means you only need to write those custom serialization codes in readObject() and writeObejct() methods. But when it comes to Spark Streaming's Receiver, things are different. When you wish to customize an InputDStream, you must extend the Receiver. However, it turns out, the Receiver will be serialized by kryo if you set kryo serializer in SparkConf, and will fall back to Java serialization if you didn't. So here's comes the problems, if you want to change the serializer by configuration and make sure the Receiver runs perfectly for both Java and kryo, you'll have to write all the 4 methods above. First, it is redundant, since you'll have to write serialization/deserialization code almost twice; Secondly, there's nothing in the doc or in the code to inform users to implement the KryoSerializable interface. Since all other function parameters are serialized by Java only, I suggest you also make it so for the Receiver. It may be slower, but since the serialization will only be executed for each interval, it's durable. More importantly, it can cause fewer trouble -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-9523) Receiver for Spark Streaming does not naturally support kryo serializer
[ https://issues.apache.org/jira/browse/SPARK-9523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Chen updated SPARK-9523: - Affects Version/s: (was: 1.3.0) 1.3.1 The issue occurs in 1.3.1, not tested in 1.4.0 or 1.4.1. However, the codes for Receiver in these versions seems identical. > Receiver for Spark Streaming does not naturally support kryo serializer > --- > > Key: SPARK-9523 > URL: https://issues.apache.org/jira/browse/SPARK-9523 > Project: Spark > Issue Type: Improvement > Components: Streaming >Affects Versions: 1.3.1 > Environment: Windows 7 local mode >Reporter: John Chen > Labels: kryo, serialization > Fix For: 1.3.2, 1.4.2 > > Original Estimate: 120h > Remaining Estimate: 120h > > In some cases, some attributes in a class is not serializable, which you > still want to use after serialization of the whole object, you'll have to > customize your serialization codes. For example, you can declare those > attributes as transient, which makes them ignored during serialization, and > then you can reassign their values during deserialization. > Now, if you're using Java serialization, you'll have to implement > Serializable, and write those codes in readObject() and writeObejct() > methods; And if you're using kryo serialization, you'll have to implement > KryoSerializable, and write these codes in read() and write() methods. > In Spark and Spark Streaming, you can set kryo as the serializer for speeding > up. However, the functions taken by RDD or DStream operations are still > serialized by Java serialization, which means you only need to write those > custom serialization codes in readObject() and writeObejct() methods. > But when it comes to Spark Streaming's Receiver, things are different. When > you wish to customize an InputDStream, you must extend the Receiver. However, > it turns out, the Receiver will be serialized by kryo if you set kryo > serializer in SparkConf, and will fall back to Java serialization if you > didn't. > So here's comes the problems, if you want to change the serializer by > configuration and make sure the Receiver runs perfectly for both Java and > kryo, you'll have to write all the 4 methods above. First, it is redundant, > since you'll have to write serialization/deserialization code almost twice; > Secondly, there's nothing in the doc or in the code to inform users to > implement the KryoSerializable interface. > Since all other function parameters are serialized by Java only, I suggest > you also make it so for the Receiver. It may be slower, but since the > serialization will only be executed for each interval, it's durable. More > importantly, it can cause fewer trouble -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
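Editor's note: a hedged sketch of the duplication the report describes, i.e. a custom receiver whose non-serializable field must be handled once for Java serialization and again for Kryo; the receiver, endpoint and client names are made up for illustration.
{code}
import java.io.{ObjectInputStream, ObjectOutputStream}
import com.esotericsoftware.kryo.{Kryo, KryoSerializable}
import com.esotericsoftware.kryo.io.{Input, Output}
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.receiver.Receiver

class MyReceiver(private var endpoint: String)
  extends Receiver[String](StorageLevel.MEMORY_ONLY) with KryoSerializable {

  @transient private var client: AnyRef = _   // not serializable, rebuilt on the executor

  override def onStart(): Unit = { client = new Object /* stand-in for a real connection */ }
  override def onStop(): Unit = { client = null }

  // Pair 1: needed when the default Java serializer is in use.
  private def writeObject(out: ObjectOutputStream): Unit = out.defaultWriteObject()
  private def readObject(in: ObjectInputStream): Unit = { in.defaultReadObject(); client = null }

  // Pair 2: needed again when spark.serializer is set to KryoSerializer.
  // A real implementation would also have to restore the Receiver superclass state,
  // which is part of the duplication the report complains about.
  override def write(kryo: Kryo, output: Output): Unit = output.writeString(endpoint)
  override def read(kryo: Kryo, input: Input): Unit = {
    endpoint = input.readString()
    client = null
  }
}
{code}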
[jira] [Commented] (SPARK-8505) Add settings to kick `lint-r` from `./dev/run-test.py`
[ https://issues.apache.org/jira/browse/SPARK-8505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14650262#comment-14650262 ] Sean Owen commented on SPARK-8505: -- If you open a pull request, the JIRA is marked as "In Progress" and links to your PR. That pretty clearly shows your activity. > Add settings to kick `lint-r` from `./dev/run-test.py` > -- > > Key: SPARK-8505 > URL: https://issues.apache.org/jira/browse/SPARK-8505 > Project: Spark > Issue Type: Sub-task > Components: SparkR >Reporter: Yu Ishikawa > > Add some settings to kick `lint-r` script from `./dev/run-test.py` -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-6873) Some Hive-Catalyst comparison tests fail due to unimportant order of some printed elements
[ https://issues.apache.org/jira/browse/SPARK-6873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14650268#comment-14650268 ] Cheng Lian commented on SPARK-6873: --- It's not important. Internally, Hive just traverses a hash map and dumps everything in it. So the order is decided by the implementation of the hash map. > Some Hive-Catalyst comparison tests fail due to unimportant order of some > printed elements > -- > > Key: SPARK-6873 > URL: https://issues.apache.org/jira/browse/SPARK-6873 > Project: Spark > Issue Type: Bug > Components: SQL, Tests >Affects Versions: 1.3.1 >Reporter: Sean Owen >Assignee: Cheng Lian >Priority: Minor > > As I mentioned, I've been seeing 4 test failures in Hive tests for a while, > and actually it still affects master. I think it's a superficial problem that > only turns up when running on Java 8, but still, would probably be an easy > fix and good to fix. > Specifically, here are four tests and the bit that fails the comparison, > below. I tried to diagnose this but had trouble even finding where some of > this occurs, like the list of synonyms? > {code} > - show_tblproperties *** FAILED *** > Results do not match for show_tblproperties: > ... > !== HIVE - 2 row(s) == == CATALYST - 2 row(s) == > !tmptruebar bar value > !barbar value tmp true (HiveComparisonTest.scala:391) > {code} > {code} > - show_create_table_serde *** FAILED *** > Results do not match for show_create_table_serde: > ... >WITH SERDEPROPERTIES ( WITH > SERDEPROPERTIES ( > ! 'serialization.format'='$', > 'field.delim'=',', > ! 'field.delim'=',') > 'serialization.format'='$') > {code} > {code} > - udf_std *** FAILED *** > Results do not match for udf_std: > ... > !== HIVE - 2 row(s) == == CATALYST > - 2 row(s) == >std(x) - Returns the standard deviation of a set of numbers std(x) - > Returns the standard deviation of a set of numbers > !Synonyms: stddev_pop, stddev Synonyms: > stddev, stddev_pop (HiveComparisonTest.scala:391) > {code} > {code} > - udf_stddev *** FAILED *** > Results do not match for udf_stddev: > ... > !== HIVE - 2 row(s) ==== > CATALYST - 2 row(s) == >stddev(x) - Returns the standard deviation of a set of numbers stddev(x) > - Returns the standard deviation of a set of numbers > !Synonyms: stddev_pop, stdSynonyms: > std, stddev_pop (HiveComparisonTest.scala:391) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Issue Comment Deleted] (SPARK-1902) Spark shell prints error when :4040 port already in use
[ https://issues.apache.org/jira/browse/SPARK-1902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Morozov updated SPARK-1902: -- Comment: was deleted (was: It looks like package name has changed since and now log4j.properties has to have another logger name to turn it off: {noformat} log4j.logger.org.spark-project.jetty.util.component.AbstractLifeCycle=ERROR {noformat} I'm not sure what should I do: 1. Reopen this issue 2. Create a new one 3. Or it's not that important to make this change. Please, suggest.) > Spark shell prints error when :4040 port already in use > --- > > Key: SPARK-1902 > URL: https://issues.apache.org/jira/browse/SPARK-1902 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.0.0 >Reporter: Andrew Ash >Assignee: Andrew Ash > Fix For: 1.1.0 > > > When running two shells on the same machine, I get the below error. The > issue is that the first shell takes port 4040, then the next tries tries 4040 > and fails so falls back to 4041, then a third would try 4040 and 4041 before > landing on 4042, etc. > We should catch the error and instead log as "Unable to use port 4041; > already in use. Attempting port 4042..." > {noformat} > 14/05/22 11:31:54 WARN component.AbstractLifeCycle: FAILED > SelectChannelConnector@0.0.0.0:4041: java.net.BindException: Address already > in use > java.net.BindException: Address already in use > at sun.nio.ch.Net.bind0(Native Method) > at sun.nio.ch.Net.bind(Net.java:444) > at sun.nio.ch.Net.bind(Net.java:436) > at > sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214) > at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74) > at > org.eclipse.jetty.server.nio.SelectChannelConnector.open(SelectChannelConnector.java:187) > at > org.eclipse.jetty.server.AbstractConnector.doStart(AbstractConnector.java:316) > at > org.eclipse.jetty.server.nio.SelectChannelConnector.doStart(SelectChannelConnector.java:265) > at > org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:64) > at org.eclipse.jetty.server.Server.doStart(Server.java:293) > at > org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:64) > at > org.apache.spark.ui.JettyUtils$$anonfun$1.apply$mcV$sp(JettyUtils.scala:192) > at > org.apache.spark.ui.JettyUtils$$anonfun$1.apply(JettyUtils.scala:192) > at > org.apache.spark.ui.JettyUtils$$anonfun$1.apply(JettyUtils.scala:192) > at scala.util.Try$.apply(Try.scala:161) > at org.apache.spark.ui.JettyUtils$.connect$1(JettyUtils.scala:191) > at > org.apache.spark.ui.JettyUtils$.startJettyServer(JettyUtils.scala:205) > at org.apache.spark.ui.WebUI.bind(WebUI.scala:99) > at org.apache.spark.SparkContext.(SparkContext.scala:217) > at > org.apache.spark.repl.SparkILoop.createSparkContext(SparkILoop.scala:957) > at $line3.$read$$iwC$$iwC.(:8) > at $line3.$read$$iwC.(:14) > at $line3.$read.(:16) > at $line3.$read$.(:20) > at $line3.$read$.() > at $line3.$eval$.(:7) > at $line3.$eval$.() > at $line3.$eval.$print() > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:788) > at > org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1056) > at > 
org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:614) > at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:645) > at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:609) > at > org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:796) > at > org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:841) > at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:753) > at > org.apache.spark.repl.SparkILoopInit$$anonfun$initializeSpark$1.apply(SparkILoopInit.scala:121) > at > org.apache.spark.repl.SparkILoopInit$$anonfun$initializeSpark$1.apply(SparkILoopInit.scala:120) > at > org.apache.spark.repl.SparkIMain.beQuietDuring(SparkIMain.scala:263) > at > org.apache.spark.repl.SparkILoopInit$class.initializeSpark(SparkILoopInit.scala:120) > at > org.apache.spark.repl.SparkILoop.initializeSpark(SparkILoop.scala:56) >
[jira] [Comment Edited] (SPARK-9000) Support generic item type in PrefixSpan
[ https://issues.apache.org/jira/browse/SPARK-9000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14650301#comment-14650301 ] Masaki Rikitoku edited comment on SPARK-9000 at 8/1/15 12:33 PM: - Hi Xiangrui Meng Thanks for your comments. I agree with you and feynmanliang because my modification for this ticket is very tiny. If I notice something about feynmanliang's pr, I will inform you. was (Author: rikima): Hi Xiangrui Meng Thanks for your comments. I agree with you and feynmanliang because my modification for this ticket is very tiny. If I notice something about feynmanliang's pr I will inform you. > Support generic item type in PrefixSpan > --- > > Key: SPARK-9000 > URL: https://issues.apache.org/jira/browse/SPARK-9000 > Project: Spark > Issue Type: Improvement > Components: MLlib >Affects Versions: 1.5.0 >Reporter: Xiangrui Meng >Priority: Critical > > In SPARK-6487, we only support Int type. It requires users to encode other > types into integer to use PrefixSpan. We should be able to do this inside > PrefixSpan, similar to FPGrowth. This should be done before 1.5 since it > changes APIs. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-9000) Support generic item type in PrefixSpan
[ https://issues.apache.org/jira/browse/SPARK-9000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14650301#comment-14650301 ] Masaki Rikitoku commented on SPARK-9000: Hi Xiangrui Meng Thanks for your comments. I agree with you and feynmanliang because my modification for this ticket is very tiny. If I notice something about feynmanliang's pr I will inform you. > Support generic item type in PrefixSpan > --- > > Key: SPARK-9000 > URL: https://issues.apache.org/jira/browse/SPARK-9000 > Project: Spark > Issue Type: Improvement > Components: MLlib >Affects Versions: 1.5.0 >Reporter: Xiangrui Meng >Priority: Critical > > In SPARK-6487, we only support Int type. It requires users to encode other > types into integer to use PrefixSpan. We should be able to do this inside > PrefixSpan, similar to FPGrowth. This should be done before 1.5 since it > changes APIs. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-9514) Add EventHubsReceiver to support Spark Streaming using Azure EventHubs
[ https://issues.apache.org/jira/browse/SPARK-9514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14650308#comment-14650308 ] Nan Zhu commented on SPARK-9514: I think the best way to do it is to add a new component in external directory, if we ensure that the code is maintained in long term... > Add EventHubsReceiver to support Spark Streaming using Azure EventHubs > -- > > Key: SPARK-9514 > URL: https://issues.apache.org/jira/browse/SPARK-9514 > Project: Spark > Issue Type: Bug > Components: Streaming >Affects Versions: 1.4.1 >Reporter: shanyu zhao > Fix For: 1.5.0 > > > We need to add EventHubsReceiver implementation to support Spark Streaming > applications that receive data from Azure EventHubs. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-6227) PCA and SVD for PySpark
[ https://issues.apache.org/jira/browse/SPARK-6227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14650314#comment-14650314 ] Manoj Kumar commented on SPARK-6227: [~mengxr] Can this be assigned to me? Since the blockmatrix PR is already worked on. > PCA and SVD for PySpark > --- > > Key: SPARK-6227 > URL: https://issues.apache.org/jira/browse/SPARK-6227 > Project: Spark > Issue Type: Sub-task > Components: MLlib, PySpark >Affects Versions: 1.2.1 >Reporter: Julien Amelot > > The Dimensionality Reduction techniques are not available via Python (Scala + > Java only). > * Principal component analysis (PCA) > * Singular value decomposition (SVD) > Doc: > http://spark.apache.org/docs/1.2.1/mllib-dimensionality-reduction.html -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-9524) Latest Spark (8765665015ef47a23e00f7d01d4d280c31bb236d) breaks (pyspark)
Samuel Marks created SPARK-9524: --- Summary: Latest Spark (8765665015ef47a23e00f7d01d4d280c31bb236d) breaks (pyspark) Key: SPARK-9524 URL: https://issues.apache.org/jira/browse/SPARK-9524 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 1.5.0 Environment: Ubuntu 15.04 Linux Kudu 3.19.0-25-generic #26-Ubuntu SMP Fri Jul 24 21:17:31 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux Reporter: Samuel Marks Priority: Blocker I start my ipython notebook like usual, after updating to the latest Spark (`git pull`). Also tried a complete folder removal + clone + `build/mvn -DskipTests clean package` just to be sure. I get a bunch of these 404 errors then this: {code:none} [W 00:13:49.462 NotebookApp] 404 GET /api/kernels/e7db54cb-f7bb-4bdf-8d0c-76110f26c12c/channels?session_id=64C70A32AA2940808FDCA038A3D9E5B5 (127.0.0.1) 3.72ms referer=None 2.4+ kernel w/o ELF notes? -- report this {code} PS: None of my Python code works within `ipython notebook` when it's launched via pyspark. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-9525) Optimize SparseVector initializations in linalg
Manoj Kumar created SPARK-9525: -- Summary: Optimize SparseVector initializations in linalg Key: SPARK-9525 URL: https://issues.apache.org/jira/browse/SPARK-9525 Project: Spark Issue Type: Improvement Components: MLlib, PySpark Reporter: Manoj Kumar Priority: Minor 1. Remove sorting of indices and assume that the user gives a sorted tuple of indices, values etc 2. Avoid iterating twice to get the indices and values if the argument provided is a dict. 3. Add checks such that the length of the indices should be less than the size provided. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
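Editor's note: the sort of validation points 1 and 3 ask for, sketched in Scala for consistency with the other examples here even though the change targets the Python API; the function name and messages are illustrative.
{code}
// Indices must be sorted and unique (point 1 assumes the caller supplies them
// sorted) and every index must fit inside the declared size (point 3).
def validateSparse(size: Int, indices: Array[Int], values: Array[Double]): Unit = {
  require(indices.length == values.length,
    "indices and values must have the same length")
  require(indices.length <= size,
    s"cannot store ${indices.length} nonzeros in a vector of size $size")
  var i = 1
  while (i < indices.length) {
    require(indices(i) > indices(i - 1), "indices must be sorted and unique")
    i += 1
  }
  if (indices.nonEmpty) {
    require(indices.last < size, s"index ${indices.last} is out of bounds for size $size")
  }
}
{code}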
[jira] [Updated] (SPARK-9524) Latest Spark (8765665015ef47a23e00f7d01d4d280c31bb236d) breaks (pyspark)
[ https://issues.apache.org/jira/browse/SPARK-9524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-9524: - Affects Version/s: (was: 1.5.0) Priority: Major (was: Blocker) [~SamuelMarks] please read https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark first. Don't set Blocker, for example; 1.5.0 can't be the affected version. > Latest Spark (8765665015ef47a23e00f7d01d4d280c31bb236d) breaks (pyspark) > > > Key: SPARK-9524 > URL: https://issues.apache.org/jira/browse/SPARK-9524 > Project: Spark > Issue Type: Bug > Components: PySpark > Environment: Ubuntu 15.04 > Linux Kudu 3.19.0-25-generic #26-Ubuntu SMP Fri Jul 24 21:17:31 UTC 2015 > x86_64 x86_64 x86_64 GNU/Linux >Reporter: Samuel Marks > > I start my ipython notebook like usual, after updating to the latest Spark > (`git pull`). Also tried a complete folder removal + clone + `build/mvn > -DskipTests clean package` just to be sure. > I get a bunch of these 404 errors then this: > {code:none} > [W 00:13:49.462 NotebookApp] 404 GET > /api/kernels/e7db54cb-f7bb-4bdf-8d0c-76110f26c12c/channels?session_id=64C70A32AA2940808FDCA038A3D9E5B5 > (127.0.0.1) 3.72ms referer=None > 2.4+ kernel w/o ELF notes? -- report this > {code} > PS: None of my Python code works within `ipython notebook` when it's launched > via pyspark. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-9524) Latest Spark (8765665015ef47a23e00f7d01d4d280c31bb236d) breaks (pyspark)
[ https://issues.apache.org/jira/browse/SPARK-9524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-9524. -- Resolution: Invalid This doesn't appear to be related to Spark. At least none of the error here shows anything from Pyspark. > Latest Spark (8765665015ef47a23e00f7d01d4d280c31bb236d) breaks (pyspark) > > > Key: SPARK-9524 > URL: https://issues.apache.org/jira/browse/SPARK-9524 > Project: Spark > Issue Type: Bug > Components: PySpark > Environment: Ubuntu 15.04 > Linux Kudu 3.19.0-25-generic #26-Ubuntu SMP Fri Jul 24 21:17:31 UTC 2015 > x86_64 x86_64 x86_64 GNU/Linux >Reporter: Samuel Marks > > I start my ipython notebook like usual, after updating to the latest Spark > (`git pull`). Also tried a complete folder removal + clone + `build/mvn > -DskipTests clean package` just to be sure. > I get a bunch of these 404 errors then this: > {code:none} > [W 00:13:49.462 NotebookApp] 404 GET > /api/kernels/e7db54cb-f7bb-4bdf-8d0c-76110f26c12c/channels?session_id=64C70A32AA2940808FDCA038A3D9E5B5 > (127.0.0.1) 3.72ms referer=None > 2.4+ kernel w/o ELF notes? -- report this > {code} > PS: None of my Python code works within `ipython notebook` when it's launched > via pyspark. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-9525) Optimize SparseVector initializations in linalg
[ https://issues.apache.org/jira/browse/SPARK-9525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-9525: --- Assignee: Apache Spark > Optimize SparseVector initializations in linalg > --- > > Key: SPARK-9525 > URL: https://issues.apache.org/jira/browse/SPARK-9525 > Project: Spark > Issue Type: Improvement > Components: MLlib, PySpark >Reporter: Manoj Kumar >Assignee: Apache Spark >Priority: Minor > > 1. Remove sorting of indices and assume that the user gives a sorted tuple of > indices, values etc > 2. Avoid iterating twice to get the indices and values if the argument > provided is a dict. > 3. Add checks such that the length of the indices should be less than the > size provided. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-9525) Optimize SparseVector initializations in linalg
[ https://issues.apache.org/jira/browse/SPARK-9525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-9525: --- Assignee: (was: Apache Spark) > Optimize SparseVector initializations in linalg > --- > > Key: SPARK-9525 > URL: https://issues.apache.org/jira/browse/SPARK-9525 > Project: Spark > Issue Type: Improvement > Components: MLlib, PySpark >Reporter: Manoj Kumar >Priority: Minor > > 1. Remove sorting of indices and assume that the user gives a sorted tuple of > indices, values etc > 2. Avoid iterating twice to get the indices and values if the argument > provided is a dict. > 3. Add checks such that the length of the indices should be less than the > size provided. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-9525) Optimize SparseVector initializations in linalg
[ https://issues.apache.org/jira/browse/SPARK-9525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14650392#comment-14650392 ] Apache Spark commented on SPARK-9525: - User 'MechCoder' has created a pull request for this issue: https://github.com/apache/spark/pull/7854 > Optimize SparseVector initializations in linalg > --- > > Key: SPARK-9525 > URL: https://issues.apache.org/jira/browse/SPARK-9525 > Project: Spark > Issue Type: Improvement > Components: MLlib, PySpark >Reporter: Manoj Kumar >Priority: Minor > > 1. Remove sorting of indices and assume that the user gives a sorted tuple of > indices, values etc > 2. Avoid iterating twice to get the indices and values if the argument > provided is a dict. > 3. Add checks such that the length of the indices should be less than the > size provided. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-9525) Optimize SparseVector initializations in linalg
[ https://issues.apache.org/jira/browse/SPARK-9525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manoj Kumar updated SPARK-9525: --- Priority: Major (was: Minor) > Optimize SparseVector initializations in linalg > --- > > Key: SPARK-9525 > URL: https://issues.apache.org/jira/browse/SPARK-9525 > Project: Spark > Issue Type: Improvement > Components: MLlib, PySpark >Reporter: Manoj Kumar > > 1. Remove sorting of indices and assume that the user gives a sorted tuple of > indices, values etc > 2. Avoid iterating twice to get the indices and values if the argument > provided is a dict. > 3. Add checks such that the length of the indices should be less than the > size provided. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-8263) string function: substr/substring should also support binary type
[ https://issues.apache.org/jira/browse/SPARK-8263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davies Liu resolved SPARK-8263. --- Resolution: Fixed Fix Version/s: 1.5.0 Target Version/s: (was: ) Issue resolved by pull request 7848 [https://github.com/apache/spark/pull/7848] > string function: substr/substring should also support binary type > - > > Key: SPARK-8263 > URL: https://issues.apache.org/jira/browse/SPARK-8263 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Reynold Xin >Assignee: Cheng Hao >Priority: Minor > Fix For: 1.5.0 > > > See Hive's: > https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-9524) Latest Spark (8765665015ef47a23e00f7d01d4d280c31bb236d) breaks (pyspark)
[ https://issues.apache.org/jira/browse/SPARK-9524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14650413#comment-14650413 ] Samuel Marks commented on SPARK-9524: - You're welcome to close the issue, it was only reported because it said: "2.4+ kernel w/o ELF notes? -- report this". Working from the 1.4 branch and everything built fine + works fine, which means that it's unlikely to be an IPython issue. Anyways. > Latest Spark (8765665015ef47a23e00f7d01d4d280c31bb236d) breaks (pyspark) > > > Key: SPARK-9524 > URL: https://issues.apache.org/jira/browse/SPARK-9524 > Project: Spark > Issue Type: Bug > Components: PySpark > Environment: Ubuntu 15.04 > Linux Kudu 3.19.0-25-generic #26-Ubuntu SMP Fri Jul 24 21:17:31 UTC 2015 > x86_64 x86_64 x86_64 GNU/Linux >Reporter: Samuel Marks > > I start my ipython notebook like usual, after updating to the latest Spark > (`git pull`). Also tried a complete folder removal + clone + `build/mvn > -DskipTests clean package` just to be sure. > I get a bunch of these 404 errors then this: > {code:none} > [W 00:13:49.462 NotebookApp] 404 GET > /api/kernels/e7db54cb-f7bb-4bdf-8d0c-76110f26c12c/channels?session_id=64C70A32AA2940808FDCA038A3D9E5B5 > (127.0.0.1) 3.72ms referer=None > 2.4+ kernel w/o ELF notes? -- report this > {code} > PS: None of my Python code works within `ipython notebook` when it's launched > via pyspark. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-9526) Utilize ScalaCheck to reveal potential bugs in sql expressions
Yijie Shen created SPARK-9526: - Summary: Utilize ScalaCheck to reveal potential bugs in sql expressions Key: SPARK-9526 URL: https://issues.apache.org/jira/browse/SPARK-9526 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Yijie Shen Priority: Blocker -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-8999) Support non-temporal sequence in PrefixSpan
[ https://issues.apache.org/jira/browse/SPARK-8999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14650415#comment-14650415 ] Xiangrui Meng commented on SPARK-8999: -- [~srowen] Thanks for your feedback! PrefixSpan paper has ~2k citations and I can find implementations in many libraries, e.g., SPMF, R. I think it is fair to say the algorithm is popular in data mining. The question I had is whether we want to support sequences of itemsets instead of sequences of items. The former complicates both the API and the implementation. I asked the author of SPMF for advice. He said without itemset support it is called string mining, which should be efficiently handled by some other algorithms. So it seems that we should implement PrefixSpan as in the paper, which supports itemsets. > Support non-temporal sequence in PrefixSpan > --- > > Key: SPARK-8999 > URL: https://issues.apache.org/jira/browse/SPARK-8999 > Project: Spark > Issue Type: Improvement > Components: MLlib >Affects Versions: 1.5.0 >Reporter: Xiangrui Meng >Assignee: Zhang JiaJin >Priority: Critical > Fix For: 1.5.0 > > > In SPARK-6487, we assume that all items are ordered. However, we should > support non-temporal sequences in PrefixSpan. This should be done before 1.5 > because it changes PrefixSpan APIs. > We can use `Array[Array[Int]]` or follow SPMF to use `Array[Int]` and use -1 > to mark itemset boundaries. The latter is more efficient for storage. If we > support generic item type, we can use null. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-9526) Utilize randomized testing to reveal potential bugs in sql expressions
[ https://issues.apache.org/jira/browse/SPARK-9526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yijie Shen updated SPARK-9526: -- Summary: Utilize randomized testing to reveal potential bugs in sql expressions (was: Utilize ScalaCheck to reveal potential bugs in sql expressions) > Utilize randomized testing to reveal potential bugs in sql expressions > -- > > Key: SPARK-9526 > URL: https://issues.apache.org/jira/browse/SPARK-9526 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Yijie Shen >Priority: Blocker > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-9526) Utilize randomized tests to reveal potential bugs in sql expressions
[ https://issues.apache.org/jira/browse/SPARK-9526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yijie Shen updated SPARK-9526: -- Summary: Utilize randomized tests to reveal potential bugs in sql expressions (was: Utilize randomized testing to reveal potential bugs in sql expressions) > Utilize randomized tests to reveal potential bugs in sql expressions > > > Key: SPARK-9526 > URL: https://issues.apache.org/jira/browse/SPARK-9526 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Yijie Shen >Priority: Blocker > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-9527) PrefixSpan.run should return a PrefixSpanModel instead of an RDD
Xiangrui Meng created SPARK-9527: Summary: PrefixSpan.run should return a PrefixSpanModel instead of an RDD Key: SPARK-9527 URL: https://issues.apache.org/jira/browse/SPARK-9527 Project: Spark Issue Type: Improvement Components: MLlib Affects Versions: 1.5.0 Reporter: Xiangrui Meng Priority: Critical Wrapping the result RDD in a model would make it easier to add new features in the future. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
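A minimal sketch of what such a wrapper could look like (class and member names here are assumptions, not the final API): new functionality can later be added to the model without changing the return type of PrefixSpan.run again.
{code}
import org.apache.spark.rdd.RDD

// Hypothetical model class wrapping the frequent-sequence RDD produced by PrefixSpan.run.
class PrefixSpanModel(val freqSequences: RDD[(Array[Int], Long)]) extends Serializable {
  // Convenience accessor; save/load or query helpers could be added here later.
  def numFreqSequences(): Long = freqSequences.count()
}
{code}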
[jira] [Assigned] (SPARK-9526) Utilize randomized tests to reveal potential bugs in sql expressions
[ https://issues.apache.org/jira/browse/SPARK-9526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-9526: --- Assignee: Apache Spark > Utilize randomized tests to reveal potential bugs in sql expressions > > > Key: SPARK-9526 > URL: https://issues.apache.org/jira/browse/SPARK-9526 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Yijie Shen >Assignee: Apache Spark >Priority: Blocker > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-9526) Utilize randomized tests to reveal potential bugs in sql expressions
[ https://issues.apache.org/jira/browse/SPARK-9526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14650420#comment-14650420 ] Apache Spark commented on SPARK-9526: - User 'yjshen' has created a pull request for this issue: https://github.com/apache/spark/pull/7855 > Utilize randomized tests to reveal potential bugs in sql expressions > > > Key: SPARK-9526 > URL: https://issues.apache.org/jira/browse/SPARK-9526 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Yijie Shen >Priority: Blocker > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-9526) Utilize randomized tests to reveal potential bugs in sql expressions
[ https://issues.apache.org/jira/browse/SPARK-9526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-9526: --- Assignee: (was: Apache Spark) > Utilize randomized tests to reveal potential bugs in sql expressions > > > Key: SPARK-9526 > URL: https://issues.apache.org/jira/browse/SPARK-9526 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Yijie Shen >Priority: Blocker > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-9526) Utilize randomized tests to reveal potential bugs in sql expressions
[ https://issues.apache.org/jira/browse/SPARK-9526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14650421#comment-14650421 ] Sean Owen commented on SPARK-9526: -- [~yijieshen] At this point, should we really be creating new blockers for the 1.5.0 release? Technically, the merge window has closed, and this looks like just a small nice-to-have. > Utilize randomized tests to reveal potential bugs in sql expressions > > > Key: SPARK-9526 > URL: https://issues.apache.org/jira/browse/SPARK-9526 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Yijie Shen >Priority: Blocker > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-9526) Utilize randomized tests to reveal potential bugs in sql expressions
[ https://issues.apache.org/jira/browse/SPARK-9526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14650423#comment-14650423 ] Yijie Shen commented on SPARK-9526: --- [~srowen] Thanks for the reminder. The current randomized tests have revealed some bugs in Spark SQL expression evaluation, so I think this might warrant blocker status. What do you think? > Utilize randomized tests to reveal potential bugs in sql expressions > > > Key: SPARK-9526 > URL: https://issues.apache.org/jira/browse/SPARK-9526 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Yijie Shen >Priority: Blocker > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5754) Spark AM not launching on Windows
[ https://issues.apache.org/jira/browse/SPARK-5754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14650444#comment-14650444 ] Carsten Blank commented on SPARK-5754: -- Okay so I have thought about this more and kinda have a "qualified" opinion now. I assume that you have fixed this issue for your problem? Did you write a separate escapeForShell for Windows? I have and I would like to suggest something like that for a PR. How did you solve this? > Spark AM not launching on Windows > - > > Key: SPARK-5754 > URL: https://issues.apache.org/jira/browse/SPARK-5754 > Project: Spark > Issue Type: Bug > Components: Windows, YARN >Affects Versions: 1.1.1, 1.2.0 > Environment: Windows Server 2012, Hadoop 2.4.1. >Reporter: Inigo > > I'm trying to run Spark Pi on a YARN cluster running on Windows and the AM > container fails to start. The problem seems to be in the generation of the > YARN command which adds single quotes (') surrounding some of the java > options. In particular, the part of the code that is adding those is the > escapeForShell function in YarnSparkHadoopUtil. Apparently, Windows does not > like the quotes for these options. Here is an example of the command that the > container tries to execute: > @call %JAVA_HOME%/bin/java -server -Xmx512m -Djava.io.tmpdir=%PWD%/tmp > '-Dspark.yarn.secondary.jars=' > '-Dspark.app.name=org.apache.spark.examples.SparkPi' > '-Dspark.master=yarn-cluster' org.apache.spark.deploy.yarn.ApplicationMaster > --class 'org.apache.spark.examples.SparkPi' --jar > 'file:/D:/data/spark-1.1.1-bin-hadoop2.4/bin/../lib/spark-examples-1.1.1-hadoop2.4.0.jar' > --executor-memory 1024 --executor-cores 1 --num-executors 2 > Once I transform it into: > @call %JAVA_HOME%/bin/java -server -Xmx512m -Djava.io.tmpdir=%PWD%/tmp > -Dspark.yarn.secondary.jars= > -Dspark.app.name=org.apache.spark.examples.SparkPi > -Dspark.master=yarn-cluster org.apache.spark.deploy.yarn.ApplicationMaster > --class 'org.apache.spark.examples.SparkPi' --jar > 'file:/D:/data/spark-1.1.1-bin-hadoop2.4/bin/../lib/spark-examples-1.1.1-hadoop2.4.0.jar' > --executor-memory 1024 --executor-cores 1 --num-executors 2 > Everything seems to start. > How should I deal with this? Creating a separate function like escapeForShell > for Windows and call it whenever I detect this is for Windows? Or should I > add some sanity check on YARN? > I checked a little and there seems to be people that is able to run Spark on > YARN on Windows, so it might be something else. I didn't find anything > related on Jira either. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
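For reference, a minimal sketch of what a Windows-specific escaping variant might look like (an assumption for discussion, not the actual Spark code): cmd.exe does not strip single quotes, so arguments are wrapped in double quotes instead, with embedded double quotes escaped.
{code}
// Hypothetical Windows-aware replacement for escapeForShell (illustrative only).
def escapeForShellWindows(arg: String): String = {
  if (arg == null) arg
  else "\"" + arg.replace("\"", "\\\"") + "\""
}

// With this, the problematic option from the description would become
//   "-Dspark.app.name=org.apache.spark.examples.SparkPi"
// instead of being wrapped in single quotes that cmd.exe passes through literally.
{code}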
[jira] [Commented] (SPARK-9524) Latest Spark (8765665015ef47a23e00f7d01d4d280c31bb236d) breaks (pyspark)
[ https://issues.apache.org/jira/browse/SPARK-9524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14650448#comment-14650448 ] Sean Owen commented on SPARK-9524: -- That's an error from ipython though, not Spark. It doesn't follow that it's a Spark issue just because ipython + Spark x doesn't exhibit whatever problem you're seeing. Who knows, but, since it's an ipython error I'd start there. > Latest Spark (8765665015ef47a23e00f7d01d4d280c31bb236d) breaks (pyspark) > > > Key: SPARK-9524 > URL: https://issues.apache.org/jira/browse/SPARK-9524 > Project: Spark > Issue Type: Bug > Components: PySpark > Environment: Ubuntu 15.04 > Linux Kudu 3.19.0-25-generic #26-Ubuntu SMP Fri Jul 24 21:17:31 UTC 2015 > x86_64 x86_64 x86_64 GNU/Linux >Reporter: Samuel Marks > > I start my ipython notebook like usual, after updating to the latest Spark > (`git pull`). Also tried a complete folder removal + clone + `build/mvn > -DskipTests clean package` just to be sure. > I get a bunch of these 404 errors then this: > {code:none} > [W 00:13:49.462 NotebookApp] 404 GET > /api/kernels/e7db54cb-f7bb-4bdf-8d0c-76110f26c12c/channels?session_id=64C70A32AA2940808FDCA038A3D9E5B5 > (127.0.0.1) 3.72ms referer=None > 2.4+ kernel w/o ELF notes? -- report this > {code} > PS: None of my Python code works within `ipython notebook` when it's launched > via pyspark. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-8069) Add support for cutoff to RandomForestClassifier
[ https://issues.apache.org/jira/browse/SPARK-8069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14650477#comment-14650477 ] holdenk commented on SPARK-8069: I was looking at https://docs.google.com/document/d/1nV6m7sqViHkEpawelq1S5_QLWWAouSlv81eiEEjKuJY/edit?pli=1# but I can't comment on or edit the design document, so I figured I'd write my notes here ([~josephkb] if you could give me comment permission on the document that would be great). The document calls for only having thresholds on ProbabilisticClassifier, but it also discusses having an implementation for both; which one do we want to do? > Add support for cutoff to RandomForestClassifier > > > Key: SPARK-8069 > URL: https://issues.apache.org/jira/browse/SPARK-8069 > Project: Spark > Issue Type: Improvement > Components: ML >Reporter: holdenk >Assignee: holdenk >Priority: Minor > Original Estimate: 240h > Remaining Estimate: 240h > > Consider adding support for cutoffs similar to > http://cran.r-project.org/web/packages/randomForest/randomForest.pdf > (Joseph) I just wrote a [little design doc | > https://docs.google.com/document/d/1nV6m7sqViHkEpawelq1S5_QLWWAouSlv81eiEEjKuJY/edit?usp=sharing] > for this. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-6873) Some Hive-Catalyst comparison tests fail due to unimportant order of some printed elements
[ https://issues.apache.org/jira/browse/SPARK-6873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14650480#comment-14650480 ] Sean Owen commented on SPARK-6873: -- [~rxin] [~lian cheng] It's still a problem. Yes I'm sure it's just a test issue, not a problem with the code, but ideally the test must not rely on the ordering. Right now tests don't actually pass in Java 8 because of things like ... {code} - show_create_table_serde *** FAILED *** Results do not match for show_create_table_serde: == Parsed Logical Plan == HiveNativeCommand SHOW CREATE TABLE tmp_showcrt1 == Analyzed Logical Plan == result: string HiveNativeCommand SHOW CREATE TABLE tmp_showcrt1 == Optimized Logical Plan == HiveNativeCommand SHOW CREATE TABLE tmp_showcrt1 == Physical Plan == ExecutedCommand (HiveNativeCommand SHOW CREATE TABLE tmp_showcrt1) Code Generation: true == RDD == result !== HIVE - 13 row(s) == == CATALYST - 13 row(s) == CREATE EXTERNAL TABLE `tmp_showcrt1`( CREATE EXTERNAL TABLE `tmp_showcrt1`( `key` string, `key` string, `value` boolean)`value` boolean) ROW FORMAT SERDEROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe' 'org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe' STORED BY STORED BY 'org.apache.hadoop.hive.ql.metadata.DefaultStorageHandler' 'org.apache.hadoop.hive.ql.metadata.DefaultStorageHandler' WITH SERDEPROPERTIES ( WITH SERDEPROPERTIES ( ! 'serialization.format'='$', 'field.delim'=',', ! 'field.delim'=',') 'serialization.format'='$') LOCATIONLOCATION 'tmp_showcrt1' 'tmp_showcrt1' TBLPROPERTIES ( TBLPROPERTIES ( (HiveComparisonTest.scala:397) {code} I build with {{-Pyarn -Phive}} from master. > Some Hive-Catalyst comparison tests fail due to unimportant order of some > printed elements > -- > > Key: SPARK-6873 > URL: https://issues.apache.org/jira/browse/SPARK-6873 > Project: Spark > Issue Type: Bug > Components: SQL, Tests >Affects Versions: 1.3.1 >Reporter: Sean Owen >Assignee: Cheng Lian >Priority: Minor > > As I mentioned, I've been seeing 4 test failures in Hive tests for a while, > and actually it still affects master. I think it's a superficial problem that > only turns up when running on Java 8, but still, would probably be an easy > fix and good to fix. > Specifically, here are four tests and the bit that fails the comparison, > below. I tried to diagnose this but had trouble even finding where some of > this occurs, like the list of synonyms? > {code} > - show_tblproperties *** FAILED *** > Results do not match for show_tblproperties: > ... > !== HIVE - 2 row(s) == == CATALYST - 2 row(s) == > !tmptruebar bar value > !barbar value tmp true (HiveComparisonTest.scala:391) > {code} > {code} > - show_create_table_serde *** FAILED *** > Results do not match for show_create_table_serde: > ... >WITH SERDEPROPERTIES ( WITH > SERDEPROPERTIES ( > ! 'serialization.format'='$', > 'field.delim'=',', > ! 'field.delim'=',') > 'serialization.format'='$') > {code} > {code} > - udf_std *** FAILED *** > Results do not match for udf_std: > ... > !== HIVE - 2 row(s) == == CATALYST > - 2 row(s) == >std(x) - Returns the standard deviation of a set of numbers std(x) - > Returns the standard deviation of a set of numbers > !Synonyms: stddev_pop, stddev Synonyms: > stddev, stddev_pop (HiveComparisonTest.scala:391) > {code} > {code} > - udf_stddev *** FAILED *** > Results do not match for udf_stddev: > ... 
> !== HIVE - 2 row(s) ==== > CATALYST - 2 row(s) == >stddev(x) - Returns the standard deviation of a set of numbers stddev(x) > - Returns the standard deviation of a set of numbers > !Synonyms: stddev_pop, stdSynonyms: > std, stddev_pop (HiveComparisonTest.scala:391) > {code} -- This message was sent by
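One way to make such comparisons insensitive to ordering, sketched below under the assumption that the affected output is a block of key/value lines whose order carries no meaning (this is not the actual HiveComparisonTest code), is to normalize the unordered lines before diffing:
{code}
// Sort the lines inside an unordered block (e.g. the SERDEPROPERTIES entries) so that
//   'serialization.format'='$', 'field.delim'=','
// and the same pairs in the opposite order compare as equal.
def normalizeUnordered(lines: Seq[String]): Seq[String] =
  lines.map(_.trim.stripSuffix(",")).sorted
{code}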
[jira] [Commented] (SPARK-3166) Custom serialisers can't be shipped in application jars
[ https://issues.apache.org/jira/browse/SPARK-3166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14650486#comment-14650486 ] Josh Rosen commented on SPARK-3166: --- Does anyone know if this is still an issue in newer Spark versions? > Custom serialisers can't be shipped in application jars > --- > > Key: SPARK-3166 > URL: https://issues.apache.org/jira/browse/SPARK-3166 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.0.2 >Reporter: Graham Dennis > > Spark cannot currently use a custom serialiser that is shipped with the > application jar. Trying to do this causes a java.lang.ClassNotFoundException > when trying to instantiate the custom serialiser in the Executor processes. > This occurs because Spark attempts to instantiate the custom serialiser > before the application jar has been shipped to the Executor process. A > reproduction of the problem is available here: > https://github.com/GrahamDennis/spark-custom-serialiser > I've verified this problem in Spark 1.0.2, and Spark master and 1.1 branches > as of August 21, 2014. This issue is related to SPARK-2878, and my fix for > that issue (https://github.com/apache/spark/pull/1890) also solves this. My > pull request was not merged because it adds the user jar to the Executor > processes' class path at launch time. Such a significant change was thought > by [~rxin] to require more QA, and should be considered for inclusion in 1.2 > at the earliest. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
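A minimal sketch of the failing configuration (the serializer class name is a placeholder): the configured serializer exists only in the application jar, which has not yet been shipped when the executors instantiate their serializer.
{code}
import org.apache.spark.SparkConf

// Hypothetical reproduction: spark.serializer points at a class that is only in the app jar.
val conf = new SparkConf()
  .setAppName("custom-serializer-repro")
  .set("spark.serializer", "com.example.MyCustomSerializer")
{code}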
[jira] [Updated] (SPARK-9526) Utilize randomized tests to reveal potential bugs in sql expressions
[ https://issues.apache.org/jira/browse/SPARK-9526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-9526: --- Priority: Minor (was: Blocker) > Utilize randomized tests to reveal potential bugs in sql expressions > > > Key: SPARK-9526 > URL: https://issues.apache.org/jira/browse/SPARK-9526 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Yijie Shen >Priority: Minor > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-9526) Utilize randomized tests to reveal potential bugs in sql expressions
[ https://issues.apache.org/jira/browse/SPARK-9526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-9526: --- Priority: Major (was: Minor) > Utilize randomized tests to reveal potential bugs in sql expressions > > > Key: SPARK-9526 > URL: https://issues.apache.org/jira/browse/SPARK-9526 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Yijie Shen > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-9526) Utilize randomized tests to reveal potential bugs in sql expressions
[ https://issues.apache.org/jira/browse/SPARK-9526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14650498#comment-14650498 ] Reynold Xin commented on SPARK-9526: I downgraded it to major. While it is great to have (especially if it finds a lot of bugs that can help QA), I don't think this is a release blocker. > Utilize randomized tests to reveal potential bugs in sql expressions > > > Key: SPARK-9526 > URL: https://issues.apache.org/jira/browse/SPARK-9526 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Yijie Shen > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-4751) Support dynamic allocation for standalone mode
[ https://issues.apache.org/jira/browse/SPARK-4751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or closed SPARK-4751. Resolution: Fixed Fix Version/s: 1.5.0 > Support dynamic allocation for standalone mode > -- > > Key: SPARK-4751 > URL: https://issues.apache.org/jira/browse/SPARK-4751 > Project: Spark > Issue Type: New Feature > Components: Spark Core >Affects Versions: 1.2.0 >Reporter: Andrew Or >Assignee: Andrew Or >Priority: Critical > Fix For: 1.5.0 > > > This is equivalent to SPARK-3822 but for standalone mode. > This is actually a very tricky issue because the scheduling mechanism in the > standalone Master uses different semantics. In standalone mode we allocate > resources based on cores. By default, an application will grab all the cores > in the cluster unless "spark.cores.max" is specified. Unfortunately, this > means an application could get executors of different sizes (in terms of > cores) if: > 1) App 1 kills an executor > 2) App 2, with "spark.cores.max" set, grabs a subset of cores on a worker > 3) App 1 requests an executor > In this case, the new executor that App 1 gets back will be smaller than the > rest and can execute fewer tasks in parallel. Further, standalone mode is > subject to the constraint that only one executor can be allocated on each > worker per application. As a result, it is rather meaningless to request new > executors if the existing ones are already spread out across all nodes. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
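For context, a sketch of the standalone-mode settings involved in the scheduling semantics described above (values are illustrative, not recommendations):
{code}
import org.apache.spark.SparkConf

// spark.cores.max caps how many cores one application may claim, which is what the
// core-based standalone scheduler reasons about; the external shuffle service is a
// prerequisite for releasing executors safely under dynamic allocation.
val conf = new SparkConf()
  .set("spark.dynamicAllocation.enabled", "true")
  .set("spark.shuffle.service.enabled", "true")
  .set("spark.cores.max", "16")
{code}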
[jira] [Commented] (SPARK-8069) Add support for cutoff to RandomForestClassifier
[ https://issues.apache.org/jira/browse/SPARK-8069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14650514#comment-14650514 ] Joseph K. Bradley commented on SPARK-8069: -- My final plan was to only have it for ProbabilisticClassifier. That note about Classifier is out of date; I forgot to update it, but will now. > Add support for cutoff to RandomForestClassifier > > > Key: SPARK-8069 > URL: https://issues.apache.org/jira/browse/SPARK-8069 > Project: Spark > Issue Type: Improvement > Components: ML >Reporter: holdenk >Assignee: holdenk >Priority: Minor > Original Estimate: 240h > Remaining Estimate: 240h > > Consider adding support for cutoffs similar to > http://cran.r-project.org/web/packages/randomForest/randomForest.pdf > (Joseph) I just wrote a [little design doc | > https://docs.google.com/document/d/1nV6m7sqViHkEpawelq1S5_QLWWAouSlv81eiEEjKuJY/edit?usp=sharing] > for this. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
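A rough usage sketch of per-class thresholds on a probabilistic classifier (the setter shown is an assumption about the eventual API, analogous to R randomForest's cutoff parameter):
{code}
import org.apache.spark.ml.classification.RandomForestClassifier

// Hypothetical example with two classes: a higher threshold for class 1 means class 1
// is only predicted when its probability clears the higher bar.
val rf = new RandomForestClassifier()
  .setLabelCol("label")
  .setFeaturesCol("features")
  .setThresholds(Array(0.4, 0.6))
{code}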
[jira] [Created] (SPARK-9528) RandomForestClassifier should extend ProbabilisticClassifier
Joseph K. Bradley created SPARK-9528: Summary: RandomForestClassifier should extend ProbabilisticClassifier Key: SPARK-9528 URL: https://issues.apache.org/jira/browse/SPARK-9528 Project: Spark Issue Type: Improvement Components: MLlib Reporter: Joseph K. Bradley Assignee: Joseph K. Bradley Now that DecisionTreeClassifier extends ProbabilisticClassifier, we can have RandomForestClassifier extend ProbabilisticClassifier as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-9491) App running on secure YARN with no HBase config will hang
[ https://issues.apache.org/jira/browse/SPARK-9491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin updated SPARK-9491: -- Assignee: Marcelo Vanzin Affects Version/s: 1.4.0 Target Version/s: 1.4.2, 1.5.0 (was: 1.5.0) Fix Version/s: 1.5.0 > App running on secure YARN with no HBase config will hang > - > > Key: SPARK-9491 > URL: https://issues.apache.org/jira/browse/SPARK-9491 > Project: Spark > Issue Type: Bug > Components: YARN >Affects Versions: 1.4.0, 1.5.0 >Reporter: Marcelo Vanzin >Assignee: Marcelo Vanzin >Priority: Blocker > Fix For: 1.5.0 > > > Because HBase may not be available, or the default config may be pointing at > the wrong information for HBase, the YARN backend may end up waiting forever > at this point: > {noformat} > "main" prio=10 tid=0x7f96c8016000 nid=0x1aa6 waiting on condition > [0x7f96cda96000] >java.lang.Thread.State: TIMED_WAITING (sleeping) > at java.lang.Thread.sleep(Native Method) > at > org.apache.hadoop.hbase.zookeeper.MetaTableLocator.blockUntilAvailable(MetaTableLocator.java:443) > at > org.apache.hadoop.hbase.client.ZooKeeperRegistry.getMetaRegionLocation(ZooKeeperRegistry.java:60) > at > org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1123) > at > org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1110) > at > org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1067) > at > org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getRegionLocation(ConnectionManager.java:902) > at > org.apache.hadoop.hbase.client.RegionServerCallable.prepare(RegionServerCallable.java:78) > at > org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:124) > at > org.apache.hadoop.hbase.ipc.RegionCoprocessorRpcChannel.callExecService(RegionCoprocessorRpcChannel.java:95) > at > org.apache.hadoop.hbase.ipc.CoprocessorRpcChannel.callBlockingMethod(CoprocessorRpcChannel.java:73) > at > org.apache.hadoop.hbase.protobuf.generated.AuthenticationProtos$AuthenticationService$BlockingStub.getAuthenticationToken(AuthenticationProtos.java:4512) > at > org.apache.hadoop.hbase.security.token.TokenUtil.obtainToken(TokenUtil.java:86) > at > org.apache.hadoop.hbase.security.token.TokenUtil.obtainToken(TokenUtil.java:69) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.apache.spark.deploy.yarn.Client$.obtainTokenForHBase(Client.scala:1299) > at > org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:270) > {noformat} > The code shouldn't try to fetch HBase delegation tokens when HBase is not > configured. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
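A minimal sketch of the guard described in the last sentence above (an assumed helper, not the actual Client.scala code): skip the token fetch unless the configuration actually points at a Kerberos-secured HBase.
{code}
import org.apache.hadoop.conf.Configuration

// Only a cluster with hbase.security.authentication=kerberos can issue delegation
// tokens; with the default ("simple") there is nothing to fetch, so returning early
// avoids the blockUntilAvailable hang shown in the stack trace.
def shouldObtainHBaseToken(hbaseConf: Configuration): Boolean =
  "kerberos".equalsIgnoreCase(hbaseConf.get("hbase.security.authentication"))
{code}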
[jira] [Updated] (SPARK-9491) App running on secure YARN with no HBase config will hang
[ https://issues.apache.org/jira/browse/SPARK-9491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin updated SPARK-9491: -- Fix Version/s: 1.4.2 > App running on secure YARN with no HBase config will hang > - > > Key: SPARK-9491 > URL: https://issues.apache.org/jira/browse/SPARK-9491 > Project: Spark > Issue Type: Bug > Components: YARN >Affects Versions: 1.4.0, 1.5.0 >Reporter: Marcelo Vanzin >Assignee: Marcelo Vanzin >Priority: Blocker > Fix For: 1.4.2, 1.5.0 > > > Because HBase may not be available, or the default config may be pointing at > the wrong information for HBase, the YARN backend may end up waiting forever > at this point: > {noformat} > "main" prio=10 tid=0x7f96c8016000 nid=0x1aa6 waiting on condition > [0x7f96cda96000] >java.lang.Thread.State: TIMED_WAITING (sleeping) > at java.lang.Thread.sleep(Native Method) > at > org.apache.hadoop.hbase.zookeeper.MetaTableLocator.blockUntilAvailable(MetaTableLocator.java:443) > at > org.apache.hadoop.hbase.client.ZooKeeperRegistry.getMetaRegionLocation(ZooKeeperRegistry.java:60) > at > org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1123) > at > org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1110) > at > org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1067) > at > org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getRegionLocation(ConnectionManager.java:902) > at > org.apache.hadoop.hbase.client.RegionServerCallable.prepare(RegionServerCallable.java:78) > at > org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:124) > at > org.apache.hadoop.hbase.ipc.RegionCoprocessorRpcChannel.callExecService(RegionCoprocessorRpcChannel.java:95) > at > org.apache.hadoop.hbase.ipc.CoprocessorRpcChannel.callBlockingMethod(CoprocessorRpcChannel.java:73) > at > org.apache.hadoop.hbase.protobuf.generated.AuthenticationProtos$AuthenticationService$BlockingStub.getAuthenticationToken(AuthenticationProtos.java:4512) > at > org.apache.hadoop.hbase.security.token.TokenUtil.obtainToken(TokenUtil.java:86) > at > org.apache.hadoop.hbase.security.token.TokenUtil.obtainToken(TokenUtil.java:69) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.apache.spark.deploy.yarn.Client$.obtainTokenForHBase(Client.scala:1299) > at > org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:270) > {noformat} > The code shouldn't try to fetch HBase delegation tokens when HBase is not > configured. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-9491) App running on secure YARN with no HBase config will hang
[ https://issues.apache.org/jira/browse/SPARK-9491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin resolved SPARK-9491. --- Resolution: Fixed > App running on secure YARN with no HBase config will hang > - > > Key: SPARK-9491 > URL: https://issues.apache.org/jira/browse/SPARK-9491 > Project: Spark > Issue Type: Bug > Components: YARN >Affects Versions: 1.4.0, 1.5.0 >Reporter: Marcelo Vanzin >Assignee: Marcelo Vanzin >Priority: Blocker > Fix For: 1.4.2, 1.5.0 > > > Because HBase may not be available, or the default config may be pointing at > the wrong information for HBase, the YARN backend may end up waiting forever > at this point: > {noformat} > "main" prio=10 tid=0x7f96c8016000 nid=0x1aa6 waiting on condition > [0x7f96cda96000] >java.lang.Thread.State: TIMED_WAITING (sleeping) > at java.lang.Thread.sleep(Native Method) > at > org.apache.hadoop.hbase.zookeeper.MetaTableLocator.blockUntilAvailable(MetaTableLocator.java:443) > at > org.apache.hadoop.hbase.client.ZooKeeperRegistry.getMetaRegionLocation(ZooKeeperRegistry.java:60) > at > org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1123) > at > org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1110) > at > org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1067) > at > org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getRegionLocation(ConnectionManager.java:902) > at > org.apache.hadoop.hbase.client.RegionServerCallable.prepare(RegionServerCallable.java:78) > at > org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:124) > at > org.apache.hadoop.hbase.ipc.RegionCoprocessorRpcChannel.callExecService(RegionCoprocessorRpcChannel.java:95) > at > org.apache.hadoop.hbase.ipc.CoprocessorRpcChannel.callBlockingMethod(CoprocessorRpcChannel.java:73) > at > org.apache.hadoop.hbase.protobuf.generated.AuthenticationProtos$AuthenticationService$BlockingStub.getAuthenticationToken(AuthenticationProtos.java:4512) > at > org.apache.hadoop.hbase.security.token.TokenUtil.obtainToken(TokenUtil.java:86) > at > org.apache.hadoop.hbase.security.token.TokenUtil.obtainToken(TokenUtil.java:69) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.apache.spark.deploy.yarn.Client$.obtainTokenForHBase(Client.scala:1299) > at > org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:270) > {noformat} > The code shouldn't try to fetch HBase delegation tokens when HBase is not > configured. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-9520) UnsafeFixedWidthAggregationMap should support in-place sorting of its own records
[ https://issues.apache.org/jira/browse/SPARK-9520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen resolved SPARK-9520. --- Resolution: Fixed Fix Version/s: 1.5.0 Issue resolved by pull request 7849 [https://github.com/apache/spark/pull/7849] > UnsafeFixedWidthAggregationMap should support in-place sorting of its own > records > - > > Key: SPARK-9520 > URL: https://issues.apache.org/jira/browse/SPARK-9520 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Reynold Xin >Assignee: Reynold Xin > Fix For: 1.5.0 > > > In order to support sort-based external aggregation fallback, > UnsafeFixedWidthAggregationMap needs to support sorting all of its records > in-place. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-9529) Improve sort on Decimal
Davies Liu created SPARK-9529: - Summary: Improve sort on Decimal Key: SPARK-9529 URL: https://issues.apache.org/jira/browse/SPARK-9529 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Davies Liu Assignee: Davies Liu Priority: Critical Right now, it's really slow, just hang there in random tests {code} pool-1-thread-1-ScalaTest-running-TungstenSortSuite" prio=5 tid=0x7f822bc82800 nid=0x5103 runnable [0x00011d1be000] java.lang.Thread.State: RUNNABLE at java.math.BigInteger.(BigInteger.java:405) at java.math.BigDecimal.bigTenToThe(BigDecimal.java:3380) at java.math.BigDecimal.bigMultiplyPowerTen(BigDecimal.java:3508) at java.math.BigDecimal.setScale(BigDecimal.java:2394) at java.math.BigDecimal.divide(BigDecimal.java:1691) at java.math.BigDecimal.divideToIntegralValue(BigDecimal.java:1734) at java.math.BigDecimal.divideAndRemainder(BigDecimal.java:1891) at java.math.BigDecimal.remainder(BigDecimal.java:1833) at scala.math.BigDecimal.remainder(BigDecimal.scala:281) at scala.math.BigDecimal.isWhole(BigDecimal.scala:215) at scala.math.BigDecimal.hashCode(BigDecimal.scala:180) at org.apache.spark.sql.types.Decimal.hashCode(Decimal.scala:260) at org.apache.spark.sql.catalyst.InternalRow.hashCode(InternalRow.scala:121) at org.apache.spark.RangePartitioner.hashCode(Partitioner.scala:201) at java.lang.Object.toString(Object.java:237) at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1418) at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177) at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547) at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508) at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431) at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177) at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347) at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:44) at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:84) at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:301) at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:294) at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:122) at org.apache.spark.SparkContext.clean(SparkContext.scala:2003) at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1.apply(RDD.scala:683) at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1.apply(RDD.scala:682) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108) at org.apache.spark.rdd.RDD.withScope(RDD.scala:286) at org.apache.spark.rdd.RDD.mapPartitions(RDD.scala:682) at org.apache.spark.sql.execution.Exchange$$anonfun$doExecute$1.apply(Exchange.scala:181) at org.apache.spark.sql.execution.Exchange$$anonfun$doExecute$1.apply(Exchange.scala:148) at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:48) at org.apache.spark.sql.execution.Exchange.doExecute(Exchange.scala:148) at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:113) at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:113) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147) at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:112) at 
org.apache.spark.sql.execution.Sort$$anonfun$doExecute$1.apply(sort.scala:48) at org.apache.spark.sql.execution.Sort$$anonfun$doExecute$1.apply(sort.scala:48) at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:48) at org.apache.spark.sql.execution.Sort.doExecute(sort.scala:47) at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:113) at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:113) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147) at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:112) at org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:127) at org.apache.spark.sql.execution.SparkPlanTest$.executePlan(SparkPlanTest.scala:297) at org.apache.spark.sql.execution.SparkPlanTest$.checkAnswer(SparkPlanTest.scala:16
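A sketch of the general idea (the parameters are assumptions, not the actual Decimal internals): when the value fits in an unscaled Long, hash the primitive directly instead of going through scala.math.BigDecimal, whose hashCode calls isWhole/remainder as seen in the trace above.
{code}
// Hypothetical fast path for Decimal.hashCode: avoid BigDecimal allocation entirely
// whenever the unscaled value fits in a Long.
def decimalHash(unscaledLong: Long, fitsInLong: Boolean, big: java.math.BigDecimal): Int =
  if (fitsInLong) (unscaledLong ^ (unscaledLong >>> 32)).toInt
  else big.hashCode()
{code}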
[jira] [Assigned] (SPARK-9483) UTF8String.getPrefix only works in little-endian order
[ https://issues.apache.org/jira/browse/SPARK-9483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-9483: --- Assignee: Matthew Brandyberry (was: Apache Spark) > UTF8String.getPrefix only works in little-endian order > -- > > Key: SPARK-9483 > URL: https://issues.apache.org/jira/browse/SPARK-9483 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Reynold Xin >Assignee: Matthew Brandyberry >Priority: Critical > > There are 2 bit masking and a reverse bytes that should probably be handled > differently on big-endian order. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
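As an illustration of what the prefix is meant to be (an endianness-independent sketch, not the actual UTF8String implementation): the first up-to-8 bytes are packed into a Long so that unsigned comparison of prefixes matches byte-wise comparison of the strings, which is the property the masking and reverse-bytes code has to preserve on both byte orders.
{code}
// Pack the leading bytes into the most-significant positions of a Long. Because it is
// built from individual byte values rather than a raw memory read, the result is the
// same on little- and big-endian platforms.
def prefixOf(bytes: Array[Byte]): Long = {
  var prefix = 0L
  var i = 0
  while (i < math.min(8, bytes.length)) {
    prefix |= (bytes(i).toLong & 0xffL) << (56 - 8 * i)
    i += 1
  }
  prefix
}
{code}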
[jira] [Assigned] (SPARK-9495) Support prefix generation for date / timestamp data type
[ https://issues.apache.org/jira/browse/SPARK-9495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-9495: --- Assignee: Apache Spark > Support prefix generation for date / timestamp data type > > > Key: SPARK-9495 > URL: https://issues.apache.org/jira/browse/SPARK-9495 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Reynold Xin >Assignee: Apache Spark > > There are two files to change: > SortPrefixUtils > and > SortPrefix (in SortOrder.scala) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
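A sketch of the idea (not the actual SortPrefixUtils code): Spark SQL stores DateType as days since the epoch (an Int) and TimestampType as microseconds (a Long), so the signed numeric value itself can serve as the sort prefix and the existing signed-long prefix comparator can be reused.
{code}
// Widening an Int to a Long preserves ordering, so both types map onto a Long prefix.
def datePrefix(daysSinceEpoch: Int): Long = daysSinceEpoch.toLong
def timestampPrefix(microseconds: Long): Long = microseconds
{code}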
[jira] [Commented] (SPARK-9483) UTF8String.getPrefix only works in little-endian order
[ https://issues.apache.org/jira/browse/SPARK-9483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14650522#comment-14650522 ] Apache Spark commented on SPARK-9483: - User 'davies' has created a pull request for this issue: https://github.com/apache/spark/pull/7856 > UTF8String.getPrefix only works in little-endian order > -- > > Key: SPARK-9483 > URL: https://issues.apache.org/jira/browse/SPARK-9483 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Reynold Xin >Assignee: Matthew Brandyberry >Priority: Critical > > There are 2 bit masking and a reverse bytes that should probably be handled > differently on big-endian order. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-9495) Support prefix generation for date / timestamp data type
[ https://issues.apache.org/jira/browse/SPARK-9495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14650523#comment-14650523 ] Apache Spark commented on SPARK-9495: - User 'davies' has created a pull request for this issue: https://github.com/apache/spark/pull/7856 > Support prefix generation for date / timestamp data type > > > Key: SPARK-9495 > URL: https://issues.apache.org/jira/browse/SPARK-9495 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Reynold Xin > > There are two files to change: > SortPrefixUtils > and > SortPrefix (in SortOrder.scala) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-9495) Support prefix generation for date / timestamp data type
[ https://issues.apache.org/jira/browse/SPARK-9495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-9495: --- Assignee: (was: Apache Spark) > Support prefix generation for date / timestamp data type > > > Key: SPARK-9495 > URL: https://issues.apache.org/jira/browse/SPARK-9495 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Reynold Xin > > There are two files to change: > SortPrefixUtils > and > SortPrefix (in SortOrder.scala) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-9483) UTF8String.getPrefix only works in little-endian order
[ https://issues.apache.org/jira/browse/SPARK-9483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-9483: --- Assignee: Apache Spark (was: Matthew Brandyberry) > UTF8String.getPrefix only works in little-endian order > -- > > Key: SPARK-9483 > URL: https://issues.apache.org/jira/browse/SPARK-9483 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Reynold Xin >Assignee: Apache Spark >Priority: Critical > > There are 2 bit masking and a reverse bytes that should probably be handled > differently on big-endian order. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-9530) ScalaDoc should not indicate LDAModel.descripeTopic and DistributedLDAModel.topDocumentsPerTopic as approximate.
Meihua Wu created SPARK-9530: Summary: ScalaDoc should not indicate LDAModel.descripeTopic and DistributedLDAModel.topDocumentsPerTopic as approximate. Key: SPARK-9530 URL: https://issues.apache.org/jira/browse/SPARK-9530 Project: Spark Issue Type: Documentation Components: MLlib Affects Versions: 1.4.1, 1.4.0, 1.3.1, 1.3.0 Reporter: Meihua Wu Priority: Minor Currently the ScalaDoc for LDAModel.descripeTopic and DistributedLDAModel.topDocumentsPerTopic suggests that these methods are approximate. However, both methods are actually precise and there is no need to increase maxTermsPerTopic or maxDocumentsPerTopic to get a more precise set of top terms. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-9529) Improve sort on Decimal
[ https://issues.apache.org/jira/browse/SPARK-9529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-9529: --- Assignee: Davies Liu (was: Apache Spark) > Improve sort on Decimal > --- > > Key: SPARK-9529 > URL: https://issues.apache.org/jira/browse/SPARK-9529 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Davies Liu >Assignee: Davies Liu >Priority: Critical > > Right now, it's really slow, just hang there in random tests > {code} > pool-1-thread-1-ScalaTest-running-TungstenSortSuite" prio=5 > tid=0x7f822bc82800 nid=0x5103 runnable [0x00011d1be000] >java.lang.Thread.State: RUNNABLE > at java.math.BigInteger.(BigInteger.java:405) > at java.math.BigDecimal.bigTenToThe(BigDecimal.java:3380) > at java.math.BigDecimal.bigMultiplyPowerTen(BigDecimal.java:3508) > at java.math.BigDecimal.setScale(BigDecimal.java:2394) > at java.math.BigDecimal.divide(BigDecimal.java:1691) > at java.math.BigDecimal.divideToIntegralValue(BigDecimal.java:1734) > at java.math.BigDecimal.divideAndRemainder(BigDecimal.java:1891) > at java.math.BigDecimal.remainder(BigDecimal.java:1833) > at scala.math.BigDecimal.remainder(BigDecimal.scala:281) > at scala.math.BigDecimal.isWhole(BigDecimal.scala:215) > at scala.math.BigDecimal.hashCode(BigDecimal.scala:180) > at org.apache.spark.sql.types.Decimal.hashCode(Decimal.scala:260) > at > org.apache.spark.sql.catalyst.InternalRow.hashCode(InternalRow.scala:121) > at org.apache.spark.RangePartitioner.hashCode(Partitioner.scala:201) > at java.lang.Object.toString(Object.java:237) > at > java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1418) > at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177) > at > java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547) > at > java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508) > at > java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431) > at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177) > at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347) > at > org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:44) > at > org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:84) > at > org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:301) > at > org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:294) > at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:122) > at org.apache.spark.SparkContext.clean(SparkContext.scala:2003) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1.apply(RDD.scala:683) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1.apply(RDD.scala:682) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108) > at org.apache.spark.rdd.RDD.withScope(RDD.scala:286) > at org.apache.spark.rdd.RDD.mapPartitions(RDD.scala:682) > at > org.apache.spark.sql.execution.Exchange$$anonfun$doExecute$1.apply(Exchange.scala:181) > at > org.apache.spark.sql.execution.Exchange$$anonfun$doExecute$1.apply(Exchange.scala:148) > at > org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:48) > at org.apache.spark.sql.execution.Exchange.doExecute(Exchange.scala:148) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:113) > 
at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:113) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147) > at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:112) > at > org.apache.spark.sql.execution.Sort$$anonfun$doExecute$1.apply(sort.scala:48) > at > org.apache.spark.sql.execution.Sort$$anonfun$doExecute$1.apply(sort.scala:48) > at > org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:48) > at org.apache.spark.sql.execution.Sort.doExecute(sort.scala:47) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:113) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:113) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147) > at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:112) > at > org.a
[jira] [Commented] (SPARK-9529) Improve sort on Decimal
[ https://issues.apache.org/jira/browse/SPARK-9529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14650540#comment-14650540 ] Apache Spark commented on SPARK-9529: - User 'davies' has created a pull request for this issue: https://github.com/apache/spark/pull/7857 > Improve sort on Decimal > --- > > Key: SPARK-9529 > URL: https://issues.apache.org/jira/browse/SPARK-9529 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Davies Liu >Assignee: Davies Liu >Priority: Critical > > Right now, it's really slow, just hang there in random tests > {code} > pool-1-thread-1-ScalaTest-running-TungstenSortSuite" prio=5 > tid=0x7f822bc82800 nid=0x5103 runnable [0x00011d1be000] >java.lang.Thread.State: RUNNABLE > at java.math.BigInteger.(BigInteger.java:405) > at java.math.BigDecimal.bigTenToThe(BigDecimal.java:3380) > at java.math.BigDecimal.bigMultiplyPowerTen(BigDecimal.java:3508) > at java.math.BigDecimal.setScale(BigDecimal.java:2394) > at java.math.BigDecimal.divide(BigDecimal.java:1691) > at java.math.BigDecimal.divideToIntegralValue(BigDecimal.java:1734) > at java.math.BigDecimal.divideAndRemainder(BigDecimal.java:1891) > at java.math.BigDecimal.remainder(BigDecimal.java:1833) > at scala.math.BigDecimal.remainder(BigDecimal.scala:281) > at scala.math.BigDecimal.isWhole(BigDecimal.scala:215) > at scala.math.BigDecimal.hashCode(BigDecimal.scala:180) > at org.apache.spark.sql.types.Decimal.hashCode(Decimal.scala:260) > at > org.apache.spark.sql.catalyst.InternalRow.hashCode(InternalRow.scala:121) > at org.apache.spark.RangePartitioner.hashCode(Partitioner.scala:201) > at java.lang.Object.toString(Object.java:237) > at > java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1418) > at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177) > at > java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547) > at > java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508) > at > java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431) > at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177) > at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347) > at > org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:44) > at > org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:84) > at > org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:301) > at > org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:294) > at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:122) > at org.apache.spark.SparkContext.clean(SparkContext.scala:2003) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1.apply(RDD.scala:683) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1.apply(RDD.scala:682) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108) > at org.apache.spark.rdd.RDD.withScope(RDD.scala:286) > at org.apache.spark.rdd.RDD.mapPartitions(RDD.scala:682) > at > org.apache.spark.sql.execution.Exchange$$anonfun$doExecute$1.apply(Exchange.scala:181) > at > org.apache.spark.sql.execution.Exchange$$anonfun$doExecute$1.apply(Exchange.scala:148) > at > org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:48) > at 
org.apache.spark.sql.execution.Exchange.doExecute(Exchange.scala:148) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:113) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:113) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147) > at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:112) > at > org.apache.spark.sql.execution.Sort$$anonfun$doExecute$1.apply(sort.scala:48) > at > org.apache.spark.sql.execution.Sort$$anonfun$doExecute$1.apply(sort.scala:48) > at > org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:48) > at org.apache.spark.sql.execution.Sort.doExecute(sort.scala:47) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:113) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:113) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:1
[jira] [Updated] (SPARK-9530) ScalaDoc should not indicate LDAModel.describeTopics and DistributedLDAModel.topDocumentsPerTopic as approximate.
[ https://issues.apache.org/jira/browse/SPARK-9530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Meihua Wu updated SPARK-9530: - Summary: ScalaDoc should not indicate LDAModel.describeTopics and DistributedLDAModel.topDocumentsPerTopic as approximate. (was: ScalaDoc should not indicate LDAModel.descripeTopic and DistributedLDAModel.topDocumentsPerTopic as approximate.) > ScalaDoc should not indicate LDAModel.describeTopics and > DistributedLDAModel.topDocumentsPerTopic as approximate. > - > > Key: SPARK-9530 > URL: https://issues.apache.org/jira/browse/SPARK-9530 > Project: Spark > Issue Type: Documentation > Components: MLlib >Affects Versions: 1.3.0, 1.3.1, 1.4.0, 1.4.1 >Reporter: Meihua Wu >Priority: Minor > > Currently the ScalaDoc for LDAModel.descripeTopic and > DistributedLDAModel.topDocumentsPerTopic suggests that these methods are > approximate. However, both methods are actually precise and there is no need > to increase maxTermsPerTopic or maxDocumentsPerTopic to get a more precise > set of top terms. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-9529) Improve sort on Decimal
[ https://issues.apache.org/jira/browse/SPARK-9529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-9529: --- Assignee: Apache Spark (was: Davies Liu) > Improve sort on Decimal > --- > > Key: SPARK-9529 > URL: https://issues.apache.org/jira/browse/SPARK-9529 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Davies Liu >Assignee: Apache Spark >Priority: Critical > > Right now, it's really slow, just hang there in random tests > {code} > pool-1-thread-1-ScalaTest-running-TungstenSortSuite" prio=5 > tid=0x7f822bc82800 nid=0x5103 runnable [0x00011d1be000] >java.lang.Thread.State: RUNNABLE > at java.math.BigInteger.(BigInteger.java:405) > at java.math.BigDecimal.bigTenToThe(BigDecimal.java:3380) > at java.math.BigDecimal.bigMultiplyPowerTen(BigDecimal.java:3508) > at java.math.BigDecimal.setScale(BigDecimal.java:2394) > at java.math.BigDecimal.divide(BigDecimal.java:1691) > at java.math.BigDecimal.divideToIntegralValue(BigDecimal.java:1734) > at java.math.BigDecimal.divideAndRemainder(BigDecimal.java:1891) > at java.math.BigDecimal.remainder(BigDecimal.java:1833) > at scala.math.BigDecimal.remainder(BigDecimal.scala:281) > at scala.math.BigDecimal.isWhole(BigDecimal.scala:215) > at scala.math.BigDecimal.hashCode(BigDecimal.scala:180) > at org.apache.spark.sql.types.Decimal.hashCode(Decimal.scala:260) > at > org.apache.spark.sql.catalyst.InternalRow.hashCode(InternalRow.scala:121) > at org.apache.spark.RangePartitioner.hashCode(Partitioner.scala:201) > at java.lang.Object.toString(Object.java:237) > at > java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1418) > at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177) > at > java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547) > at > java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508) > at > java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431) > at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177) > at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347) > at > org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:44) > at > org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:84) > at > org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:301) > at > org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:294) > at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:122) > at org.apache.spark.SparkContext.clean(SparkContext.scala:2003) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1.apply(RDD.scala:683) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1.apply(RDD.scala:682) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108) > at org.apache.spark.rdd.RDD.withScope(RDD.scala:286) > at org.apache.spark.rdd.RDD.mapPartitions(RDD.scala:682) > at > org.apache.spark.sql.execution.Exchange$$anonfun$doExecute$1.apply(Exchange.scala:181) > at > org.apache.spark.sql.execution.Exchange$$anonfun$doExecute$1.apply(Exchange.scala:148) > at > org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:48) > at org.apache.spark.sql.execution.Exchange.doExecute(Exchange.scala:148) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:113) 
> at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:113) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147) > at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:112) > at > org.apache.spark.sql.execution.Sort$$anonfun$doExecute$1.apply(sort.scala:48) > at > org.apache.spark.sql.execution.Sort$$anonfun$doExecute$1.apply(sort.scala:48) > at > org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:48) > at org.apache.spark.sql.execution.Sort.doExecute(sort.scala:47) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:113) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:113) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147) > at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:112) > at > org
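The stack trace above points at Decimal.hashCode delegating to scala.math.BigDecimal.hashCode, which goes through isWhole and therefore remainder/divide on arbitrarily large values; that is where the test hangs. As a rough illustration only (not the patch that eventually landed, and the object name is hypothetical), a hash derived from the stripped unscaled value and scale avoids that code path entirely:
{code}
import java.math.{BigDecimal => JBigDecimal}

// Hypothetical sketch: hash a decimal from its canonical unscaled value and
// scale, avoiding scala.math.BigDecimal#hashCode, which routes through
// isWhole/remainder and can be arbitrarily expensive for large values.
object DecimalHashSketch {
  def hash(d: JBigDecimal): Int = {
    val canonical = d.stripTrailingZeros() // so that 123.4500 and 123.45 agree
    31 * canonical.unscaledValue().hashCode() + canonical.scale()
  }

  def main(args: Array[String]): Unit = {
    val a = new JBigDecimal("123.4500")
    val b = new JBigDecimal("123.45")
    println(hash(a) == hash(b)) // true: numerically equal values hash equally
  }
}
{code}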
[jira] [Updated] (SPARK-9530) ScalaDoc should not indicate LDAModel.describeTopics and DistributedLDAModel.topDocumentsPerTopic as approximate.
[ https://issues.apache.org/jira/browse/SPARK-9530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Meihua Wu updated SPARK-9530: - Description: Currently the ScalaDoc for LDAModel.describeTopics and DistributedLDAModel.topDocumentsPerTopic suggests that these methods are approximate. However, both methods are actually precise and there is no need to increase maxTermsPerTopic or maxDocumentsPerTopic to get a more precise set of top terms. was: Currently the ScalaDoc for LDAModel.descripeTopic and DistributedLDAModel.topDocumentsPerTopic suggests that these methods are approximate. However, both methods are actually precise and there is no need to increase maxTermsPerTopic or maxDocumentsPerTopic to get a more precise set of top terms. > ScalaDoc should not indicate LDAModel.describeTopics and > DistributedLDAModel.topDocumentsPerTopic as approximate. > - > > Key: SPARK-9530 > URL: https://issues.apache.org/jira/browse/SPARK-9530 > Project: Spark > Issue Type: Documentation > Components: MLlib >Affects Versions: 1.3.0, 1.3.1, 1.4.0, 1.4.1 >Reporter: Meihua Wu >Priority: Minor > > Currently the ScalaDoc for LDAModel.describeTopics and > DistributedLDAModel.topDocumentsPerTopic suggests that these methods are > approximate. However, both methods are actually precise and there is no need > to increase maxTermsPerTopic or maxDocumentsPerTopic to get a more precise > set of top terms. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-9530) ScalaDoc should not indicate LDAModel.describeTopics and DistributedLDAModel.topDocumentsPerTopic as approximate.
[ https://issues.apache.org/jira/browse/SPARK-9530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-9530: --- Assignee: Apache Spark > ScalaDoc should not indicate LDAModel.describeTopics and > DistributedLDAModel.topDocumentsPerTopic as approximate. > - > > Key: SPARK-9530 > URL: https://issues.apache.org/jira/browse/SPARK-9530 > Project: Spark > Issue Type: Documentation > Components: MLlib >Affects Versions: 1.3.0, 1.3.1, 1.4.0, 1.4.1 >Reporter: Meihua Wu >Assignee: Apache Spark >Priority: Minor > > Currently the ScalaDoc for LDAModel.describeTopics and > DistributedLDAModel.topDocumentsPerTopic suggests that these methods are > approximate. However, both methods are actually precise and there is no need > to increase maxTermsPerTopic or maxDocumentsPerTopic to get a more precise > set of top terms. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-9530) ScalaDoc should not indicate LDAModel.describeTopics and DistributedLDAModel.topDocumentsPerTopic as approximate.
[ https://issues.apache.org/jira/browse/SPARK-9530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14650543#comment-14650543 ] Apache Spark commented on SPARK-9530: - User 'rotationsymmetry' has created a pull request for this issue: https://github.com/apache/spark/pull/7858 > ScalaDoc should not indicate LDAModel.describeTopics and > DistributedLDAModel.topDocumentsPerTopic as approximate. > - > > Key: SPARK-9530 > URL: https://issues.apache.org/jira/browse/SPARK-9530 > Project: Spark > Issue Type: Documentation > Components: MLlib >Affects Versions: 1.3.0, 1.3.1, 1.4.0, 1.4.1 >Reporter: Meihua Wu >Priority: Minor > > Currently the ScalaDoc for LDAModel.describeTopics and > DistributedLDAModel.topDocumentsPerTopic suggests that these methods are > approximate. However, both methods are actually precise and there is no need > to increase maxTermsPerTopic or maxDocumentsPerTopic to get a more precise > set of top terms. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-9530) ScalaDoc should not indicate LDAModel.describeTopics and DistributedLDAModel.topDocumentsPerTopic as approximate.
[ https://issues.apache.org/jira/browse/SPARK-9530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-9530: --- Assignee: (was: Apache Spark) > ScalaDoc should not indicate LDAModel.describeTopics and > DistributedLDAModel.topDocumentsPerTopic as approximate. > - > > Key: SPARK-9530 > URL: https://issues.apache.org/jira/browse/SPARK-9530 > Project: Spark > Issue Type: Documentation > Components: MLlib >Affects Versions: 1.3.0, 1.3.1, 1.4.0, 1.4.1 >Reporter: Meihua Wu >Priority: Minor > > Currently the ScalaDoc for LDAModel.describeTopics and > DistributedLDAModel.topDocumentsPerTopic suggests that these methods are > approximate. However, both methods are actually precise and there is no need > to increase maxTermsPerTopic or maxDocumentsPerTopic to get a more precise > set of top terms. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-9530) ScalaDoc should not indicate LDAModel.describeTopics and DistributedLDAModel.topDocumentsPerTopic as approximate.
[ https://issues.apache.org/jira/browse/SPARK-9530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley updated SPARK-9530: - Assignee: Meihua Wu Target Version/s: 1.3.2, 1.4.2, 1.5.0 > ScalaDoc should not indicate LDAModel.describeTopics and > DistributedLDAModel.topDocumentsPerTopic as approximate. > - > > Key: SPARK-9530 > URL: https://issues.apache.org/jira/browse/SPARK-9530 > Project: Spark > Issue Type: Documentation > Components: MLlib >Affects Versions: 1.3.0, 1.3.1, 1.4.0, 1.4.1 >Reporter: Meihua Wu >Assignee: Meihua Wu >Priority: Minor > > Currently the ScalaDoc for LDAModel.describeTopics and > DistributedLDAModel.topDocumentsPerTopic suggests that these methods are > approximate. However, both methods are actually precise and there is no need > to increase maxTermsPerTopic or maxDocumentsPerTopic to get a more precise > set of top terms. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-9492) LogisticRegression should provide model statistics
[ https://issues.apache.org/jira/browse/SPARK-9492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley closed SPARK-9492. Resolution: Duplicate > LogisticRegression should provide model statistics > -- > > Key: SPARK-9492 > URL: https://issues.apache.org/jira/browse/SPARK-9492 > Project: Spark > Issue Type: Improvement > Components: ML >Reporter: Eric Liang > > Like ml LinearRegression, LogisticRegression should provide a training > summary including feature names and their coefficients. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-9492) LogisticRegression in R should provide model statistics
[ https://issues.apache.org/jira/browse/SPARK-9492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley updated SPARK-9492: - Summary: LogisticRegression in R should provide model statistics (was: LogisticRegression should provide model statistics) > LogisticRegression in R should provide model statistics > --- > > Key: SPARK-9492 > URL: https://issues.apache.org/jira/browse/SPARK-9492 > Project: Spark > Issue Type: Improvement > Components: ML, R >Reporter: Eric Liang > > Like ml LinearRegression, LogisticRegression should provide a training > summary including feature names and their coefficients. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-9492) LogisticRegression in R should provide model statistics
[ https://issues.apache.org/jira/browse/SPARK-9492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley updated SPARK-9492: - Component/s: R > LogisticRegression in R should provide model statistics > --- > > Key: SPARK-9492 > URL: https://issues.apache.org/jira/browse/SPARK-9492 > Project: Spark > Issue Type: Improvement > Components: ML, R >Reporter: Eric Liang > > Like ml LinearRegression, LogisticRegression should provide a training > summary including feature names and their coefficients. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Reopened] (SPARK-9492) LogisticRegression should provide model statistics
[ https://issues.apache.org/jira/browse/SPARK-9492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley reopened SPARK-9492: -- Oops, I just realized this was for Spark R. I'll add those tags. > LogisticRegression should provide model statistics > -- > > Key: SPARK-9492 > URL: https://issues.apache.org/jira/browse/SPARK-9492 > Project: Spark > Issue Type: Improvement > Components: ML, R >Reporter: Eric Liang > > Like ml LinearRegression, LogisticRegression should provide a training > summary including feature names and their coefficients. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
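For reference, the summary being asked for pairs each feature name with its fitted coefficient, the way R's summary(glm(...)) output does. The case class below is only a sketch of that shape; the name and fields are hypothetical and are not the spark.ml API:
{code}
import org.apache.spark.mllib.linalg.Vector

// Hypothetical shape of the requested training summary: feature names next to
// their fitted coefficients, formatted one per line.
case class CoefficientSummary(featureNames: Array[String], coefficients: Vector) {
  def describe(): Seq[String] =
    featureNames.zip(coefficients.toArray).toSeq.map { case (name, coef) =>
      f"$name%-20s $coef%+.6f"
    }
}
{code}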
[jira] [Commented] (SPARK-5754) Spark AM not launching on Windows
[ https://issues.apache.org/jira/browse/SPARK-5754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14650551#comment-14650551 ] Inigo Goiri commented on SPARK-5754: I overwrote the existing escapeForShell to use single quotes instead of double and I removed the "-XX:OnOutOfMemoryError='kill %p'" part in the command. This is just an internal solution for me but ideally this should check the OS and so on. > Spark AM not launching on Windows > - > > Key: SPARK-5754 > URL: https://issues.apache.org/jira/browse/SPARK-5754 > Project: Spark > Issue Type: Bug > Components: Windows, YARN >Affects Versions: 1.1.1, 1.2.0 > Environment: Windows Server 2012, Hadoop 2.4.1. >Reporter: Inigo > > I'm trying to run Spark Pi on a YARN cluster running on Windows and the AM > container fails to start. The problem seems to be in the generation of the > YARN command which adds single quotes (') surrounding some of the java > options. In particular, the part of the code that is adding those is the > escapeForShell function in YarnSparkHadoopUtil. Apparently, Windows does not > like the quotes for these options. Here is an example of the command that the > container tries to execute: > @call %JAVA_HOME%/bin/java -server -Xmx512m -Djava.io.tmpdir=%PWD%/tmp > '-Dspark.yarn.secondary.jars=' > '-Dspark.app.name=org.apache.spark.examples.SparkPi' > '-Dspark.master=yarn-cluster' org.apache.spark.deploy.yarn.ApplicationMaster > --class 'org.apache.spark.examples.SparkPi' --jar > 'file:/D:/data/spark-1.1.1-bin-hadoop2.4/bin/../lib/spark-examples-1.1.1-hadoop2.4.0.jar' > --executor-memory 1024 --executor-cores 1 --num-executors 2 > Once I transform it into: > @call %JAVA_HOME%/bin/java -server -Xmx512m -Djava.io.tmpdir=%PWD%/tmp > -Dspark.yarn.secondary.jars= > -Dspark.app.name=org.apache.spark.examples.SparkPi > -Dspark.master=yarn-cluster org.apache.spark.deploy.yarn.ApplicationMaster > --class 'org.apache.spark.examples.SparkPi' --jar > 'file:/D:/data/spark-1.1.1-bin-hadoop2.4/bin/../lib/spark-examples-1.1.1-hadoop2.4.0.jar' > --executor-memory 1024 --executor-cores 1 --num-executors 2 > Everything seems to start. > How should I deal with this? Creating a separate function like escapeForShell > for Windows and call it whenever I detect this is for Windows? Or should I > add some sanity check on YARN? > I checked a little and there seems to be people that is able to run Spark on > YARN on Windows, so it might be something else. I didn't find anything > related on Jira either. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
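The description already narrows the problem to YarnSparkHadoopUtil.escapeForShell quoting the -D options in a way cmd.exe does not strip, and the corrected command in the description simply drops that quoting. An OS-aware variant along the lines suggested above might look like the sketch below; this is an illustration of the idea, not the upstream change, and the object name is made up:
{code}
// Sketch of OS-aware argument escaping (hypothetical, simplified): keep the
// existing POSIX single-quote style off Windows, and on Windows avoid single
// quotes entirely, only double-quoting when whitespace makes it necessary.
object ShellEscapeSketch {
  private val isWindows =
    sys.props.getOrElse("os.name", "").toLowerCase.startsWith("windows")

  def escapeForShell(arg: String): String =
    if (isWindows) {
      // cmd.exe does not strip single quotes, so pass simple args through as-is.
      if (arg.exists(_.isWhitespace)) "\"" + arg.replace("\"", "\\\"") + "\"" else arg
    } else {
      "'" + arg.replace("'", "'\\''") + "'"
    }

  def main(args: Array[String]): Unit = {
    println(escapeForShell("-Dspark.app.name=org.apache.spark.examples.SparkPi"))
  }
}
{code}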
[jira] [Commented] (SPARK-8333) Spark failed to delete temp directory created by HiveContext
[ https://issues.apache.org/jira/browse/SPARK-8333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14650555#comment-14650555 ] Sudhakar Thota commented on SPARK-8333: --- Thanks for the clarification. I have used the same statement you suggested to create HiveContext and was able to stop the sc without issues. After stopping without issues, I have validated by trying to use sqlContext as well as SparkContext. Please let me know if this is happening if you run it using a script and not from REPL. Please take a look. - 1. Creating HiveContext using hive, creating table, calling “sc.stop()” , verifying by trying to create a table again. Sudhakars-MacBook-Pro-2:spark-1.4.0 sudhakarthota$ bin/spark-shell Welcome to __ / __/__ ___ _/ /__ _\ \/ _ \/ _ `/ __/ '_/ /___/ .__/\_,_/_/ /_/\_\ version 1.4.0 /_/ Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_45) Type in expressions to have them evaluated. Type :help for more information. Spark context available as sc. SQL context available as sqlContext. scala> val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc) sqlContext: org.apache.spark.sql.hive.HiveContext = org.apache.spark.sql.hive.HiveContext@5ac35b17 scala> sqlContext.sql("CREATE TABLE IF NOT EXISTS test1 (name STRING, rank INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n'") res0: org.apache.spark.sql.DataFrame = [result: string] scala> sc.stop() scala> sqlContext.sql("CREATE TABLE IF NOT EXISTS test2 (name STRING, rank INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n'") java.lang.IllegalStateException: Cannot call methods on a stopped SparkContext at org.apache.spark.SparkContext.org$apache$spark$SparkContext$$assertNotStopped(SparkContext.scala:103) at org.apache.spark.SparkContext$$anonfun$parallelize$1.apply(SparkContext.scala:696) at org.apache.spark.SparkContext$$anonfun$parallelize$1.apply(SparkContext.scala:695) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:148) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:109) at org.apache.spark.SparkContext.withScope(SparkContext.scala:681) at org.apache.spark.SparkContext.parallelize(SparkContext.scala:695) at org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:70) at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:88) at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:88) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:148) at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:87) at org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:939) at org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:939) at org.apache.spark.sql.DataFrame.(DataFrame.scala:144) at org.apache.spark.sql.DataFrame.(DataFrame.scala:128) at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:51) at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:744) at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:24) at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:29) at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:31) at $iwC$$iwC$$iwC$$iwC$$iwC.(:33) at $iwC$$iwC$$iwC$$iwC.(:35) at $iwC$$iwC$$iwC.(:37) at $iwC$$iwC.(:39) at $iwC.(:41) at (:43) at .(:47) at .() at .(:7) at .() at $print() at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:497) at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065) at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1338) at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840) at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871) at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819) at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857) at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902) at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814) at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657) at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665) at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoo
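The transcript above is from the REPL; the comment asks whether the same behaviour shows up from a script. A standalone reproduction along the same lines (assuming Spark 1.4.x built with Hive support, run via spark-submit) would be:
{code}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

// Standalone version of the REPL session above: create a HiveContext, create a
// table, then stop the SparkContext so temp-directory cleanup at shutdown can
// be observed outside the shell.
object HiveContextStopRepro {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("SPARK-8333-repro"))
    val sqlContext = new HiveContext(sc)
    sqlContext.sql(
      "CREATE TABLE IF NOT EXISTS test1 (name STRING, rank INT) " +
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n'")
    sc.stop() // any later use of sqlContext fails with IllegalStateException, as above
  }
}
{code}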
[jira] [Assigned] (SPARK-9528) RandomForestClassifier should extend ProbabilisticClassifier
[ https://issues.apache.org/jira/browse/SPARK-9528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-9528: --- Assignee: Apache Spark (was: Joseph K. Bradley) > RandomForestClassifier should extend ProbabilisticClassifier > > > Key: SPARK-9528 > URL: https://issues.apache.org/jira/browse/SPARK-9528 > Project: Spark > Issue Type: Improvement > Components: MLlib >Reporter: Joseph K. Bradley >Assignee: Apache Spark > > Now that DecisionTreeClassifier extends ProbabilisticClassifier, we can have > RandomForestClassifier extends ProbabilisticClassifier as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-9528) RandomForestClassifier should extend ProbabilisticClassifier
[ https://issues.apache.org/jira/browse/SPARK-9528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14650558#comment-14650558 ] Apache Spark commented on SPARK-9528: - User 'jkbradley' has created a pull request for this issue: https://github.com/apache/spark/pull/7859 > RandomForestClassifier should extend ProbabilisticClassifier > > > Key: SPARK-9528 > URL: https://issues.apache.org/jira/browse/SPARK-9528 > Project: Spark > Issue Type: Improvement > Components: MLlib >Reporter: Joseph K. Bradley >Assignee: Joseph K. Bradley > > Now that DecisionTreeClassifier extends ProbabilisticClassifier, we can have > RandomForestClassifier extends ProbabilisticClassifier as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-9528) RandomForestClassifier should extend ProbabilisticClassifier
[ https://issues.apache.org/jira/browse/SPARK-9528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-9528: --- Assignee: Joseph K. Bradley (was: Apache Spark) > RandomForestClassifier should extend ProbabilisticClassifier > > > Key: SPARK-9528 > URL: https://issues.apache.org/jira/browse/SPARK-9528 > Project: Spark > Issue Type: Improvement > Components: MLlib >Reporter: Joseph K. Bradley >Assignee: Joseph K. Bradley > > Now that DecisionTreeClassifier extends ProbabilisticClassifier, we can have > RandomForestClassifier extends ProbabilisticClassifier as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
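Under the ProbabilisticClassifier contract, a model produces a raw score vector per row plus a rule for turning raw scores into class probabilities; for a forest the natural raw score is the per-class vote count across trees. The sketch below only illustrates that arithmetic (the object and method names are made up, and the real change lives inside the spark.ml class hierarchy):
{code}
import org.apache.spark.mllib.linalg.{Vector, Vectors}

// Illustration of forest probabilities from tree votes: raw scores are vote
// counts per class, and probabilities are those counts normalized to sum to 1.
object ForestProbabilitySketch {
  // treePredictions(i) is the class index predicted by tree i (hypothetical input).
  def rawFromVotes(treePredictions: Array[Int], numClasses: Int): Vector = {
    val votes = new Array[Double](numClasses)
    treePredictions.foreach(c => votes(c) += 1.0)
    Vectors.dense(votes)
  }

  def rawToProbability(raw: Vector): Vector = {
    val counts = raw.toArray
    val total = counts.sum
    Vectors.dense(if (total == 0.0) counts else counts.map(_ / total))
  }

  def main(args: Array[String]): Unit = {
    val raw = rawFromVotes(Array(0, 1, 1, 2, 1), numClasses = 3)
    println(rawToProbability(raw)) // [0.2, 0.6, 0.2]
  }
}
{code}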
[jira] [Updated] (SPARK-9530) ScalaDoc should not indicate LDAModel.describeTopics and DistributedLDAModel.topDocumentsPerTopic as approximate.
[ https://issues.apache.org/jira/browse/SPARK-9530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley updated SPARK-9530: - Target Version/s: 1.5.0 (was: 1.3.2, 1.4.2, 1.5.0) > ScalaDoc should not indicate LDAModel.describeTopics and > DistributedLDAModel.topDocumentsPerTopic as approximate. > - > > Key: SPARK-9530 > URL: https://issues.apache.org/jira/browse/SPARK-9530 > Project: Spark > Issue Type: Documentation > Components: MLlib >Affects Versions: 1.3.0, 1.3.1, 1.4.0, 1.4.1 >Reporter: Meihua Wu >Assignee: Meihua Wu >Priority: Minor > > Currently the ScalaDoc for LDAModel.describeTopics and > DistributedLDAModel.topDocumentsPerTopic suggests that these methods are > approximate. However, both methods are actually precise and there is no need > to increase maxTermsPerTopic or maxDocumentsPerTopic to get a more precise > set of top terms. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-9530) ScalaDoc should not indicate LDAModel.describeTopics and DistributedLDAModel.topDocumentsPerTopic as approximate.
[ https://issues.apache.org/jira/browse/SPARK-9530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley resolved SPARK-9530. -- Resolution: Fixed Fix Version/s: 1.5.0 Issue resolved by pull request 7858 [https://github.com/apache/spark/pull/7858] > ScalaDoc should not indicate LDAModel.describeTopics and > DistributedLDAModel.topDocumentsPerTopic as approximate. > - > > Key: SPARK-9530 > URL: https://issues.apache.org/jira/browse/SPARK-9530 > Project: Spark > Issue Type: Documentation > Components: MLlib >Affects Versions: 1.3.0, 1.3.1, 1.4.0, 1.4.1 >Reporter: Meihua Wu >Assignee: Meihua Wu >Priority: Minor > Fix For: 1.5.0 > > > Currently the ScalaDoc for LDAModel.describeTopics and > DistributedLDAModel.topDocumentsPerTopic suggests that these methods are > approximate. However, both methods are actually precise and there is no need > to increase maxTermsPerTopic or maxDocumentsPerTopic to get a more precise > set of top terms. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-9447) Update python API to include RandomForest as classifier changes.
[ https://issues.apache.org/jira/browse/SPARK-9447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14650561#comment-14650561 ] Joseph K. Bradley commented on SPARK-9447: -- I'll do this once [SPARK-9528] gets fixed. > Update python API to include RandomForest as classifier changes. > > > Key: SPARK-9447 > URL: https://issues.apache.org/jira/browse/SPARK-9447 > Project: Spark > Issue Type: Improvement > Components: MLlib, PySpark >Reporter: holdenk > > The API should still work after > SPARK-9016-make-random-forest-classifiers-implement-classification-trait gets > merged in, but we might want to extend & provide predictRaw and similar in > the Python API. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-9531) UnsafeFixedWidthAggregationMap should be able to turn itself into an UnsafeKVExternalSorter
Reynold Xin created SPARK-9531: -- Summary: UnsafeFixedWidthAggregationMap should be able to turn itself into an UnsafeKVExternalSorter Key: SPARK-9531 URL: https://issues.apache.org/jira/browse/SPARK-9531 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin Assignee: Reynold Xin -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-9531) UnsafeFixedWidthAggregationMap should be able to turn itself into an UnsafeKVExternalSorter
[ https://issues.apache.org/jira/browse/SPARK-9531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14650562#comment-14650562 ] Apache Spark commented on SPARK-9531: - User 'rxin' has created a pull request for this issue: https://github.com/apache/spark/pull/7860 > UnsafeFixedWidthAggregationMap should be able to turn itself into an > UnsafeKVExternalSorter > --- > > Key: SPARK-9531 > URL: https://issues.apache.org/jira/browse/SPARK-9531 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Reynold Xin >Assignee: Reynold Xin > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-9531) UnsafeFixedWidthAggregationMap should be able to turn itself into an UnsafeKVExternalSorter
[ https://issues.apache.org/jira/browse/SPARK-9531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-9531: --- Assignee: Apache Spark (was: Reynold Xin) > UnsafeFixedWidthAggregationMap should be able to turn itself into an > UnsafeKVExternalSorter > --- > > Key: SPARK-9531 > URL: https://issues.apache.org/jira/browse/SPARK-9531 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Reynold Xin >Assignee: Apache Spark > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
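Together with SPARK-9520, the intent is that when the in-memory aggregation map can no longer grow, its records are sorted in place and handed off as a key-ordered stream that an external sorter can spill and later merge. The snippet below is only a conceptual sketch of that hand-off, with plain Scala collections standing in for the unsafe map and sorter; every name here is hypothetical:
{code}
import scala.collection.mutable

// Conceptual sketch of the aggregation fallback: accumulate into a hash map,
// and when memory runs out, destructively turn the map's records into a
// key-sorted iterator (the role the external KV sorter would play).
final class InMemoryAggMap[K: Ordering, V](merge: (V, V) => V) {
  private val map = mutable.HashMap.empty[K, V]

  def update(key: K, value: V): Unit =
    map(key) = map.get(key).map(merge(_, value)).getOrElse(value)

  /** Sort this map's own records, release the map, and return them in key order. */
  def destructAndSort(): Iterator[(K, V)] = {
    val sorted = map.toVector.sortBy(_._1)
    map.clear()
    sorted.iterator
  }
}

object FallbackSketch {
  def main(args: Array[String]): Unit = {
    val agg = new InMemoryAggMap[String, Long](_ + _)
    Seq("b", "a", "b", "c").foreach(agg.update(_, 1L))
    agg.destructAndSort().foreach(println) // (a,1) (b,2) (c,1)
  }
}
{code}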