[jira] [Assigned] (SPARK-11912) ml.feature.PCA minor refactor
[ https://issues.apache.org/jira/browse/SPARK-11912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-11912:
------------------------------------

    Assignee: Apache Spark

> ml.feature.PCA minor refactor
> -----------------------------
>
> Key: SPARK-11912
> URL: https://issues.apache.org/jira/browse/SPARK-11912
> Project: Spark
> Issue Type: Improvement
> Components: ML
> Reporter: Yanbo Liang
> Assignee: Apache Spark
> Priority: Minor
>
> Like SPARK-11852, k is params and we should save it under metadata/ rather
> than both under data/ and metadata/. We construct mllib.feature.PCAModel
> inside transform.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-11912) ml.feature.PCA minor refactor
[ https://issues.apache.org/jira/browse/SPARK-11912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-11912:
------------------------------------

    Assignee: (was: Apache Spark)

> ml.feature.PCA minor refactor
> -----------------------------
>
> Key: SPARK-11912
> URL: https://issues.apache.org/jira/browse/SPARK-11912
> Project: Spark
> Issue Type: Improvement
> Components: ML
> Reporter: Yanbo Liang
> Priority: Minor
>
> Like SPARK-11852, k is params and we should save it under metadata/ rather
> than both under data/ and metadata/. We construct mllib.feature.PCAModel
> inside transform.
[jira] [Commented] (SPARK-11912) ml.feature.PCA minor refactor
[ https://issues.apache.org/jira/browse/SPARK-11912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15021428#comment-15021428 ]

Apache Spark commented on SPARK-11912:
--------------------------------------

User 'yanboliang' has created a pull request for this issue:
https://github.com/apache/spark/pull/9897

> ml.feature.PCA minor refactor
> -----------------------------
>
> Key: SPARK-11912
> URL: https://issues.apache.org/jira/browse/SPARK-11912
> Project: Spark
> Issue Type: Improvement
> Components: ML
> Reporter: Yanbo Liang
> Priority: Minor
>
> Like SPARK-11852, k is params and we should save it under metadata/ rather
> than both under data/ and metadata/. We construct mllib.feature.PCAModel
> inside transform.
[jira] [Commented] (SPARK-11757) Incorrect join output for joining two dataframes loaded from Parquet format
[ https://issues.apache.org/jira/browse/SPARK-11757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15021534#comment-15021534 ]

Jeff Zhang commented on SPARK-11757:
------------------------------------

I tried it on master; it seems this issue has been resolved.

> Incorrect join output for joining two dataframes loaded from Parquet format
> ---------------------------------------------------------------------------
>
> Key: SPARK-11757
> URL: https://issues.apache.org/jira/browse/SPARK-11757
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Affects Versions: 1.5.0
> Environment: Python 2.7, Spark 1.5.0, Amazon linux ami
> https://aws.amazon.com/amazon-linux-ami/2015.03-release-notes/
> Reporter: Petri Kärkäs
> Labels: dataframe, emr, join, pyspark
>
> Reading in dataframes from Parquet format in s3, and executing a join between
> them fails when invoked by column name. Works correctly if a join condition is
> used instead:
> {code:none}
> sqlContext = SQLContext(sc)
> a = sqlContext.read.parquet('s3://path-to-data-a/')
> b = sqlContext.read.parquet('s3://path-to-data-b/')
> # result 0 rows
> c = a.join(b, on='id', how='left_outer')
> c.count()
> # correct output
> d = a.join(b, a['id']==b['id'], how='left_outer')
> d.count()
> {code}
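For readers unfamiliar with the two join forms above: both `on='id'` and the explicit `a['id'] == b['id']` condition should produce the same left-outer result. The following is a plain-Python sketch (not Spark code) of the left-outer equi-join semantics the reporter expects, just to make the correct behavior concrete; the sample data is invented for illustration.

```python
# Plain-Python sketch of left-outer equi-join semantics: every row of the
# left side survives, matched against the right side on the join key.
def left_outer_join(left, right, key):
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    out = []
    for row in left:
        matches = index.get(row[key])
        if matches:
            for m in matches:
                out.append((row, m))
        else:
            # unmatched left rows keep a null right side, never disappear
            out.append((row, None))
    return out

a = [{"id": 1, "x": "a1"}, {"id": 2, "x": "a2"}]
b = [{"id": 1, "y": "b1"}]
result = left_outer_join(a, b, "id")
print(len(result))  # 2
```

Under these semantics the reported `c.count()` of 0 rows for the `on='id'` form was clearly wrong, since a left outer join can never return fewer rows than the left side.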
[jira] [Resolved] (SPARK-11895) Rename and possibly update DatasetExample in mllib/examples
[ https://issues.apache.org/jira/browse/SPARK-11895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiangrui Meng resolved SPARK-11895.
-----------------------------------
    Resolution: Fixed
    Fix Version/s: 1.6.0

Issue resolved by pull request 9873
[https://github.com/apache/spark/pull/9873]

> Rename and possibly update DatasetExample in mllib/examples
> -----------------------------------------------------------
>
> Key: SPARK-11895
> URL: https://issues.apache.org/jira/browse/SPARK-11895
> Project: Spark
> Issue Type: Improvement
> Components: Documentation, ML
> Affects Versions: 1.6.0
> Reporter: Xiangrui Meng
> Assignee: Xiangrui Meng
> Priority: Minor
> Fix For: 1.6.0
>
> We used the name `Dataset` to refer to `SchemaRDD` in 1.2 in ML pipelines and
> created this example file. Since `Dataset` has a new meaning in Spark 1.6, we
> should rename it to avoid confusion.
[jira] [Updated] (SPARK-11902) Unhandled case in VectorAssembler#transform
[ https://issues.apache.org/jira/browse/SPARK-11902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiangrui Meng updated SPARK-11902:
----------------------------------
    Assignee: Benjamin Fradet

> Unhandled case in VectorAssembler#transform
> -------------------------------------------
>
> Key: SPARK-11902
> URL: https://issues.apache.org/jira/browse/SPARK-11902
> Project: Spark
> Issue Type: Improvement
> Components: ML
> Affects Versions: 1.5.2
> Reporter: Benjamin Fradet
> Assignee: Benjamin Fradet
> Priority: Trivial
> Fix For: 1.6.0
>
> I noticed that there is an unhandled case in the transform method of
> VectorAssembler if one of the input columns doesn't have one of the supported
> types DoubleType, NumericType, BooleanType or VectorUDT.
> So, if you try to transform a column of StringType you get a cryptic
> "scala.MatchError: StringType".
> Will submit a PR shortly
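The fix described above boils down to validating input column types before pattern-matching on them. The sketch below is a hypothetical plain-Python analogue of that pattern (the real fix is in Scala, and the type names and function here are illustrative, not Spark's actual API): check each input column against the supported types and raise a readable error instead of letting an unhandled case surface as a cryptic `scala.MatchError`.

```python
# Hypothetical sketch of up-front input validation: reject unsupported
# column types with a descriptive error rather than an unhandled-case crash.
SUPPORTED_TYPES = {"double", "numeric", "boolean", "vector"}

def check_assembler_inputs(schema):
    """schema: assumed mapping of column name -> type name."""
    for col, dtype in schema.items():
        if dtype not in SUPPORTED_TYPES:
            raise ValueError(
                f"VectorAssembler does not support column '{col}' of type "
                f"'{dtype}'; supported types: {sorted(SUPPORTED_TYPES)}"
            )

check_assembler_inputs({"age": "double", "clicked": "boolean"})  # passes
try:
    check_assembler_inputs({"name": "string"})
except ValueError as e:
    print(e)  # readable message naming the offending column and type
```

The design point is simply that schema validation belongs in `transformSchema`-style checks, where the failure can name the offending column, not deep inside a match over row values.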
[jira] [Resolved] (SPARK-11902) Unhandled case in VectorAssembler#transform
[ https://issues.apache.org/jira/browse/SPARK-11902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiangrui Meng resolved SPARK-11902.
-----------------------------------
    Resolution: Fixed
    Fix Version/s: 1.6.0

Issue resolved by pull request 9885
[https://github.com/apache/spark/pull/9885]

> Unhandled case in VectorAssembler#transform
> -------------------------------------------
>
> Key: SPARK-11902
> URL: https://issues.apache.org/jira/browse/SPARK-11902
> Project: Spark
> Issue Type: Improvement
> Components: ML
> Affects Versions: 1.5.2
> Reporter: Benjamin Fradet
> Priority: Trivial
> Fix For: 1.6.0
>
> I noticed that there is an unhandled case in the transform method of
> VectorAssembler if one of the input columns doesn't have one of the supported
> types DoubleType, NumericType, BooleanType or VectorUDT.
> So, if you try to transform a column of StringType you get a cryptic
> "scala.MatchError: StringType".
> Will submit a PR shortly
[jira] [Updated] (SPARK-11902) Unhandled case in VectorAssembler#transform
[ https://issues.apache.org/jira/browse/SPARK-11902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiangrui Meng updated SPARK-11902:
----------------------------------
    Target Version/s: 1.6.0

> Unhandled case in VectorAssembler#transform
> -------------------------------------------
>
> Key: SPARK-11902
> URL: https://issues.apache.org/jira/browse/SPARK-11902
> Project: Spark
> Issue Type: Improvement
> Components: ML
> Affects Versions: 1.5.2
> Reporter: Benjamin Fradet
> Assignee: Benjamin Fradet
> Priority: Trivial
> Fix For: 1.6.0
>
> I noticed that there is an unhandled case in the transform method of
> VectorAssembler if one of the input columns doesn't have one of the supported
> types DoubleType, NumericType, BooleanType or VectorUDT.
> So, if you try to transform a column of StringType you get a cryptic
> "scala.MatchError: StringType".
> Will submit a PR shortly
[jira] [Assigned] (SPARK-11917) Add SQLContext#dropTempTable to PySpark
[ https://issues.apache.org/jira/browse/SPARK-11917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-11917:
------------------------------------

    Assignee: Apache Spark

> Add SQLContext#dropTempTable to PySpark
> ---------------------------------------
>
> Key: SPARK-11917
> URL: https://issues.apache.org/jira/browse/SPARK-11917
> Project: Spark
> Issue Type: Improvement
> Components: PySpark
> Reporter: Jeff Zhang
> Assignee: Apache Spark
> Priority: Minor
>
> Seems there's no api to drop table in pyspark now
[jira] [Commented] (SPARK-11860) Invalid argument specification for registerFunction [Python]
[ https://issues.apache.org/jira/browse/SPARK-11860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15021545#comment-15021545 ]

Apache Spark commented on SPARK-11860:
--------------------------------------

User 'zjffdu' has created a pull request for this issue:
https://github.com/apache/spark/pull/9901

> Invalid argument specification for registerFunction [Python]
> ------------------------------------------------------------
>
> Key: SPARK-11860
> URL: https://issues.apache.org/jira/browse/SPARK-11860
> Project: Spark
> Issue Type: Documentation
> Components: Documentation, PySpark
> Affects Versions: 1.5.2
> Reporter: Tristan
> Priority: Minor
> Original Estimate: 5m
> Remaining Estimate: 5m
>
> https://github.com/apache/spark/blob/branch-1.5/python/pyspark/sql/context.py#L171-L178
> Documentation for SQLContext.registerFunction specifies a lambda function as
> input. This is false (it works fine with non-lambda functions). I believe
> this is a typo based on the presence of 'samplingRatio' in the parameter docs:
> https://github.com/apache/spark/blob/branch-1.5/python/pyspark/sql/context.py#L178
[jira] [Assigned] (SPARK-11860) Invalid argument specification for registerFunction [Python]
[ https://issues.apache.org/jira/browse/SPARK-11860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-11860:
------------------------------------

    Assignee: (was: Apache Spark)

> Invalid argument specification for registerFunction [Python]
> ------------------------------------------------------------
>
> Key: SPARK-11860
> URL: https://issues.apache.org/jira/browse/SPARK-11860
> Project: Spark
> Issue Type: Documentation
> Components: Documentation, PySpark
> Affects Versions: 1.5.2
> Reporter: Tristan
> Priority: Minor
> Original Estimate: 5m
> Remaining Estimate: 5m
>
> https://github.com/apache/spark/blob/branch-1.5/python/pyspark/sql/context.py#L171-L178
> Documentation for SQLContext.registerFunction specifies a lambda function as
> input. This is false (it works fine with non-lambda functions). I believe
> this is a typo based on the presence of 'samplingRatio' in the parameter docs:
> https://github.com/apache/spark/blob/branch-1.5/python/pyspark/sql/context.py#L178
[jira] [Assigned] (SPARK-11860) Invalid argument specification for registerFunction [Python]
[ https://issues.apache.org/jira/browse/SPARK-11860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-11860:
------------------------------------

    Assignee: Apache Spark

> Invalid argument specification for registerFunction [Python]
> ------------------------------------------------------------
>
> Key: SPARK-11860
> URL: https://issues.apache.org/jira/browse/SPARK-11860
> Project: Spark
> Issue Type: Documentation
> Components: Documentation, PySpark
> Affects Versions: 1.5.2
> Reporter: Tristan
> Assignee: Apache Spark
> Priority: Minor
> Original Estimate: 5m
> Remaining Estimate: 5m
>
> https://github.com/apache/spark/blob/branch-1.5/python/pyspark/sql/context.py#L171-L178
> Documentation for SQLContext.registerFunction specifies a lambda function as
> input. This is false (it works fine with non-lambda functions). I believe
> this is a typo based on the presence of 'samplingRatio' in the parameter docs:
> https://github.com/apache/spark/blob/branch-1.5/python/pyspark/sql/context.py#L178
[jira] [Resolved] (SPARK-6791) Model export/import for spark.ml: CrossValidator
[ https://issues.apache.org/jira/browse/SPARK-6791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiangrui Meng resolved SPARK-6791.
----------------------------------
    Resolution: Fixed
    Fix Version/s: 1.6.0

Issue resolved by pull request 9848
[https://github.com/apache/spark/pull/9848]

> Model export/import for spark.ml: CrossValidator
> ------------------------------------------------
>
> Key: SPARK-6791
> URL: https://issues.apache.org/jira/browse/SPARK-6791
> Project: Spark
> Issue Type: Sub-task
> Components: ML
> Affects Versions: 1.3.0
> Reporter: Joseph K. Bradley
> Assignee: Joseph K. Bradley
> Fix For: 1.6.0
>
> Updated to be for CrossValidator only
[jira] [Updated] (SPARK-11912) ml.feature.PCA minor refactor
[ https://issues.apache.org/jira/browse/SPARK-11912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiangrui Meng updated SPARK-11912:
----------------------------------
    Target Version/s: 1.6.0

> ml.feature.PCA minor refactor
> -----------------------------
>
> Key: SPARK-11912
> URL: https://issues.apache.org/jira/browse/SPARK-11912
> Project: Spark
> Issue Type: Improvement
> Components: ML
> Reporter: Yanbo Liang
> Assignee: Yanbo Liang
> Priority: Minor
> Fix For: 1.6.0
>
> Like SPARK-11852, k is params and we should save it under metadata/ rather
> than both under data/ and metadata/. Refactor the constructor of
> ml.feature.PCAModel to take only pc but construct mllib.feature.PCAModel
> inside transform.
[jira] [Updated] (SPARK-11912) ml.feature.PCA minor refactor
[ https://issues.apache.org/jira/browse/SPARK-11912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiangrui Meng updated SPARK-11912:
----------------------------------
    Assignee: Yanbo Liang

> ml.feature.PCA minor refactor
> -----------------------------
>
> Key: SPARK-11912
> URL: https://issues.apache.org/jira/browse/SPARK-11912
> Project: Spark
> Issue Type: Improvement
> Components: ML
> Reporter: Yanbo Liang
> Assignee: Yanbo Liang
> Priority: Minor
> Fix For: 1.6.0
>
> Like SPARK-11852, k is params and we should save it under metadata/ rather
> than both under data/ and metadata/. Refactor the constructor of
> ml.feature.PCAModel to take only pc but construct mllib.feature.PCAModel
> inside transform.
[jira] [Resolved] (SPARK-11912) ml.feature.PCA minor refactor
[ https://issues.apache.org/jira/browse/SPARK-11912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiangrui Meng resolved SPARK-11912.
-----------------------------------
    Resolution: Fixed
    Fix Version/s: 1.6.0

Issue resolved by pull request 9897
[https://github.com/apache/spark/pull/9897]

> ml.feature.PCA minor refactor
> -----------------------------
>
> Key: SPARK-11912
> URL: https://issues.apache.org/jira/browse/SPARK-11912
> Project: Spark
> Issue Type: Improvement
> Components: ML
> Reporter: Yanbo Liang
> Priority: Minor
> Fix For: 1.6.0
>
> Like SPARK-11852, k is params and we should save it under metadata/ rather
> than both under data/ and metadata/. Refactor the constructor of
> ml.feature.PCAModel to take only pc but construct mllib.feature.PCAModel
> inside transform.
[jira] [Assigned] (SPARK-11916) Expression TRIM/LTRIM/RTRIM to support specific trim word
[ https://issues.apache.org/jira/browse/SPARK-11916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-11916:
------------------------------------

    Assignee: Apache Spark

> Expression TRIM/LTRIM/RTRIM to support specific trim word
> ---------------------------------------------------------
>
> Key: SPARK-11916
> URL: https://issues.apache.org/jira/browse/SPARK-11916
> Project: Spark
> Issue Type: Improvement
> Reporter: Adrian Wang
> Assignee: Apache Spark
> Priority: Minor
>
> supports expressions like `trim('xxxabcxxx', 'x')`
[jira] [Commented] (SPARK-11916) Expression TRIM/LTRIM/RTRIM to support specific trim word
[ https://issues.apache.org/jira/browse/SPARK-11916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15021606#comment-15021606 ]

Apache Spark commented on SPARK-11916:
--------------------------------------

User 'adrian-wang' has created a pull request for this issue:
https://github.com/apache/spark/pull/9902

> Expression TRIM/LTRIM/RTRIM to support specific trim word
> ---------------------------------------------------------
>
> Key: SPARK-11916
> URL: https://issues.apache.org/jira/browse/SPARK-11916
> Project: Spark
> Issue Type: Improvement
> Reporter: Adrian Wang
> Priority: Minor
>
> supports expressions like `trim('xxxabcxxx', 'x')`
[jira] [Assigned] (SPARK-11916) Expression TRIM/LTRIM/RTRIM to support specific trim word
[ https://issues.apache.org/jira/browse/SPARK-11916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-11916:
------------------------------------

    Assignee: (was: Apache Spark)

> Expression TRIM/LTRIM/RTRIM to support specific trim word
> ---------------------------------------------------------
>
> Key: SPARK-11916
> URL: https://issues.apache.org/jira/browse/SPARK-11916
> Project: Spark
> Issue Type: Improvement
> Reporter: Adrian Wang
> Priority: Minor
>
> supports expressions like `trim('xxxabcxxx', 'x')`
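The behavior SPARK-11916 proposes for SQL TRIM/LTRIM/RTRIM already exists in Python's string methods, which accept an optional argument naming the characters to strip. This pure-Python sketch only illustrates the intended result of `trim('xxxabcxxx', 'x')` and its left/right variants; the ticket itself does not spell out whether the second argument is treated as a single character, a character set, or a whole word, so that detail is left open here.

```python
# Python's str.strip/lstrip/rstrip take an optional character set,
# mirroring the proposed trim('xxxabcxxx', 'x') semantics.
s = 'xxxabcxxx'
print(s.strip('x'))   # trim  -> abc
print(s.lstrip('x'))  # ltrim -> abcxxx
print(s.rstrip('x'))  # rtrim -> xxxabc
```

Note that with more than one character, `str.strip` treats the argument as a set of characters rather than a sequence; whichever convention the SQL expression adopts would need to be documented.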
[jira] [Commented] (SPARK-2336) Approximate k-NN Models for MLLib
[ https://issues.apache.org/jira/browse/SPARK-2336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15021527#comment-15021527 ]

Sen Fang commented on SPARK-2336:
---------------------------------

I finally took a crack at the hybrid spill tree for kNN, and the results so far appear promising. For anyone who is still interested, you can find it as a Spark package at: https://github.com/saurfang/spark-knn

The implementation is written for the ml API and scales well in terms of both the number of observations and the number of vector dimensions. The kNN itself is flexible, and the package comes with KNNClassifier and KNNRegression for (optionally weighted) classification and regression. There are a few implementation details I am still trying to iron out. Otherwise, I look forward to benchmarking it against other implementations such as kNN-join, KD-tree, and LSH.

> Approximate k-NN Models for MLLib
> ---------------------------------
>
> Key: SPARK-2336
> URL: https://issues.apache.org/jira/browse/SPARK-2336
> Project: Spark
> Issue Type: New Feature
> Components: MLlib
> Reporter: Brian Gawalt
> Priority: Minor
> Labels: clustering, features
>
> After tackling the general k-Nearest Neighbor model as per
> https://issues.apache.org/jira/browse/SPARK-2335 , there's an opportunity to
> also offer approximate k-Nearest Neighbor. A promising approach would involve
> building a kd-tree variant within each partition, a la
> http://www.autonlab.org/autonweb/14714.html?branch=1=2
> This could offer a simple non-linear ML model that can label new data with
> much lower latency than the plain-vanilla kNN versions.
[jira] [Resolved] (SPARK-11835) Add a menu to the documentation of MLlib
[ https://issues.apache.org/jira/browse/SPARK-11835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiangrui Meng resolved SPARK-11835.
-----------------------------------
    Resolution: Fixed
    Fix Version/s: 1.6.0

Issue resolved by pull request 9826
[https://github.com/apache/spark/pull/9826]

> Add a menu to the documentation of MLlib
> ----------------------------------------
>
> Key: SPARK-11835
> URL: https://issues.apache.org/jira/browse/SPARK-11835
> Project: Spark
> Issue Type: Improvement
> Components: Documentation, MLlib
> Affects Versions: 1.5.1
> Reporter: Tim Hunter
> Assignee: Tim Hunter
> Fix For: 1.6.0
>
> Attachments: Screen Shot 2015-11-18 at 4.50.45 PM.png
>
> Right now, the table of contents gets generated on a page-by-page basis,
> which makes it hard to navigate between different topics in a project. We
> should make use of the empty space on the left of the documentation to put a
> navigation menu.
> A picture is worth a thousand words:
[jira] [Assigned] (SPARK-11917) Add SQLContext#dropTempTable to PySpark
[ https://issues.apache.org/jira/browse/SPARK-11917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-11917:
------------------------------------

    Assignee: (was: Apache Spark)

> Add SQLContext#dropTempTable to PySpark
> ---------------------------------------
>
> Key: SPARK-11917
> URL: https://issues.apache.org/jira/browse/SPARK-11917
> Project: Spark
> Issue Type: Improvement
> Components: PySpark
> Reporter: Jeff Zhang
> Priority: Minor
>
> Seems there's no api to drop table in pyspark now
[jira] [Commented] (SPARK-11917) Add SQLContext#dropTempTable to PySpark
[ https://issues.apache.org/jira/browse/SPARK-11917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15021625#comment-15021625 ]

Apache Spark commented on SPARK-11917:
--------------------------------------

User 'zjffdu' has created a pull request for this issue:
https://github.com/apache/spark/pull/9903

> Add SQLContext#dropTempTable to PySpark
> ---------------------------------------
>
> Key: SPARK-11917
> URL: https://issues.apache.org/jira/browse/SPARK-11917
> Project: Spark
> Issue Type: Improvement
> Components: PySpark
> Reporter: Jeff Zhang
> Priority: Minor
>
> Seems there's no api to drop table in pyspark now
[jira] [Created] (SPARK-11917) Add SQLContext#dropTempTable to PySpark
Jeff Zhang created SPARK-11917:
----------------------------------

             Summary: Add SQLContext#dropTempTable to PySpark
                 Key: SPARK-11917
                 URL: https://issues.apache.org/jira/browse/SPARK-11917
             Project: Spark
          Issue Type: Improvement
          Components: PySpark
            Reporter: Jeff Zhang
            Priority: Minor

Seems there's no api to drop table in pyspark now
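The gap described above is just the absence of a Python counterpart to Scala's `SQLContext#dropTempTable`, which unbinds a previously registered temporary table name. The sketch below is a hypothetical plain-Python model of that bookkeeping (the `MiniCatalog` class and its method names are invented for illustration and are not PySpark's API):

```python
# Hypothetical model of temp-table bookkeeping: register binds a name to a
# dataframe-like object; drop removes the binding so the name can be reused.
class MiniCatalog:
    def __init__(self):
        self._temp_tables = {}

    def register_temp_table(self, name, df):
        self._temp_tables[name] = df

    def drop_temp_table(self, name):
        # dropping only removes the name binding, not any underlying data
        self._temp_tables.pop(name, None)

    def table_names(self):
        return sorted(self._temp_tables)

catalog = MiniCatalog()
catalog.register_temp_table("people", object())
print(catalog.table_names())  # ['people']
catalog.drop_temp_table("people")
print(catalog.table_names())  # []
```

The key semantic point, which the eventual PySpark method would presumably share with the Scala one, is that dropping a temp table only removes the name binding; it does not delete any underlying data.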
[jira] [Assigned] (SPARK-11894) Incorrect results are returned when using null
[ https://issues.apache.org/jira/browse/SPARK-11894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-11894:
------------------------------------

    Assignee: (was: Apache Spark)

> Incorrect results are returned when using null
> ----------------------------------------------
>
> Key: SPARK-11894
> URL: https://issues.apache.org/jira/browse/SPARK-11894
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 1.6.0
> Reporter: Xiao Li
>
> In Dataset APIs, the following two datasets are the same.
> Seq((new java.lang.Integer(0), "1"), (new java.lang.Integer(22), "2")).toDS()
> Seq((null.asInstanceOf[java.lang.Integer], "1"), (new java.lang.Integer(22), "2")).toDS()
> Note: java.lang.Integer is Nullable.
> It could generate an incorrect result. For example,
> val ds1 = Seq((null.asInstanceOf[java.lang.Integer], "1"), (new java.lang.Integer(22), "2")).toDS()
> val ds2 = Seq((null.asInstanceOf[java.lang.Integer], "1"), (new java.lang.Integer(22), "2")).toDS() // toDF("key", "value").as('df2)
> val res1 = ds1.joinWith(ds2, lit(true)).collect()
> The expected result should be
> ((null,1),(null,1))
> ((22,2),(null,1))
> ((null,1),(22,2))
> ((22,2),(22,2))
> The actual result is
> ((0,1),(0,1))
> ((22,2),(0,1))
> ((0,1),(22,2))
> ((22,2),(22,2))
[jira] [Commented] (SPARK-11894) Incorrect results are returned when using null
[ https://issues.apache.org/jira/browse/SPARK-11894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15021644#comment-15021644 ]

Apache Spark commented on SPARK-11894:
--------------------------------------

User 'cloud-fan' has created a pull request for this issue:
https://github.com/apache/spark/pull/9904

> Incorrect results are returned when using null
> ----------------------------------------------
>
> Key: SPARK-11894
> URL: https://issues.apache.org/jira/browse/SPARK-11894
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 1.6.0
> Reporter: Xiao Li
>
> In Dataset APIs, the following two datasets are the same.
> Seq((new java.lang.Integer(0), "1"), (new java.lang.Integer(22), "2")).toDS()
> Seq((null.asInstanceOf[java.lang.Integer], "1"), (new java.lang.Integer(22), "2")).toDS()
> Note: java.lang.Integer is Nullable.
> It could generate an incorrect result. For example,
> val ds1 = Seq((null.asInstanceOf[java.lang.Integer], "1"), (new java.lang.Integer(22), "2")).toDS()
> val ds2 = Seq((null.asInstanceOf[java.lang.Integer], "1"), (new java.lang.Integer(22), "2")).toDS() // toDF("key", "value").as('df2)
> val res1 = ds1.joinWith(ds2, lit(true)).collect()
> The expected result should be
> ((null,1),(null,1))
> ((22,2),(null,1))
> ((null,1),(22,2))
> ((22,2),(22,2))
> The actual result is
> ((0,1),(0,1))
> ((22,2),(0,1))
> ((0,1),(22,2))
> ((22,2),(22,2))
[jira] [Assigned] (SPARK-11894) Incorrect results are returned when using null
[ https://issues.apache.org/jira/browse/SPARK-11894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-11894:
------------------------------------

    Assignee: Apache Spark

> Incorrect results are returned when using null
> ----------------------------------------------
>
> Key: SPARK-11894
> URL: https://issues.apache.org/jira/browse/SPARK-11894
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 1.6.0
> Reporter: Xiao Li
> Assignee: Apache Spark
>
> In Dataset APIs, the following two datasets are the same.
> Seq((new java.lang.Integer(0), "1"), (new java.lang.Integer(22), "2")).toDS()
> Seq((null.asInstanceOf[java.lang.Integer], "1"), (new java.lang.Integer(22), "2")).toDS()
> Note: java.lang.Integer is Nullable.
> It could generate an incorrect result. For example,
> val ds1 = Seq((null.asInstanceOf[java.lang.Integer], "1"), (new java.lang.Integer(22), "2")).toDS()
> val ds2 = Seq((null.asInstanceOf[java.lang.Integer], "1"), (new java.lang.Integer(22), "2")).toDS() // toDF("key", "value").as('df2)
> val res1 = ds1.joinWith(ds2, lit(true)).collect()
> The expected result should be
> ((null,1),(null,1))
> ((22,2),(null,1))
> ((null,1),(22,2))
> ((22,2),(22,2))
> The actual result is
> ((0,1),(0,1))
> ((22,2),(0,1))
> ((0,1),(22,2))
> ((22,2),(22,2))
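The pattern in the SPARK-11894 report, where `null` values come back as `0`, is characteristic of decoding a nullable boxed value into a non-nullable primitive slot, whose default is zero. The sketch below is a plain-Python illustration of that assumed mechanism, not the actual Dataset encoder code; the two decode functions are hypothetical names for the buggy and the correct paths.

```python
# Illustration of the suspected mechanism: unboxing a nullable value into a
# primitive slot without a null check silently substitutes the default 0.
def decode_as_primitive(value):
    # buggy path: null is coerced to the primitive default
    return 0 if value is None else value

def decode_nullable(value):
    # correct path: null is preserved through the decode
    return value

row = (None, "1")
print(decode_as_primitive(row[0]))  # 0    <- matches the reported (0,1)
print(decode_nullable(row[0]))      # None <- the expected (null,1)
```

This is why the expected output `((null,1),...)` degenerates into `((0,1),...)`: once the null is coerced, nothing downstream can tell a genuine 0 apart from a lost null.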
[jira] [Created] (SPARK-11916) Expression TRIM/LTRIM/RTRIM to support specific trim word
Adrian Wang created SPARK-11916:
-----------------------------------

             Summary: Expression TRIM/LTRIM/RTRIM to support specific trim word
                 Key: SPARK-11916
                 URL: https://issues.apache.org/jira/browse/SPARK-11916
             Project: Spark
          Issue Type: Improvement
            Reporter: Adrian Wang
            Priority: Minor

supports expressions like `trim('xxxabcxxx', 'x')`
[jira] [Commented] (SPARK-11909) Spark Standalone's master URL accepts URLs without port (assuming default 7077)
[ https://issues.apache.org/jira/browse/SPARK-11909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15021669#comment-15021669 ]

Saisai Shao commented on SPARK-11909:
-------------------------------------

The master prints the master URL in the web UI and in its log. Since the master is a daemon process, it is not ideal to print it to the console. Also, as [~srowen] suggested, it is better for users to specify the port number explicitly: the port also distinguishes whether you're submitting a Spark application using the binary protocol (7077) or REST (6066). If it could be omitted, it would be hard for Spark to decide which port you want to submit to.

> Spark Standalone's master URL accepts URLs without port (assuming default 7077)
> -------------------------------------------------------------------------------
>
> Key: SPARK-11909
> URL: https://issues.apache.org/jira/browse/SPARK-11909
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 1.6.0
> Reporter: Jacek Laskowski
> Priority: Trivial
>
> It's currently impossible to use a {{spark://localhost}} URL for Spark
> Standalone's master. With the feature supported, there'd be less to know to
> get started with the mode (and hence improve user friendliness).
> I think a no-port master URL should be supported and assume the default port
> {{7077}}.
>
> {code}
> org.apache.spark.SparkException: Invalid master URL: spark://localhost
>   at org.apache.spark.util.Utils$.extractHostPortFromSparkUrl(Utils.scala:2088)
>   at org.apache.spark.rpc.RpcAddress$.fromSparkURL(RpcAddress.scala:47)
>   at org.apache.spark.deploy.client.AppClient$$anonfun$1.apply(AppClient.scala:48)
>   at org.apache.spark.deploy.client.AppClient$$anonfun$1.apply(AppClient.scala:48)
>   at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
>   at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
>   at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>   at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:245)
>   at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
>   at org.apache.spark.deploy.client.AppClient.<init>(AppClient.scala:48)
>   at org.apache.spark.scheduler.cluster.SparkDeploySchedulerBackend.start(SparkDeploySchedulerBackend.scala:93)
>   at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:144)
>   at org.apache.spark.SparkContext.<init>(SparkContext.scala:530)
> {code}
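The behavior the ticket asks for, accepting `spark://host` and falling back to the standalone default port 7077, can be sketched in a few lines. This is a hedged plain-Python illustration; the function name and structure are invented and do not reflect Spark's actual `Utils.extractHostPortFromSparkUrl` implementation:

```python
# Sketch of no-port master-URL parsing: spark://host is accepted and the
# standalone default port 7077 is assumed when none is given.
from urllib.parse import urlparse

DEFAULT_STANDALONE_PORT = 7077

def extract_host_port(master_url):
    parsed = urlparse(master_url)
    if parsed.scheme != "spark" or not parsed.hostname:
        raise ValueError(f"Invalid master URL: {master_url}")
    return parsed.hostname, parsed.port or DEFAULT_STANDALONE_PORT

print(extract_host_port("spark://localhost"))       # ('localhost', 7077)
print(extract_host_port("spark://localhost:6066"))  # ('localhost', 6066)
```

As the comment above notes, the catch is that the port doubles as a protocol selector (7077 binary vs. 6066 REST), so a silent default could mask which submission endpoint the user actually meant.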
[jira] [Updated] (SPARK-11604) ML 1.6 QA: API: Python API coverage
[ https://issues.apache.org/jira/browse/SPARK-11604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yanbo Liang updated SPARK-11604: Description: For new public APIs added to MLlib, we need to check the generated HTML doc and compare the Scala & Python versions. We need to track: * Inconsistency: Do class/method/parameter names match? * Docs: Is the Python doc missing or just a stub? We want the Python doc to be as complete as the Scala doc. * API breaking changes: These should be very rare but are occasionally either necessary (intentional) or accidental. These must be recorded and added in the Migration Guide for this release. ** Note: If the API change is for an Alpha/Experimental/DeveloperApi component, please note that as well. * Missing classes/methods/parameters: We should create to-do JIRAs for functionality missing from Python, to be added in the next release cycle. Please use a *separate* JIRA (linked below) for this list of to-do items. * Inconsistency: ** ml.classification SPARK-11815 SPARK-11820 * Docs: ** ml.classification SPARK-11875 was: For new public APIs added to MLlib, we need to check the generated HTML doc and compare the Scala & Python versions. We need to track: * Inconsistency: Do class/method/parameter names match? * Docs: Is the Python doc missing or just a stub? We want the Python doc to be as complete as the Scala doc. * API breaking changes: These should be very rare but are occasionally either necessary (intentional) or accidental. These must be recorded and added in the Migration Guide for this release. ** Note: If the API change is for an Alpha/Experimental/DeveloperApi component, please note that as well. * Missing classes/methods/parameters: We should create to-do JIRAs for functionality missing from Python, to be added in the next release cycle. Please use a *separate* JIRA (linked below) for this list of to-do items. 
* Inconsistency: ** ml.classification > ML 1.6 QA: API: Python API coverage > --- > > Key: SPARK-11604 > URL: https://issues.apache.org/jira/browse/SPARK-11604 > Project: Spark > Issue Type: Sub-task > Components: Documentation, ML, MLlib, PySpark >Reporter: Joseph K. Bradley >Assignee: Yanbo Liang > > For new public APIs added to MLlib, we need to check the generated HTML doc > and compare the Scala & Python versions. We need to track: > * Inconsistency: Do class/method/parameter names match? > * Docs: Is the Python doc missing or just a stub? We want the Python doc to > be as complete as the Scala doc. > * API breaking changes: These should be very rare but are occasionally either > necessary (intentional) or accidental. These must be recorded and added in > the Migration Guide for this release. > ** Note: If the API change is for an Alpha/Experimental/DeveloperApi > component, please note that as well. > * Missing classes/methods/parameters: We should create to-do JIRAs for > functionality missing from Python, to be added in the next release cycle. > Please use a *separate* JIRA (linked below) for this list of to-do items. > * Inconsistency: > ** ml.classification SPARK-11815 SPARK-11820 > * Docs: > ** ml.classification SPARK-11875 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-11604) ML 1.6 QA: API: Python API coverage
[ https://issues.apache.org/jira/browse/SPARK-11604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yanbo Liang updated SPARK-11604: Description: For new public APIs added to MLlib, we need to check the generated HTML doc and compare the Scala & Python versions. We need to track: * Inconsistency: Do class/method/parameter names match? * Docs: Is the Python doc missing or just a stub? We want the Python doc to be as complete as the Scala doc. * API breaking changes: These should be very rare but are occasionally either necessary (intentional) or accidental. These must be recorded and added in the Migration Guide for this release. ** Note: If the API change is for an Alpha/Experimental/DeveloperApi component, please note that as well. * Missing classes/methods/parameters: We should create to-do JIRAs for functionality missing from Python, to be added in the next release cycle. Please use a *separate* JIRA (linked below) for this list of to-do items. List the found issues: * Inconsistency: ** ml.classification SPARK-11815 SPARK-11820 * Docs: ** ml.classification SPARK-11875 was: For new public APIs added to MLlib, we need to check the generated HTML doc and compare the Scala & Python versions. We need to track: * Inconsistency: Do class/method/parameter names match? * Docs: Is the Python doc missing or just a stub? We want the Python doc to be as complete as the Scala doc. * API breaking changes: These should be very rare but are occasionally either necessary (intentional) or accidental. These must be recorded and added in the Migration Guide for this release. ** Note: If the API change is for an Alpha/Experimental/DeveloperApi component, please note that as well. * Missing classes/methods/parameters: We should create to-do JIRAs for functionality missing from Python, to be added in the next release cycle. Please use a *separate* JIRA (linked below) for this list of to-do items. 
* Inconsistency: ** ml.classification SPARK-11815 SPARK-11820 * Docs: ** ml.classification SPARK-11875 > ML 1.6 QA: API: Python API coverage > --- > > Key: SPARK-11604 > URL: https://issues.apache.org/jira/browse/SPARK-11604 > Project: Spark > Issue Type: Sub-task > Components: Documentation, ML, MLlib, PySpark >Reporter: Joseph K. Bradley >Assignee: Yanbo Liang > > For new public APIs added to MLlib, we need to check the generated HTML doc > and compare the Scala & Python versions. We need to track: > * Inconsistency: Do class/method/parameter names match? > * Docs: Is the Python doc missing or just a stub? We want the Python doc to > be as complete as the Scala doc. > * API breaking changes: These should be very rare but are occasionally either > necessary (intentional) or accidental. These must be recorded and added in > the Migration Guide for this release. > ** Note: If the API change is for an Alpha/Experimental/DeveloperApi > component, please note that as well. > * Missing classes/methods/parameters: We should create to-do JIRAs for > functionality missing from Python, to be added in the next release cycle. > Please use a *separate* JIRA (linked below) for this list of to-do items. > List the found issues: > * Inconsistency: > ** ml.classification SPARK-11815 SPARK-11820 > * Docs: > ** ml.classification SPARK-11875 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-11912) ml.feature.PCA minor refactor
[ https://issues.apache.org/jira/browse/SPARK-11912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yanbo Liang updated SPARK-11912: Description: Like SPARK-11852, k is params and we should save it under metadata/ rather than both under data/ and metadata/. Refactor the constructor of ml.feature.PCAModel to take only pc but construct mllib.feature.PCAModel inside transform (was: Like SPARK-11852, k is params and we should save it under metadata/ rather than both under data/ and metadata/. We construct mllib.feature.PCAModel inside transform.) > ml.feature.PCA minor refactor > - > > Key: SPARK-11912 > URL: https://issues.apache.org/jira/browse/SPARK-11912 > Project: Spark > Issue Type: Improvement > Components: ML >Reporter: Yanbo Liang >Priority: Minor > > Like SPARK-11852, k is params and we should save it under metadata/ rather > than both under data/ and metadata/. Refactor the constructor of > ml.feature.PCAModel to take only pc but construct mllib.feature.PCAModel > inside transform -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-11912) ml.feature.PCA minor refactor
[ https://issues.apache.org/jira/browse/SPARK-11912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yanbo Liang updated SPARK-11912: Description: Like SPARK-11852, k is params and we should save it under metadata/ rather than both under data/ and metadata/. Refactor the constructor of ml.feature.PCAModel to take only pc but construct mllib.feature.PCAModel inside transform. (was: Like SPARK-11852, k is params and we should save it under metadata/ rather than both under data/ and metadata/. Refactor the constructor of ml.feature.PCAModel to take only pc but construct mllib.feature.PCAModel inside transform) > ml.feature.PCA minor refactor > - > > Key: SPARK-11912 > URL: https://issues.apache.org/jira/browse/SPARK-11912 > Project: Spark > Issue Type: Improvement > Components: ML >Reporter: Yanbo Liang >Priority: Minor > > Like SPARK-11852, k is params and we should save it under metadata/ rather > than both under data/ and metadata/. Refactor the constructor of > ml.feature.PCAModel to take only pc but construct mllib.feature.PCAModel > inside transform. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
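The refactor described above — persist k once under metadata/ as a param, and only the principal-components matrix under data/ — implies a save layout like the following. This is a hypothetical Python sketch of that split (the function name and dict layout are invented for illustration; the real ml.feature.PCAModel persistence is Scala):

```python
def save_pca_model(path, k, pc_rows):
    """Split model persistence: params (k) go under metadata/, the pc matrix under data/."""
    metadata = {"class": "ml.feature.PCAModel", "paramMap": {"k": k}}
    # Principal components only; k is deliberately NOT duplicated here.
    data = {"pc": pc_rows}
    return {path + "/metadata": metadata, path + "/data": data}
```

The point of the refactor is exactly that the data/ side carries no copy of k, so load paths cannot disagree about its value.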
[jira] [Commented] (SPARK-11619) cannot use UDTF in DataFrame.selectExpr
[ https://issues.apache.org/jira/browse/SPARK-11619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15021509#comment-15021509 ] Wenchen Fan commented on SPARK-11619:
Actually it is this line: https://github.com/apache/spark/blob/branch-1.5/sql/core/src/main/scala/org/apache/spark/sql/DataFrame.scala#L689
When we use `selectExpr`, we pass an `UnresolvedFunction` to `DataFrame.select` and fall into the last case. One workaround is to special-case UDTFs, as we did for `explode` (and `json_tuple` in 1.6), and wrap them with `MultiAlias`. Another workaround is to use `expr`, for example `df.select(expr("explode(a)").as(Nil))`. I think `selectExpr` is no longer needed now that we have the `expr` function.
> cannot use UDTF in DataFrame.selectExpr
> ---------------------------------------
>
> Key: SPARK-11619
> URL: https://issues.apache.org/jira/browse/SPARK-11619
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Reporter: Wenchen Fan
> Priority: Minor
>
> Currently if we use a UDTF like `explode` or `json_tuple` in `DataFrame.selectExpr`, it is parsed into an `UnresolvedFunction` first and then aliased with `expr.prettyString`. However, a UDTF may need a MultiAlias, so we get an error if we run:
> {code}
> val df = Seq((Map("1" -> 1), 1)).toDF("a", "b")
> df.selectExpr("explode(a)").show()
> {code}
> [info] org.apache.spark.sql.AnalysisException: Expect multiple names given for org.apache.spark.sql.catalyst.expressions.Explode,
> [info] but only single name ''explode(a)' specified;
--
[jira] [Commented] (SPARK-11903) Deprecate make-distribution.sh --skip-java-test
[ https://issues.apache.org/jira/browse/SPARK-11903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15021511#comment-15021511 ] Patrick Wendell commented on SPARK-11903:
I think it's simply dead code. SKIP_JAVA_TEST related to a check of whether Java 6 was being used instead of Java 7; it doesn't have anything to do with unit tests. Spark now requires Java 7, so the check has been removed, but the parser still handles the variable. It was simply an omission that it was not deleted as part of SPARK-7733 (https://github.com/apache/spark/commit/e84815dc333a69368a48e0152f02934980768a14) /cc [~srowen].
> Deprecate make-distribution.sh --skip-java-test
> -----------------------------------------------
>
> Key: SPARK-11903
> URL: https://issues.apache.org/jira/browse/SPARK-11903
> Project: Spark
> Issue Type: Improvement
> Components: Build
> Reporter: Nicholas Chammas
> Priority: Minor
>
> The {{\-\-skip-java-test}} option to {{make-distribution.sh}} [does not appear to be used|https://github.com/apache/spark/blob/835a79d78ee879a3c36dde85e5b3591243bf3957/make-distribution.sh#L72-L73], and tests are [always skipped|https://github.com/apache/spark/blob/835a79d78ee879a3c36dde85e5b3591243bf3957/make-distribution.sh#L170].
> Searching the Spark codebase for {{SKIP_JAVA_TEST}} yields no results other than [this one|https://github.com/apache/spark/blob/835a79d78ee879a3c36dde85e5b3591243bf3957/make-distribution.sh#L72-L73].
> If this option is not needed, we should deprecate and eventually remove it.
--
[jira] [Commented] (SPARK-11906) Speculation Tasks Cause ProgressBar UI Overflow
[ https://issues.apache.org/jira/browse/SPARK-11906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15021228#comment-15021228 ] Apache Spark commented on SPARK-11906: -- User 'saurfang' has created a pull request for this issue: https://github.com/apache/spark/pull/9896 > Speculation Tasks Cause ProgressBar UI Overflow > --- > > Key: SPARK-11906 > URL: https://issues.apache.org/jira/browse/SPARK-11906 > Project: Spark > Issue Type: Bug > Components: Web UI >Reporter: Sen Fang >Priority: Trivial > > When there are speculative tasks in stage, the started tasks + completed > tasks can be greater than total number of tasks. It leads to the started > progress block to overflow to next line. Visually the light blue progress > block becomes no longer visible when it happens. > The fix should be as trivial as to cap the number of started task by total - > completed task. > https://github.com/apache/spark/blob/branch-1.6/core/src/main/scala/org/apache/spark/ui/UIUtils.scala#L322 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
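The fix proposed in the report above — cap the number of started tasks at total minus completed, so started + completed can never exceed total — is a one-line clamp. Sketched here in Python purely to show the arithmetic (the actual change is in UIUtils.scala):

```python
def capped_started(started, completed, total):
    """With speculative tasks, started + completed can exceed total;
    cap started so the rendered progress bar cannot overflow."""
    return min(started, total - completed)
```

For example, with 10 tasks total, 8 completed, and 5 started (3 of them speculative duplicates), the bar renders at most 2 started blocks.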
[jira] [Assigned] (SPARK-11906) Speculation Tasks Cause ProgressBar UI Overflow
[ https://issues.apache.org/jira/browse/SPARK-11906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-11906: Assignee: (was: Apache Spark) > Speculation Tasks Cause ProgressBar UI Overflow > --- > > Key: SPARK-11906 > URL: https://issues.apache.org/jira/browse/SPARK-11906 > Project: Spark > Issue Type: Bug > Components: Web UI >Reporter: Sen Fang >Priority: Trivial > > When there are speculative tasks in stage, the started tasks + completed > tasks can be greater than total number of tasks. It leads to the started > progress block to overflow to next line. Visually the light blue progress > block becomes no longer visible when it happens. > The fix should be as trivial as to cap the number of started task by total - > completed task. > https://github.com/apache/spark/blob/branch-1.6/core/src/main/scala/org/apache/spark/ui/UIUtils.scala#L322 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-11906) Speculation Tasks Cause ProgressBar UI Overflow
[ https://issues.apache.org/jira/browse/SPARK-11906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-11906: Assignee: Apache Spark > Speculation Tasks Cause ProgressBar UI Overflow > --- > > Key: SPARK-11906 > URL: https://issues.apache.org/jira/browse/SPARK-11906 > Project: Spark > Issue Type: Bug > Components: Web UI >Reporter: Sen Fang >Assignee: Apache Spark >Priority: Trivial > > When there are speculative tasks in stage, the started tasks + completed > tasks can be greater than total number of tasks. It leads to the started > progress block to overflow to next line. Visually the light blue progress > block becomes no longer visible when it happens. > The fix should be as trivial as to cap the number of started task by total - > completed task. > https://github.com/apache/spark/blob/branch-1.6/core/src/main/scala/org/apache/spark/ui/UIUtils.scala#L322 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-11730) Feature Importance for GBT
[ https://issues.apache.org/jira/browse/SPARK-11730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15021301#comment-15021301 ] Joseph K. Bradley commented on SPARK-11730: --- I wrote that note since I did not have time to research what people do for GBTs. I'd be Ok with matching sklearn's implementation, though it would be great if we could find academic work indicating a "right" way to handle GBTs. In particular, I am not sure if trees' contributions should be weighted differently (based on the learning process) or if they should just use the tree weights (resembling how prediction works). > Feature Importance for GBT > -- > > Key: SPARK-11730 > URL: https://issues.apache.org/jira/browse/SPARK-11730 > Project: Spark > Issue Type: New Feature > Components: ML, MLlib >Reporter: Brian Webb > > Random Forests have feature importance, but GBT do not. It would be great if > we can add feature importance to GBT as well. Perhaps the code in Random > Forests can be refactored to apply to both types of ensembles. > See https://issues.apache.org/jira/browse/SPARK-5133 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
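The open design question above — whether each tree's importance contribution should use the tree weights, as prediction does, or be weighted differently — can be made concrete. The sketch below is one hedged option (tree-weighted aggregation, resembling prediction), not a statement of what Spark or sklearn actually implements; passing uniform weights gives the other option under discussion.

```python
def ensemble_importances(tree_importances, tree_weights):
    """Aggregate per-tree feature importances, weighting each tree's
    contribution by its ensemble weight, then normalize to sum to 1."""
    n_features = len(tree_importances[0])
    total = [0.0] * n_features
    for imp, w in zip(tree_importances, tree_weights):
        for j in range(n_features):
            total[j] += w * imp[j]
    s = sum(total)
    return [v / s for v in total] if s > 0 else total
```

With two trees that each put all importance on a different feature and weights (3, 1), the aggregate is (0.75, 0.25); with uniform weights it would be (0.5, 0.5), which is exactly the choice the comment raises.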
[jira] [Commented] (SPARK-10129) math function: stddev_samp
[ https://issues.apache.org/jira/browse/SPARK-10129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15021319#comment-15021319 ] Yin Huai commented on SPARK-10129: -- We have stddev_samp in agg functions. Should we resolve this? Or, it is stddev for a list of numbers? > math function: stddev_samp > -- > > Key: SPARK-10129 > URL: https://issues.apache.org/jira/browse/SPARK-10129 > Project: Spark > Issue Type: New Feature > Components: SQL >Reporter: Davies Liu > > Use the STDDEV_SAMP function to return the standard deviation of a sample > variance. > http://www-01.ibm.com/support/knowledgecenter/SSPT3X_3.0.0/com.ibm.swg.im.infosphere.biginsights.bigsql.doc/doc/bsql_stdev_samp.html -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-10129) math function: stddev_samp
[ https://issues.apache.org/jira/browse/SPARK-10129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15021319#comment-15021319 ] Yin Huai edited comment on SPARK-10129 at 11/23/15 1:16 AM: We have stddev_samp in agg functions. Should we resolve this? Or, it is stddev for a value of an array type? was (Author: yhuai): We have stddev_samp in agg functions. Should we resolve this? Or, it is stddev for a list of numbers? > math function: stddev_samp > -- > > Key: SPARK-10129 > URL: https://issues.apache.org/jira/browse/SPARK-10129 > Project: Spark > Issue Type: New Feature > Components: SQL >Reporter: Davies Liu > > Use the STDDEV_SAMP function to return the standard deviation of a sample > variance. > http://www-01.ibm.com/support/knowledgecenter/SSPT3X_3.0.0/com.ibm.swg.im.infosphere.biginsights.bigsql.doc/doc/bsql_stdev_samp.html -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
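For reference, STDDEV_SAMP as discussed above is the sample standard deviation: the square root of the sum of squared deviations from the mean divided by n - 1 (not n). A minimal Python sketch of the definition, independent of how Spark SQL exposes it:

```python
import math

def stddev_samp(xs):
    """Sample standard deviation: sqrt(sum((x - mean)^2) / (n - 1))."""
    n = len(xs)
    if n < 2:
        return float("nan")  # undefined for fewer than two values
    mean = sum(xs) / n
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
```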
[jira] [Commented] (SPARK-9506) DataFrames Postgresql JDBC unable to support most of the Postgresql's Data Type
[ https://issues.apache.org/jira/browse/SPARK-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15021499#comment-15021499 ] Wenchen Fan commented on SPARK-9506: I think it's not a workaround, but the right thing to do. We already have a `PostgreDialect` and we can add more support for non-standard sql types. > DataFrames Postgresql JDBC unable to support most of the Postgresql's Data > Type > --- > > Key: SPARK-9506 > URL: https://issues.apache.org/jira/browse/SPARK-9506 > Project: Spark > Issue Type: Bug > Components: Spark Core, SQL >Reporter: Pangjiu > Attachments: code.PNG, log.PNG, tables_structures.PNG > > > Hi All, > I have issue on using Postgresql JDBC with sqlContext for postgresql's data > types: eg: abstime, character varying[], int2vector, json and etc. > Exception are "Unsupported type 2003" and "Unsupported type ". > Below is the code: > Class.forName("org.postgresql.Driver").newInstance() > val url = "jdbc:postgresql://localhost:5432/sample?user=posgres=xxx" > val driver = "org.postgresql.Driver" > val output = { sqlContext.load("jdbc", Map > ( > "url" -> url, > "driver" -> driver, > "dbtable" -> "(SELECT `ID`, `NAME` FROM > `agent`) AS tableA " > ) > ) > } > Hope SQL Context can support all the data types. > Thanks. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
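Extending `PostgreDialect` as suggested above means mapping otherwise-unsupported JDBC type codes to Catalyst types — the "Unsupported type 2003" in the report is java.sql.Types.ARRAY. The lookup below is an illustrative Python sketch of that dialect-style override table (the function name and the chosen target types are assumptions for illustration, not the JdbcDialects API):

```python
# java.sql.Types codes: ARRAY is 2003; non-standard types arrive as OTHER (1111).
ARRAY, OTHER = 2003, 1111

def catalyst_type_for(jdbc_type):
    """Dialect-style lookup: map non-standard Postgres JDBC type codes to a
    Catalyst type name instead of failing with 'Unsupported type N'."""
    overrides = {
        ARRAY: "ArrayType(StringType)",  # e.g. character varying[]
        OTHER: "StringType",             # e.g. json and other non-standard types
    }
    if jdbc_type not in overrides:
        raise ValueError("Unsupported type %d" % jdbc_type)
    return overrides[jdbc_type]
```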
[jira] [Comment Edited] (SPARK-11903) Deprecate make-distribution.sh --skip-java-test
[ https://issues.apache.org/jira/browse/SPARK-11903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15021511#comment-15021511 ] Patrick Wendell edited comment on SPARK-11903 at 11/23/15 4:29 AM: --- I think it's simply dead code that should be deleted. SKIP_JAVA_TEST related to a check we did regarding whether Java 6 was being used instead of Java 7. It doesn't have anything to do with unit tests. Spark now requires Java 7, so the test has been removed, but the parser still handles that variable. It was just an omission not deleted as part of SPARK-7733 (https://github.com/apache/spark/commit/e84815dc333a69368a48e0152f02934980768a14) /cc [~srowen]. was (Author: pwendell): I think it's simply dead code. SKIP_JAVA_TEST related to a check we did regarding whether Java 6 was being used instead of Java 7. It doesn't have anything to do with unit tests. Spark now requires Java 7, so the test has been removed, but the parser still handles that variable. It was just an omission not deleted as part of SPARK-7733 (https://github.com/apache/spark/commit/e84815dc333a69368a48e0152f02934980768a14) /cc [~srowen]. > Deprecate make-distribution.sh --skip-java-test > --- > > Key: SPARK-11903 > URL: https://issues.apache.org/jira/browse/SPARK-11903 > Project: Spark > Issue Type: Improvement > Components: Build >Reporter: Nicholas Chammas >Priority: Minor > > The {{\-\-skip-java-test}} option to {{make-distribution.sh}} [does not > appear to be > used|https://github.com/apache/spark/blob/835a79d78ee879a3c36dde85e5b3591243bf3957/make-distribution.sh#L72-L73], > and tests are [always > skipped|https://github.com/apache/spark/blob/835a79d78ee879a3c36dde85e5b3591243bf3957/make-distribution.sh#L170]. > Searching the Spark codebase for {{SKIP_JAVA_TEST}} yields no results other > than [this > one|https://github.com/apache/spark/blob/835a79d78ee879a3c36dde85e5b3591243bf3957/make-distribution.sh#L72-L73]. 
> If this option is not needed, we should deprecate and eventually remove it. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-11206) Support SQL UI on the history server
[ https://issues.apache.org/jira/browse/SPARK-11206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15021363#comment-15021363 ] Carson Wang commented on SPARK-11206:
To support SQL UI on the history server:
1. I added an onOtherEvent method to the SparkListener trait and post all SQL related events to the same event bus.
2. Two SQL events SparkListenerSQLExecutionStart and SparkListenerSQLExecutionEnd are defined in the sql module.
3. The new SQL events are written to the event log using Jackson.
4. A new trait SparkHistoryListenerFactory is added to allow the history server to feed events to the SQL history listener. The SQL implementation is loaded at runtime using java.util.ServiceLoader.
> Support SQL UI on the history server
> ------------------------------------
>
> Key: SPARK-11206
> URL: https://issues.apache.org/jira/browse/SPARK-11206
> Project: Spark
> Issue Type: New Feature
> Components: SQL, Web UI
> Reporter: Carson Wang
>
> On the live web UI, there is a SQL tab which provides valuable information for the SQL query. But once the workload is finished, we won't see the SQL tab on the history server. It will be helpful if we support SQL UI on the history server so we can analyze it even after its execution.
--
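The first step of the design above — route event types the core bus does not know through a generic onOtherEvent hook, which module-specific listeners override — can be illustrated with a minimal dispatcher. This is a Python sketch of the pattern only; class and event names mimic the ones in the comment but are not the actual SparkListener trait:

```python
class Listener:
    """Core listener: known events get dedicated hooks; everything else
    is forwarded to on_other_event, which subclasses may override."""
    def on_job_start(self, event):
        pass
    def on_other_event(self, event):
        pass

class SQLHistoryListener(Listener):
    """SQL-specific listener fed by the shared bus on the history server."""
    def __init__(self):
        self.executions = []
    def on_other_event(self, event):
        # SQL events (e.g. a SparkListenerSQLExecutionStart replayed
        # from the event log) arrive through the generic hook.
        if event.get("type", "").startswith("SQLExecution"):
            self.executions.append(event)

def post(listeners, event):
    """Shared event bus: dispatch known types, fall back to on_other_event."""
    for listener in listeners:
        if event.get("type") == "JobStart":
            listener.on_job_start(event)
        else:
            listener.on_other_event(event)
```

This is why step 1 lets the history server replay SQL events without the core module depending on the sql module: the bus only needs the generic hook.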
[jira] [Updated] (SPARK-11915) Fix flaky python test pyspark.sql.group
[ https://issues.apache.org/jira/browse/SPARK-11915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-11915: Description: The python test pyspark.sql.group will fail due to items' order in returned array. We should sort the aggregation results to make the test stable. was: The python test pyspark.sql.group fails due to items' order in returned array. We should fix it. > Fix flaky python test pyspark.sql.group > --- > > Key: SPARK-11915 > URL: https://issues.apache.org/jira/browse/SPARK-11915 > Project: Spark > Issue Type: Bug > Components: PySpark, SQL >Reporter: Liang-Chi Hsieh > > The python test pyspark.sql.group will fail due to items' order in returned > array. We should sort the aggregation results to make the test stable. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
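The fix described above — sort the collected aggregation rows before asserting, since groupBy output order is not deterministic — looks the same in any test. A minimal Python sketch of the idea:

```python
def stable(rows):
    """Sort collected aggregation rows so assertions don't depend on
    partition or hash order across runs."""
    return sorted(rows)

# The same (key, aggregate) pairs can arrive in different orders on
# different runs; sorting makes the comparison deterministic.
run1 = [("b", 2), ("a", 1)]
run2 = [("a", 1), ("b", 2)]
assert stable(run1) == stable(run2) == [("a", 1), ("b", 2)]
```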
[jira] [Assigned] (SPARK-11915) Fix flaky python test pyspark.sql.group
[ https://issues.apache.org/jira/browse/SPARK-11915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-11915: Assignee: Apache Spark > Fix flaky python test pyspark.sql.group > --- > > Key: SPARK-11915 > URL: https://issues.apache.org/jira/browse/SPARK-11915 > Project: Spark > Issue Type: Bug > Components: PySpark, SQL >Reporter: Liang-Chi Hsieh >Assignee: Apache Spark > > The python test pyspark.sql.group will fail due to items' order in returned > array. We should sort the aggregation results to make the test stable. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-11915) Fix flaky python test pyspark.sql.group
[ https://issues.apache.org/jira/browse/SPARK-11915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15021498#comment-15021498 ] Apache Spark commented on SPARK-11915: -- User 'viirya' has created a pull request for this issue: https://github.com/apache/spark/pull/9900 > Fix flaky python test pyspark.sql.group > --- > > Key: SPARK-11915 > URL: https://issues.apache.org/jira/browse/SPARK-11915 > Project: Spark > Issue Type: Bug > Components: PySpark, SQL >Reporter: Liang-Chi Hsieh > > The python test pyspark.sql.group will fail due to items' order in returned > array. We should sort the aggregation results to make the test stable. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-11915) Fix flaky python test pyspark.sql.group
[ https://issues.apache.org/jira/browse/SPARK-11915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-11915: Assignee: (was: Apache Spark) > Fix flaky python test pyspark.sql.group > --- > > Key: SPARK-11915 > URL: https://issues.apache.org/jira/browse/SPARK-11915 > Project: Spark > Issue Type: Bug > Components: PySpark, SQL >Reporter: Liang-Chi Hsieh > > The python test pyspark.sql.group will fail due to items' order in returned > array. We should sort the aggregation results to make the test stable. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-11861) Feature importances for decision trees
[ https://issues.apache.org/jira/browse/SPARK-11861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15021297#comment-15021297 ] Joseph K. Bradley commented on SPARK-11861: --- Exposing the single-tree API for this sounds fine to me. I hid it originally because I did not have the time to research whether people trusted importance values from single trees. Do you know if other libraries provide this? > Feature importances for decision trees > -- > > Key: SPARK-11861 > URL: https://issues.apache.org/jira/browse/SPARK-11861 > Project: Spark > Issue Type: New Feature > Components: ML >Reporter: Seth Hendrickson >Priority: Minor > > Feature importances should be added to decision trees leveraging the feature > importance implementation for Random Forests. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-11604) ML 1.6 QA: API: Python API coverage
[ https://issues.apache.org/jira/browse/SPARK-11604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yanbo Liang updated SPARK-11604: Description: For new public APIs added to MLlib, we need to check the generated HTML doc and compare the Scala & Python versions. We need to track: * Inconsistency: Do class/method/parameter names match? * Docs: Is the Python doc missing or just a stub? We want the Python doc to be as complete as the Scala doc. * API breaking changes: These should be very rare but are occasionally either necessary (intentional) or accidental. These must be recorded and added in the Migration Guide for this release. ** Note: If the API change is for an Alpha/Experimental/DeveloperApi component, please note that as well. * Missing classes/methods/parameters: We should create to-do JIRAs for functionality missing from Python, to be added in the next release cycle. Please use a *separate* JIRA (linked below) for this list of to-do items. * Inconsistency: ** ml.classification was: For new public APIs added to MLlib, we need to check the generated HTML doc and compare the Scala & Python versions. We need to track: * Inconsistency: Do class/method/parameter names match? * Docs: Is the Python doc missing or just a stub? We want the Python doc to be as complete as the Scala doc. * API breaking changes: These should be very rare but are occasionally either necessary (intentional) or accidental. These must be recorded and added in the Migration Guide for this release. ** Note: If the API change is for an Alpha/Experimental/DeveloperApi component, please note that as well. * Missing classes/methods/parameters: We should create to-do JIRAs for functionality missing from Python, to be added in the next release cycle. Please use a *separate* JIRA (linked below) for this list of to-do items. 
> ML 1.6 QA: API: Python API coverage > --- > > Key: SPARK-11604 > URL: https://issues.apache.org/jira/browse/SPARK-11604 > Project: Spark > Issue Type: Sub-task > Components: Documentation, ML, MLlib, PySpark >Reporter: Joseph K. Bradley >Assignee: Yanbo Liang > > For new public APIs added to MLlib, we need to check the generated HTML doc > and compare the Scala & Python versions. We need to track: > * Inconsistency: Do class/method/parameter names match? > * Docs: Is the Python doc missing or just a stub? We want the Python doc to > be as complete as the Scala doc. > * API breaking changes: These should be very rare but are occasionally either > necessary (intentional) or accidental. These must be recorded and added in > the Migration Guide for this release. > ** Note: If the API change is for an Alpha/Experimental/DeveloperApi > component, please note that as well. > * Missing classes/methods/parameters: We should create to-do JIRAs for > functionality missing from Python, to be added in the next release cycle. > Please use a *separate* JIRA (linked below) for this list of to-do items. > * Inconsistency: > ** ml.classification -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-11913) support typed aggregate for complex buffer schema
[ https://issues.apache.org/jira/browse/SPARK-11913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15021454#comment-15021454 ] Apache Spark commented on SPARK-11913: -- User 'cloud-fan' has created a pull request for this issue: https://github.com/apache/spark/pull/9898 > support typed aggregate for complex buffer schema > - > > Key: SPARK-11913 > URL: https://issues.apache.org/jira/browse/SPARK-11913 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Wenchen Fan > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-11913) support typed aggregate for complex buffer schema
[ https://issues.apache.org/jira/browse/SPARK-11913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-11913: Assignee: Apache Spark > support typed aggregate for complex buffer schema > - > > Key: SPARK-11913 > URL: https://issues.apache.org/jira/browse/SPARK-11913 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Wenchen Fan >Assignee: Apache Spark > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-11913) support typed aggregate for complex buffer schema
[ https://issues.apache.org/jira/browse/SPARK-11913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-11913: Assignee: (was: Apache Spark) > support typed aggregate for complex buffer schema > - > > Key: SPARK-11913 > URL: https://issues.apache.org/jira/browse/SPARK-11913 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Wenchen Fan > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-11600) Spark MLlib 1.6 QA umbrella
[ https://issues.apache.org/jira/browse/SPARK-11600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley updated SPARK-11600: -- Description: This JIRA lists tasks for the next MLlib release's QA period. h2. API * Check binary API compatibility (SPARK-11601) * Audit new public APIs (from the generated html doc) ** Scala (SPARK-11602) ** Java compatibility (SPARK-11605) ** Python coverage (SPARK-11604) * Check Experimental, DeveloperApi tags (SPARK-11603) h2. Algorithms and performance *Performance* * _List any other missing performance tests from spark-perf here_ * ALS.recommendAll (SPARK-7457) * perf-tests in Python (SPARK-7539) * perf-tests for transformers (SPARK-2838) * MultilayerPerceptron (SPARK-11911) h2. Documentation and example code * For new algorithms, create JIRAs for updating the user guide (SPARK-11606) * For major components, create JIRAs for example code (SPARK-9670) * Update Programming Guide for 1.6 (towards end of QA) (SPARK-11608) * Update website (SPARK-11607) * Merge duplicate content under examples/ (SPARK-11685) was: This JIRA lists tasks for the next MLlib release's QA period. h2. API * Check binary API compatibility (SPARK-11601) * Audit new public APIs (from the generated html doc) ** Scala (SPARK-11602) ** Java compatibility (SPARK-11605) ** Python coverage (SPARK-11604) * Check Experimental, DeveloperApi tags (SPARK-11603) h2. Algorithms and performance *Performance* * _List any other missing performance tests from spark-perf here_ * ALS.recommendAll (SPARK-7457) * perf-tests in Python (SPARK-7539) * perf-tests for transformers (SPARK-2838) h2. 
Documentation and example code * For new algorithms, create JIRAs for updating the user guide (SPARK-11606) * For major components, create JIRAs for example code (SPARK-9670) * Update Programming Guide for 1.6 (towards end of QA) (SPARK-11608) * Update website (SPARK-11607) * Merge duplicate content under examples/ (SPARK-11685) > Spark MLlib 1.6 QA umbrella > --- > > Key: SPARK-11600 > URL: https://issues.apache.org/jira/browse/SPARK-11600 > Project: Spark > Issue Type: Umbrella > Components: ML, MLlib >Reporter: Joseph K. Bradley >Assignee: Joseph K. Bradley >Priority: Critical > > This JIRA lists tasks for the next MLlib release's QA period. > h2. API > * Check binary API compatibility (SPARK-11601) > * Audit new public APIs (from the generated html doc) > ** Scala (SPARK-11602) > ** Java compatibility (SPARK-11605) > ** Python coverage (SPARK-11604) > * Check Experimental, DeveloperApi tags (SPARK-11603) > h2. Algorithms and performance > *Performance* > * _List any other missing performance tests from spark-perf here_ > * ALS.recommendAll (SPARK-7457) > * perf-tests in Python (SPARK-7539) > * perf-tests for transformers (SPARK-2838) > * MultilayerPerceptron (SPARK-11911) > h2. Documentation and example code > * For new algorithms, create JIRAs for updating the user guide (SPARK-11606) > * For major components, create JIRAs for example code (SPARK-9670) > * Update Programming Guide for 1.6 (towards end of QA) (SPARK-11608) > * Update website (SPARK-11607) > * Merge duplicate content under examples/ (SPARK-11685) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-11911) spark-perf test for MultilayerPerceptron
Joseph K. Bradley created SPARK-11911: - Summary: spark-perf test for MultilayerPerceptron Key: SPARK-11911 URL: https://issues.apache.org/jira/browse/SPARK-11911 Project: Spark Issue Type: Test Components: ML Reporter: Joseph K. Bradley Priority: Minor Create a test in spark-perf for MultilayerPerceptron
[jira] [Created] (SPARK-11913) support typed aggregate for complex buffer schema
Wenchen Fan created SPARK-11913: --- Summary: support typed aggregate for complex buffer schema Key: SPARK-11913 URL: https://issues.apache.org/jira/browse/SPARK-11913 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Wenchen Fan
[jira] [Created] (SPARK-11915) Fix flaky python test pyspark.sql.group
Liang-Chi Hsieh created SPARK-11915: --- Summary: Fix flaky python test pyspark.sql.group Key: SPARK-11915 URL: https://issues.apache.org/jira/browse/SPARK-11915 Project: Spark Issue Type: Bug Components: PySpark, SQL Reporter: Liang-Chi Hsieh The python test pyspark.sql.group fails intermittently because the order of items in the returned array is not deterministic. We should fix it.
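A common remedy for this kind of flakiness (a sketch of the general fix, not the actual SPARK-11915 patch) is to impose a deterministic order on the collected result before asserting, so the test no longer depends on the order in which groups happen to come back:

```python
# Sketch: make an assertion independent of result ordering.
# This mimics the pyspark.sql.group situation with plain Python data;
# the rows and helper below are illustrative, not Spark's test code.

def collect_grouped(pairs):
    """Group values by key and sum them; the order of the returned
    items is an implementation detail, just as the order of rows
    returned by a distributed groupBy is."""
    groups = {}
    for key, value in pairs:
        groups.setdefault(key, []).append(value)
    return [(k, sum(v)) for k, v in groups.items()]

rows = [("a", 1), ("b", 2), ("a", 3)]
result = collect_grouped(rows)

# Fragile: depends on the order items happen to come back in.
# assert result == [("a", 4), ("b", 2)]

# Robust: impose a deterministic order before comparing.
assert sorted(result) == [("a", 4), ("b", 2)]
```

Sorting (or comparing as sets, when rows are hashable and distinct) is the usual way such doctest-style assertions are made order-insensitive.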
[jira] [Created] (SPARK-11912) ml.feature.PCA minor refactor
Yanbo Liang created SPARK-11912: --- Summary: ml.feature.PCA minor refactor Key: SPARK-11912 URL: https://issues.apache.org/jira/browse/SPARK-11912 Project: Spark Issue Type: Improvement Components: ML Reporter: Yanbo Liang Priority: Minor Like SPARK-11852, k is a param and should be saved under metadata/ rather than under both data/ and metadata/. We should construct mllib.feature.PCAModel inside transform.
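The persistence point above can be illustrated with plain JSON (the layout and field names below are hypothetical stand-ins for the metadata/ and data/ directories, not PCAModel's actual save code): the param k belongs in metadata/ only, while data/ carries just the fitted principal components:

```python
import json

# Hypothetical persisted model, mirroring the metadata/ vs data/ split.
metadata = {"class": "org.apache.spark.ml.feature.PCAModel",
            "paramMap": {"k": 3}}           # k lives only in metadata/
data = {"pc": [[0.1, 0.2], [0.3, 0.4]]}     # learned matrix only, no duplicate k

saved = {"metadata": json.dumps(metadata), "data": json.dumps(data)}

# On load, k is recovered from metadata; data/ stays free of params.
loaded_meta = json.loads(saved["metadata"])
assert loaded_meta["paramMap"]["k"] == 3
assert "k" not in json.loads(saved["data"])
```

Keeping each value in exactly one place avoids the two copies drifting apart, which is the motivation behind this refactor and SPARK-11852.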
[jira] [Commented] (SPARK-11914) [SQL] Support coalesce and repartition in Dataset APIs
[ https://issues.apache.org/jira/browse/SPARK-11914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15021478#comment-15021478 ] Apache Spark commented on SPARK-11914: -- User 'gatorsmile' has created a pull request for this issue: https://github.com/apache/spark/pull/9899 > [SQL] Support coalesce and repartition in Dataset APIs > -- > > Key: SPARK-11914 > URL: https://issues.apache.org/jira/browse/SPARK-11914 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 1.6.0 >Reporter: Xiao Li > > repartition: Returns a new [[Dataset]] that has exactly `numPartitions` > partitions. > coalesce: Returns a new [[Dataset]] that has exactly `numPartitions` > partitions. Similar to coalesce defined on an [[RDD]], this operation results > in a narrow dependency, e.g. if you go from 1000 partitions to 100 > partitions, there will not be a shuffle, instead each of the 100 new > partitions will claim 10 of the current partitions. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-11914) [SQL] Support coalesce and repartition in Dataset APIs
[ https://issues.apache.org/jira/browse/SPARK-11914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-11914: Assignee: (was: Apache Spark) > [SQL] Support coalesce and repartition in Dataset APIs > -- > > Key: SPARK-11914 > URL: https://issues.apache.org/jira/browse/SPARK-11914 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 1.6.0 >Reporter: Xiao Li > > repartition: Returns a new [[Dataset]] that has exactly `numPartitions` > partitions. > coalesce: Returns a new [[Dataset]] that has exactly `numPartitions` > partitions. Similar to coalesce defined on an [[RDD]], this operation results > in a narrow dependency, e.g. if you go from 1000 partitions to 100 > partitions, there will not be a shuffle, instead each of the 100 new > partitions will claim 10 of the current partitions. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-11914) [SQL] Support coalesce and repartition in Dataset APIs
[ https://issues.apache.org/jira/browse/SPARK-11914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-11914: Assignee: Apache Spark > [SQL] Support coalesce and repartition in Dataset APIs > -- > > Key: SPARK-11914 > URL: https://issues.apache.org/jira/browse/SPARK-11914 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 1.6.0 >Reporter: Xiao Li >Assignee: Apache Spark > > repartition: Returns a new [[Dataset]] that has exactly `numPartitions` > partitions. > coalesce: Returns a new [[Dataset]] that has exactly `numPartitions` > partitions. Similar to coalesce defined on an [[RDD]], this operation results > in a narrow dependency, e.g. if you go from 1000 partitions to 100 > partitions, there will not be a shuffle, instead each of the 100 new > partitions will claim 10 of the current partitions. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-11914) [SQL] Support coalesce and repartition in Dataset APIs
Xiao Li created SPARK-11914: --- Summary: [SQL] Support coalesce and repartition in Dataset APIs Key: SPARK-11914 URL: https://issues.apache.org/jira/browse/SPARK-11914 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 1.6.0 Reporter: Xiao Li repartition: Returns a new [[Dataset]] that has exactly `numPartitions` partitions. coalesce: Returns a new [[Dataset]] that has exactly `numPartitions` partitions. Similar to coalesce defined on an [[RDD]], this operation results in a narrow dependency; e.g. if you go from 1000 partitions to 100 partitions, there will not be a shuffle; instead, each of the 100 new partitions will claim 10 of the current partitions.
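The narrow-dependency behavior described above can be sketched with a small helper (illustrative, not Spark's actual partition coalescer): going from 1000 partitions to 100, each new partition claims a contiguous block of 10 parents, so no data needs to be shuffled across partitions:

```python
def coalesce_parents(num_old, num_new):
    """Return, for each new partition, the list of old partition ids it
    claims. Contiguous blocks model a narrow dependency: each old
    partition feeds exactly one new partition, so no shuffle is needed."""
    assert num_new <= num_old, "coalesce only reduces the partition count"
    parents = []
    for i in range(num_new):
        start = i * num_old // num_new
        end = (i + 1) * num_old // num_new
        parents.append(list(range(start, end)))
    return parents

# The 1000 -> 100 example from the description: 10 parents each.
mapping = coalesce_parents(1000, 100)
assert len(mapping) == 100
assert all(len(p) == 10 for p in mapping)
assert mapping[0] == list(range(0, 10))
```

By contrast, `repartition` allows the count to grow or shrink and performs a full shuffle, which is why the two methods exist side by side.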
[jira] [Commented] (SPARK-9506) DataFrames Postgresql JDBC unable to support most of the Postgresql's Data Type
[ https://issues.apache.org/jira/browse/SPARK-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15021500#comment-15021500 ] Marius Van Niekerk commented on SPARK-9506: --- Quite a few more additional types are supported in 1.6. > DataFrames Postgresql JDBC unable to support most of the Postgresql's Data > Type > --- > > Key: SPARK-9506 > URL: https://issues.apache.org/jira/browse/SPARK-9506 > Project: Spark > Issue Type: Bug > Components: Spark Core, SQL >Reporter: Pangjiu > Attachments: code.PNG, log.PNG, tables_structures.PNG > > > Hi All, > I have issue on using Postgresql JDBC with sqlContext for postgresql's data > types: eg: abstime, character varying[], int2vector, json and etc. > Exception are "Unsupported type 2003" and "Unsupported type ". > Below is the code: > Class.forName("org.postgresql.Driver").newInstance() > val url = "jdbc:postgresql://localhost:5432/sample?user=posgres=xxx" > val driver = "org.postgresql.Driver" > val output = { sqlContext.load("jdbc", Map > ( > "url" -> url, > "driver" -> driver, > "dbtable" -> "(SELECT `ID`, `NAME` FROM > `agent`) AS tableA " > ) > ) > } > Hope SQL Context can support all the data types. > Thanks. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
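For context on the report above: 2003 is the java.sql.Types code for SQL ARRAY, so "Unsupported type 2003" means the dialect has no mapping for array columns. A minimal sketch of that kind of type dispatch (the mapping table and function are illustrative, not Spark's actual JDBC code):

```python
# java.sql.Types constants; these numeric values are fixed by the JDBC spec.
JDBC_INTEGER = 4
JDBC_VARCHAR = 12
JDBC_ARRAY = 2003

# Hypothetical dialect mapping from JDBC type codes to Catalyst type names.
CATALYST_TYPE = {JDBC_INTEGER: "IntegerType", JDBC_VARCHAR: "StringType"}

def to_catalyst(jdbc_code):
    """Map a JDBC type code to a Catalyst type, failing like the report
    does for codes the dialect does not know about."""
    try:
        return CATALYST_TYPE[jdbc_code]
    except KeyError:
        raise ValueError(f"Unsupported type {jdbc_code}")

assert to_catalyst(JDBC_VARCHAR) == "StringType"
```

Supporting the PostgreSQL types in the report (abstime, varying[] arrays, int2vector, json) amounts to extending that table, which is what later dialect work did.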
[jira] [Updated] (SPARK-11907) Allowing errors as values in DataFrames (like 'Either Left/Right')
[ https://issues.apache.org/jira/browse/SPARK-11907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tycho Grouwstra updated SPARK-11907: Description:

I like Spark, but one thing I find funny about it is that it is picky about circumstantial errors. For one, given the following:

{code}
import org.apache.spark.sql._
import org.apache.spark.sql.functions._

val sqlContext = new org.apache.spark.sql.SQLContext(sc)
val rows = (1,"a") :: (2,"b") :: (3,"c") :: (0,"d") :: Nil
val df = sqlContext.createDataFrame(sc.parallelize(rows)).toDF("num","let")
val div = udf[Double, Integer](10 / _)
df.withColumn("div", div(col("num"))).show()
{code}

... the job fails with a `java.lang.ArithmeticException: / by zero`. The example is trivial, but my point is: if one thing goes wrong and the rest goes right, why throw out the baby with the bathwater when you could show both what went wrong and what went right? Instead, I would propose allowing raised Exceptions to be used as resulting values, not unlike how one might store 'bad' results using Either Left/Right constructions in Scala/Haskell (which I suppose would not currently work in DFs, lacking serializability), or cells containing errors in MS Excel.

As a solution, I would propose a DataFrame subclass (?) using a variant of NullableColumnBuilder, e.g. ErrorableColumnBuilder (/ SafeColumnBuilder?). NullableColumnBuilder currently explains its workings as follows:

{code}
/**
 * A stackable trait used for building byte buffer for a column containing null values.
 * Memory layout of the final byte buffer is:
 * {{{
 *    .--------------------- Null count N (4 bytes)
 *    |   .----------------- Null positions (4 x N bytes, empty if null count is zero)
 *    |   |     .----------- Non-null elements
 *    V   V     V
 *   +---+-----+---------+
 *   |   | ... | ... ... |
 *   +---+-----+---------+
 * }}}
 */
{code}

This might be extended by adding a further section storing Throwables (or null) for the bad values in question (alt: store count/positions separately from the null ones so null values would not need to be stored). Don't get me wrong, there is nothing wrong with throwing exceptions (or catching them for that matter). Rather, I see use cases for both "do it right or bust" and the explorative "show me what happens if I try this operation on these values" -- not unlike how languages such as Ruby/Elixir might distinguish unsafe methods using a bang ('!') from their safe variants that should not throw global exceptions. I'm sort of new here but would be glad to get some opinions on this idea.

was:

I like Spark, but one thing I find funny about it is that it is picky about circumstantial errors. For one, given the following:

[code]
import org.apache.spark.sql._
import org.apache.spark.sql.functions._

val sqlContext = new org.apache.spark.sql.SQLContext(sc)
val rows = (1,"a") :: (2,"b") :: (3,"c") :: (0,"d") :: Nil
val df = sqlContext.createDataFrame(sc.parallelize(rows)).toDF("num","let")
val div = udf[Double, Integer](10 / _)
df.withColumn("div", div(col("num"))).show()
[/code]

... the job fails with a `java.lang.ArithmeticException: / by zero`. The example is trivial, but my point is: if one thing goes wrong and the rest goes right, why throw out the baby with the bathwater when you could show both what went wrong and what went right? Instead, I would propose allowing raised Exceptions to be used as resulting values, not unlike how one might store 'bad' results using Either Left/Right constructions in Scala/Haskell (which I suppose would not currently work in DFs, lacking serializability), or cells containing errors in MS Excel.

As a solution, I would propose a DataFrame subclass (?) using a variant of NullableColumnBuilder, e.g. ErrorableColumnBuilder (/ SafeColumnBuilder?). NullableColumnBuilder currently explains its workings as follows:

[code]
/**
 * A stackable trait used for building byte buffer for a column containing null values.
 * Memory layout of the final byte buffer is:
 * {{{
 *    .--------------------- Null count N (4 bytes)
 *    |   .----------------- Null positions (4 x N bytes, empty if null count is zero)
 *    |   |     .----------- Non-null elements
 *    V   V     V
 *   +---+-----+---------+
 *   |   | ... | ... ... |
 *   +---+-----+---------+
 * }}}
 */
[/code]

This might be extended by adding a further section storing Throwables (or null) for the bad values in question (alt: store count/positions separately from the null ones so null values would not need to be stored). Don't get me wrong, there is nothing wrong with throwing exceptions (or catching them for that matter). Rather, I see use cases for both "do it right or bust" and the explorative "show me what happens if I try this operation on these values" -- not unlike how languages such as Ruby/Elixir might distinguish unsafe methods using a bang ('!') from their safe variants that should not throw global exceptions. I'm sort of new here but would be glad to get some opinions on this idea.
[jira] [Created] (SPARK-11907) Allowing errors as values in DataFrames (like 'Either Left/Right')
Tycho Grouwstra created SPARK-11907: --- Summary: Allowing errors as values in DataFrames (like 'Either Left/Right') Key: SPARK-11907 URL: https://issues.apache.org/jira/browse/SPARK-11907 Project: Spark Issue Type: Wish Components: SQL Reporter: Tycho Grouwstra

I like Spark, but one thing I find funny about it is that it is picky about circumstantial errors. For one, given the following:

```
import org.apache.spark.sql._
import org.apache.spark.sql.functions._

val sqlContext = new org.apache.spark.sql.SQLContext(sc)
val rows = (1,"a") :: (2,"b") :: (3,"c") :: (0,"d") :: Nil
val df = sqlContext.createDataFrame(sc.parallelize(rows)).toDF("num","let")
val div = udf[Double, Integer](10 / _)
df.withColumn("div", div(col("num"))).show()
```

... the job fails with a `java.lang.ArithmeticException: / by zero`. The example is trivial, but my point is: if one thing goes wrong and the rest goes right, why throw out the baby with the bathwater when you could show both what went wrong and what went right? Instead, I would propose allowing raised Exceptions to be used as resulting values, not unlike how one might store 'bad' results using Either Left/Right constructions in Scala/Haskell (which I suppose would not currently work in DFs, lacking serializability), or cells containing errors in MS Excel.

As a solution, I would propose a DataFrame subclass (?) using a variant of NullableColumnBuilder, e.g. ErrorableColumnBuilder (/ SafeColumnBuilder?). NullableColumnBuilder currently explains its workings as follows:

```
/**
 * A stackable trait used for building byte buffer for a column containing null values.
 * Memory layout of the final byte buffer is:
 * {{{
 *    .--------------------- Null count N (4 bytes)
 *    |   .----------------- Null positions (4 x N bytes, empty if null count is zero)
 *    |   |     .----------- Non-null elements
 *    V   V     V
 *   +---+-----+---------+
 *   |   | ... | ... ... |
 *   +---+-----+---------+
 * }}}
 */
```

This might be extended by adding a further section storing Throwables (or null) for the bad values in question (alt: store count/positions separately from the null ones so null values would not need to be stored). Don't get me wrong, there is nothing wrong with throwing exceptions (or catching them for that matter). Rather, I see use cases for both "do it right or bust" and the explorative "show me what happens if I try this operation on these values" -- not unlike how languages such as Ruby/Elixir might distinguish unsafe methods using a bang ('!') from their safe variants that should not throw global exceptions. I'm sort of new here but would be glad to get some opinions on this idea.
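The proposal above amounts to capturing exceptions as Either-style values instead of failing the whole job. Outside Spark, the idea can be sketched in a few lines (the `safe` wrapper and tuple encoding are illustrative, not a proposed Spark API):

```python
def safe(f):
    """Wrap a function so it returns ('right', value) on success and
    ('left', exception) on failure, instead of raising -- an Either-like
    encoding of errors as values."""
    def wrapped(*args):
        try:
            return ("right", f(*args))
        except Exception as e:
            return ("left", e)
    return wrapped

# The division UDF from the example: three rows succeed, one fails,
# and the failure becomes a value in the result instead of an abort.
div = safe(lambda n: 10 / n)
results = [div(n) for n in (1, 2, 3, 0)]

assert [tag for tag, _ in results] == ["right", "right", "right", "left"]
assert isinstance(results[3][1], ZeroDivisionError)
```

This preserves exactly the trade-off the reporter describes: the good rows survive, and the bad row carries its error alongside them.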
[jira] [Updated] (SPARK-11907) Allowing errors as values in DataFrames (like 'Either Left/Right')
[ https://issues.apache.org/jira/browse/SPARK-11907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tycho Grouwstra updated SPARK-11907: Description:

I like Spark, but one thing I find funny about it is that it is picky about circumstantial errors. For one, given the following:

[code]
import org.apache.spark.sql._
import org.apache.spark.sql.functions._

val sqlContext = new org.apache.spark.sql.SQLContext(sc)
val rows = (1,"a") :: (2,"b") :: (3,"c") :: (0,"d") :: Nil
val df = sqlContext.createDataFrame(sc.parallelize(rows)).toDF("num","let")
val div = udf[Double, Integer](10 / _)
df.withColumn("div", div(col("num"))).show()
[/code]

... the job fails with a `java.lang.ArithmeticException: / by zero`. The example is trivial, but my point is: if one thing goes wrong and the rest goes right, why throw out the baby with the bathwater when you could show both what went wrong and what went right? Instead, I would propose allowing raised Exceptions to be used as resulting values, not unlike how one might store 'bad' results using Either Left/Right constructions in Scala/Haskell (which I suppose would not currently work in DFs, lacking serializability), or cells containing errors in MS Excel.

As a solution, I would propose a DataFrame subclass (?) using a variant of NullableColumnBuilder, e.g. ErrorableColumnBuilder (/ SafeColumnBuilder?). NullableColumnBuilder currently explains its workings as follows:

[code]
/**
 * A stackable trait used for building byte buffer for a column containing null values.
 * Memory layout of the final byte buffer is:
 * {{{
 *    .--------------------- Null count N (4 bytes)
 *    |   .----------------- Null positions (4 x N bytes, empty if null count is zero)
 *    |   |     .----------- Non-null elements
 *    V   V     V
 *   +---+-----+---------+
 *   |   | ... | ... ... |
 *   +---+-----+---------+
 * }}}
 */
[/code]

This might be extended by adding a further section storing Throwables (or null) for the bad values in question (alt: store count/positions separately from the null ones so null values would not need to be stored). Don't get me wrong, there is nothing wrong with throwing exceptions (or catching them for that matter). Rather, I see use cases for both "do it right or bust" and the explorative "show me what happens if I try this operation on these values" -- not unlike how languages such as Ruby/Elixir might distinguish unsafe methods using a bang ('!') from their safe variants that should not throw global exceptions. I'm sort of new here but would be glad to get some opinions on this idea.

was:

I like Spark, but one thing I find funny about it is that it is picky about circumstantial errors. For one, given the following:

```
import org.apache.spark.sql._
import org.apache.spark.sql.functions._

val sqlContext = new org.apache.spark.sql.SQLContext(sc)
val rows = (1,"a") :: (2,"b") :: (3,"c") :: (0,"d") :: Nil
val df = sqlContext.createDataFrame(sc.parallelize(rows)).toDF("num","let")
val div = udf[Double, Integer](10 / _)
df.withColumn("div", div(col("num"))).show()
```

... the job fails with a `java.lang.ArithmeticException: / by zero`. The example is trivial, but my point is: if one thing goes wrong and the rest goes right, why throw out the baby with the bathwater when you could show both what went wrong and what went right? Instead, I would propose allowing raised Exceptions to be used as resulting values, not unlike how one might store 'bad' results using Either Left/Right constructions in Scala/Haskell (which I suppose would not currently work in DFs, lacking serializability), or cells containing errors in MS Excel.

As a solution, I would propose a DataFrame subclass (?) using a variant of NullableColumnBuilder, e.g. ErrorableColumnBuilder (/ SafeColumnBuilder?). NullableColumnBuilder currently explains its workings as follows:

```
/**
 * A stackable trait used for building byte buffer for a column containing null values.
 * Memory layout of the final byte buffer is:
 * {{{
 *    .--------------------- Null count N (4 bytes)
 *    |   .----------------- Null positions (4 x N bytes, empty if null count is zero)
 *    |   |     .----------- Non-null elements
 *    V   V     V
 *   +---+-----+---------+
 *   |   | ... | ... ... |
 *   +---+-----+---------+
 * }}}
 */
```

This might be extended by adding a further section storing Throwables (or null) for the bad values in question (alt: store count/positions separately from the null ones so null values would not need to be stored). Don't get me wrong, there is nothing wrong with throwing exceptions (or catching them for that matter). Rather, I see use cases for both "do it right or bust" and the explorative "show me what happens if I try this operation on these values" -- not unlike how languages such as Ruby/Elixir might distinguish unsafe methods using a bang ('!') from their safe variants that should not throw global exceptions. I'm sort of new here but would be glad to get some opinions on this idea.
[jira] [Updated] (SPARK-11716) UDFRegistration Drops Input Type Information
[ https://issues.apache.org/jira/browse/SPARK-11716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-11716: -- Assignee: Yin Huai > UDFRegistration Drops Input Type Information > > > Key: SPARK-11716 > URL: https://issues.apache.org/jira/browse/SPARK-11716 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.5.1 >Reporter: Artjom Metro >Assignee: Yin Huai > Labels: sql, udf > Fix For: 1.6.0 > > > The UserDefinedFunction returned by the UDFRegistration does not contain the > input type information, although that information is available. > To fix the issue, the last line of every register function would have to be > changed to "UserDefinedFunction(func, dataType, inputType)" -- or is there any > specific reason this was not done?
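The suggested fix can be sketched language-neutrally (the class and helper below are illustrative Python stand-ins for Spark's Scala types, not its actual API): `register` simply threads the input type information, which it already has, into the returned UserDefinedFunction:

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class UserDefinedFunction:
    """Minimal stand-in for the value a UDF registration returns."""
    func: Callable
    data_type: str
    # The previously dropped piece: the input types.
    input_types: List[str] = field(default_factory=list)

def register(func, data_type, input_types):
    # Before the fix: UserDefinedFunction(func, data_type) -- input
    # type information was silently discarded here.
    # After the fix: keep the information that was already available.
    return UserDefinedFunction(func, data_type, input_types)

udf = register(lambda x: x + 1, "int", ["int"])
assert udf.input_types == ["int"]
```

The point of the report is exactly this one-argument change: no new information has to be computed, only retained.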
[jira] [Commented] (SPARK-11910) Streaming programming guide references wrong dependency version
[ https://issues.apache.org/jira/browse/SPARK-11910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15021059#comment-15021059 ] Apache Spark commented on SPARK-11910: -- User 'lresende' has created a pull request for this issue: https://github.com/apache/spark/pull/9892 > Streaming programming guide references wrong dependency version > --- > > Key: SPARK-11910 > URL: https://issues.apache.org/jira/browse/SPARK-11910 > Project: Spark > Issue Type: Bug > Components: Documentation, Streaming >Affects Versions: 1.6.0 >Reporter: Luciano Resende >Priority: Minor > > SPARK-11245 have upgraded twitter dependency to 4.0.4 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-11910) Streaming programming guide references wrong dependency version
[ https://issues.apache.org/jira/browse/SPARK-11910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-11910: Assignee: Apache Spark > Streaming programming guide references wrong dependency version > --- > > Key: SPARK-11910 > URL: https://issues.apache.org/jira/browse/SPARK-11910 > Project: Spark > Issue Type: Bug > Components: Documentation, Streaming >Affects Versions: 1.6.0 >Reporter: Luciano Resende >Assignee: Apache Spark >Priority: Minor > > SPARK-11245 have upgraded twitter dependency to 4.0.4 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-10521) Utilize Docker to test DB2 JDBC Dialect support
[ https://issues.apache.org/jira/browse/SPARK-10521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10521: Assignee: (was: Apache Spark) > Utilize Docker to test DB2 JDBC Dialect support > --- > > Key: SPARK-10521 > URL: https://issues.apache.org/jira/browse/SPARK-10521 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 1.4.1, 1.5.0 >Reporter: Luciano Resende > > There was a discussion in SPARK-10170 around using a docker image to execute > the DB2 JDBC dialect tests. I will use this jira to work on providing the > basic image together with the test integration. We can then extend the > testing coverage as needed. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10521) Utilize Docker to test DB2 JDBC Dialect support
[ https://issues.apache.org/jira/browse/SPARK-10521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15021061#comment-15021061 ] Apache Spark commented on SPARK-10521: -- User 'lresende' has created a pull request for this issue: https://github.com/apache/spark/pull/9893 > Utilize Docker to test DB2 JDBC Dialect support > --- > > Key: SPARK-10521 > URL: https://issues.apache.org/jira/browse/SPARK-10521 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 1.4.1, 1.5.0 >Reporter: Luciano Resende > > There was a discussion in SPARK-10170 around using a docker image to execute > the DB2 JDBC dialect tests. I will use this jira to work on providing the > basic image together with the test integration. We can then extend the > testing coverage as needed. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-10521) Utilize Docker to test DB2 JDBC Dialect support
[ https://issues.apache.org/jira/browse/SPARK-10521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10521: Assignee: Apache Spark > Utilize Docker to test DB2 JDBC Dialect support > --- > > Key: SPARK-10521 > URL: https://issues.apache.org/jira/browse/SPARK-10521 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 1.4.1, 1.5.0 >Reporter: Luciano Resende >Assignee: Apache Spark > > There was a discussion in SPARK-10170 around using a docker image to execute > the DB2 JDBC dialect tests. I will use this jira to work on providing the > basic image together with the test integration. We can then extend the > testing coverage as needed. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-11910) Streaming programming guide references wrong dependency version
Luciano Resende created SPARK-11910: --- Summary: Streaming programming guide references wrong dependency version Key: SPARK-11910 URL: https://issues.apache.org/jira/browse/SPARK-11910 Project: Spark Issue Type: Bug Components: Documentation, Streaming Affects Versions: 1.6.0 Reporter: Luciano Resende Priority: Minor SPARK-11245 upgraded the Twitter dependency to 4.0.4. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-11910) Streaming programming guide references wrong dependency version
[ https://issues.apache.org/jira/browse/SPARK-11910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-11910: Assignee: (was: Apache Spark) > Streaming programming guide references wrong dependency version > --- > > Key: SPARK-11910 > URL: https://issues.apache.org/jira/browse/SPARK-11910 > Project: Spark > Issue Type: Bug > Components: Documentation, Streaming >Affects Versions: 1.6.0 >Reporter: Luciano Resende >Priority: Minor > > SPARK-11245 upgraded the Twitter dependency to 4.0.4. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
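[Editor's note] For reference, the guide's dependency snippet would need to match the upgraded version. The thread does not show which exact artifact the guide lists, so the coordinates below are assumptions, sketched in sbt form:

```scala
// build.sbt sketch -- assumed coordinates; verify against the actual guide.
// SPARK-11245 bumped the Twitter client (twitter4j) to 4.0.4, so an explicit
// twitter4j dependency in the guide would need to reference:
libraryDependencies += "org.twitter4j" % "twitter4j-core" % "4.0.4"
// The Spark connector itself tracks the Spark version, e.g.:
libraryDependencies += "org.apache.spark" %% "spark-streaming-twitter" % "1.6.0"
```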
[jira] [Assigned] (SPARK-11908) Add NullType support to RowEncoder
[ https://issues.apache.org/jira/browse/SPARK-11908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-11908: Assignee: Apache Spark > Add NullType support to RowEncoder > -- > > Key: SPARK-11908 > URL: https://issues.apache.org/jira/browse/SPARK-11908 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Liang-Chi Hsieh >Assignee: Apache Spark > > We should add NullType support to RowEncoder. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-11909) Spark Standalone's master URL accepts URLs without port (assuming default 7077)
Jacek Laskowski created SPARK-11909: --- Summary: Spark Standalone's master URL accepts URLs without port (assuming default 7077) Key: SPARK-11909 URL: https://issues.apache.org/jira/browse/SPARK-11909 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 1.6.0 Reporter: Jacek Laskowski Priority: Trivial It's currently impossible to use the {{spark://localhost}} URL for Spark Standalone's master. Supporting it would mean one less thing to remember when getting started with this mode (and hence improve user friendliness). I think a no-port master URL should be supported, assuming the default port {{7077}}. {code} org.apache.spark.SparkException: Invalid master URL: spark://localhost at org.apache.spark.util.Utils$.extractHostPortFromSparkUrl(Utils.scala:2088) at org.apache.spark.rpc.RpcAddress$.fromSparkURL(RpcAddress.scala:47) at org.apache.spark.deploy.client.AppClient$$anonfun$1.apply(AppClient.scala:48) at org.apache.spark.deploy.client.AppClient$$anonfun$1.apply(AppClient.scala:48) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245) at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33) at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186) at scala.collection.TraversableLike$class.map(TraversableLike.scala:245) at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186) at org.apache.spark.deploy.client.AppClient.(AppClient.scala:48) at org.apache.spark.scheduler.cluster.SparkDeploySchedulerBackend.start(SparkDeploySchedulerBackend.scala:93) at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:144) at org.apache.spark.SparkContext.(SparkContext.scala:530) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
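[Editor's note] The proposed behavior can be sketched as a small parser that falls back to the default port when none is given. This is an illustrative Python sketch of the idea, not Spark's actual {{Utils.extractHostPortFromSparkUrl}} (which, per the stack trace above, rejects port-less URLs); the function name mirrors the Scala one but is otherwise hypothetical:

```python
def extract_host_port_from_spark_url(url, default_port=7077):
    """Parse a spark://host[:port] master URL, assuming port 7077 when omitted.

    Sketch of the behavior proposed in SPARK-11909; the real implementation
    raises SparkException for URLs without an explicit port.
    """
    prefix = "spark://"
    if not url.startswith(prefix):
        raise ValueError("Invalid master URL: %s" % url)
    host_port = url[len(prefix):]
    if ":" in host_port:
        # Explicit port given: split on the last colon.
        host, port = host_port.rsplit(":", 1)
        return host, int(port)
    # No port: fall back to the standalone master's default.
    return host_port, default_port
```

Under this sketch, `spark://localhost` resolves to `("localhost", 7077)` while `spark://master:6066` keeps its explicit port.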
[jira] [Commented] (SPARK-11909) Spark Standalone's master URL accepts URLs without port (assuming default 7077)
[ https://issues.apache.org/jira/browse/SPARK-11909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15020976#comment-15020976 ] Sean Owen commented on SPARK-11909: --- I disagree. The default is not a well-known port like 80 for HTTP. It makes sense to avoid confusion by explicitly stating the port, as with launching the master. > Spark Standalone's master URL accepts URLs without port (assuming default > 7077) > --- > > Key: SPARK-11909 > URL: https://issues.apache.org/jira/browse/SPARK-11909 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 1.6.0 >Reporter: Jacek Laskowski >Priority: Trivial > > It's currently impossible to use {{spark://localhost}} URL for Spark > Standalone's master. With the feature supported, it'd be less to know to get > started with the mode (and hence improve user friendliness). > I think no-port master URL should be supported and assume the default port > {{7077}}. > {code} > org.apache.spark.SparkException: Invalid master URL: spark://localhost > at > org.apache.spark.util.Utils$.extractHostPortFromSparkUrl(Utils.scala:2088) > at org.apache.spark.rpc.RpcAddress$.fromSparkURL(RpcAddress.scala:47) > at > org.apache.spark.deploy.client.AppClient$$anonfun$1.apply(AppClient.scala:48) > at > org.apache.spark.deploy.client.AppClient$$anonfun$1.apply(AppClient.scala:48) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245) > at > scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33) > at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186) > at scala.collection.TraversableLike$class.map(TraversableLike.scala:245) > at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186) > at org.apache.spark.deploy.client.AppClient.(AppClient.scala:48) > at > 
org.apache.spark.scheduler.cluster.SparkDeploySchedulerBackend.start(SparkDeploySchedulerBackend.scala:93) > at > org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:144) > at org.apache.spark.SparkContext.(SparkContext.scala:530) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-11065) IOException thrown at job submit shutdown
[ https://issues.apache.org/jira/browse/SPARK-11065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15020982#comment-15020982 ] Jean-Baptiste Onofré commented on SPARK-11065: -- It's not really a problem, but IMHO, it's a bit annoying and can disturb users (as they may think about a real problem). Let me dig a bit to find the cause and submit a PR. NB: it happens only with 1.6.0-SNAPSHOT, 1.5.x is fine. > IOException thrown at job submit shutdown > - > > Key: SPARK-11065 > URL: https://issues.apache.org/jira/browse/SPARK-11065 > Project: Spark > Issue Type: Bug > Components: Spark Submit >Affects Versions: 1.6.0 >Reporter: Jean-Baptiste Onofré >Priority: Minor > > When submitted a job (for instance JavaWordCount example), even if the job > works fine, at the end of execution, we can see: > {code} > checkForCorruptJournalFiles="true": 1 > 15/10/12 16:31:12 INFO SparkUI: Stopped Spark web UI at > http://192.168.134.10:4040 > 15/10/12 16:31:12 INFO DAGScheduler: Stopping DAGScheduler > 15/10/12 16:31:12 INFO SparkDeploySchedulerBackend: Shutting down all > executors > 15/10/12 16:31:12 INFO SparkDeploySchedulerBackend: Asking each executor to > shut down > 15/10/12 16:31:12 INFO MapOutputTrackerMasterEndpoint: > MapOutputTrackerMasterEndpoint stopped! > 15/10/12 16:31:12 INFO MemoryStore: MemoryStore cleared > 15/10/12 16:31:12 INFO BlockManager: BlockManager stopped > 15/10/12 16:31:12 INFO BlockManagerMaster: BlockManagerMaster stopped > 15/10/12 16:31:12 INFO > OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: > OutputCommitCoordinator stopped! 
> 15/10/12 16:31:12 ERROR TransportResponseHandler: Still have 1 requests > outstanding when connection from localhost/127.0.0.1:7077 is closed > 15/10/12 16:31:12 ERROR NettyRpcEnv: Exception when sending > RequestMessage(192.168.134.10:40548,NettyRpcEndpointRef(spark://Master@localhost:7077),UnregisterApplication(app-20151012163109-),false) > java.io.IOException: Connection from localhost/127.0.0.1:7077 closed > at > org.apache.spark.network.client.TransportResponseHandler.channelUnregistered(TransportResponseHandler.java:104) > at > org.apache.spark.network.server.TransportChannelHandler.channelUnregistered(TransportChannelHandler.java:91) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelUnregistered(AbstractChannelHandlerContext.java:158) > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelUnregistered(AbstractChannelHandlerContext.java:144) > at > io.netty.channel.ChannelInboundHandlerAdapter.channelUnregistered(ChannelInboundHandlerAdapter.java:53) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelUnregistered(AbstractChannelHandlerContext.java:158) > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelUnregistered(AbstractChannelHandlerContext.java:144) > at > io.netty.channel.ChannelInboundHandlerAdapter.channelUnregistered(ChannelInboundHandlerAdapter.java:53) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelUnregistered(AbstractChannelHandlerContext.java:158) > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelUnregistered(AbstractChannelHandlerContext.java:144) > at > io.netty.channel.ChannelInboundHandlerAdapter.channelUnregistered(ChannelInboundHandlerAdapter.java:53) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelUnregistered(AbstractChannelHandlerContext.java:158) > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelUnregistered(AbstractChannelHandlerContext.java:144) > at > 
io.netty.channel.DefaultChannelPipeline.fireChannelUnregistered(DefaultChannelPipeline.java:739) > at > io.netty.channel.AbstractChannel$AbstractUnsafe$8.run(AbstractChannel.java:659) > at > io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357) > at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357) > at > io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111) > at java.lang.Thread.run(Thread.java:745) > 15/10/12 16:31:12 INFO RemoteActorRefProvider$RemotingTerminator: Shutting > down remote daemon. > 15/10/12 16:31:12 INFO RemoteActorRefProvider$RemotingTerminator: Remote > daemon shut down; proceeding with flushing remote transports. > 15/10/12 16:31:12 INFO SparkContext: Successfully stopped SparkContext > 15/10/12 16:31:12 INFO ShutdownHookManager: Shutdown hook called > 15/10/12 16:31:12 INFO ShutdownHookManager: Deleting directory > /tmp/spark-81bc4324-1268-4e54-bdd2-f7a2a36dafd4 > {code} > I gonna investigate about that and I will submit a PR. -- This message was
[jira] [Commented] (SPARK-11908) Add NullType support to RowEncoder
[ https://issues.apache.org/jira/browse/SPARK-11908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15020935#comment-15020935 ] Apache Spark commented on SPARK-11908: -- User 'viirya' has created a pull request for this issue: https://github.com/apache/spark/pull/9891 > Add NullType support to RowEncoder > -- > > Key: SPARK-11908 > URL: https://issues.apache.org/jira/browse/SPARK-11908 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Liang-Chi Hsieh > > We should add NullType support to RowEncoder. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-11908) Add NullType support to RowEncoder
Liang-Chi Hsieh created SPARK-11908: --- Summary: Add NullType support to RowEncoder Key: SPARK-11908 URL: https://issues.apache.org/jira/browse/SPARK-11908 Project: Spark Issue Type: Improvement Components: SQL Reporter: Liang-Chi Hsieh We should add NullType support to RowEncoder. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-11908) Add NullType support to RowEncoder
[ https://issues.apache.org/jira/browse/SPARK-11908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-11908: Assignee: (was: Apache Spark) > Add NullType support to RowEncoder > -- > > Key: SPARK-11908 > URL: https://issues.apache.org/jira/browse/SPARK-11908 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Liang-Chi Hsieh > > We should add NullType support to RowEncoder. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-11065) IOException thrown at job submit shutdown
[ https://issues.apache.org/jira/browse/SPARK-11065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15020981#comment-15020981 ] Maciej Bryński commented on SPARK-11065: [~srowen] OK. But this issue is new in 1.6.0. > IOException thrown at job submit shutdown > - > > Key: SPARK-11065 > URL: https://issues.apache.org/jira/browse/SPARK-11065 > Project: Spark > Issue Type: Bug > Components: Spark Submit >Affects Versions: 1.6.0 >Reporter: Jean-Baptiste Onofré >Priority: Minor > > When submitted a job (for instance JavaWordCount example), even if the job > works fine, at the end of execution, we can see: > {code} > checkForCorruptJournalFiles="true": 1 > 15/10/12 16:31:12 INFO SparkUI: Stopped Spark web UI at > http://192.168.134.10:4040 > 15/10/12 16:31:12 INFO DAGScheduler: Stopping DAGScheduler > 15/10/12 16:31:12 INFO SparkDeploySchedulerBackend: Shutting down all > executors > 15/10/12 16:31:12 INFO SparkDeploySchedulerBackend: Asking each executor to > shut down > 15/10/12 16:31:12 INFO MapOutputTrackerMasterEndpoint: > MapOutputTrackerMasterEndpoint stopped! > 15/10/12 16:31:12 INFO MemoryStore: MemoryStore cleared > 15/10/12 16:31:12 INFO BlockManager: BlockManager stopped > 15/10/12 16:31:12 INFO BlockManagerMaster: BlockManagerMaster stopped > 15/10/12 16:31:12 INFO > OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: > OutputCommitCoordinator stopped! 
> 15/10/12 16:31:12 ERROR TransportResponseHandler: Still have 1 requests > outstanding when connection from localhost/127.0.0.1:7077 is closed > 15/10/12 16:31:12 ERROR NettyRpcEnv: Exception when sending > RequestMessage(192.168.134.10:40548,NettyRpcEndpointRef(spark://Master@localhost:7077),UnregisterApplication(app-20151012163109-),false) > java.io.IOException: Connection from localhost/127.0.0.1:7077 closed > at > org.apache.spark.network.client.TransportResponseHandler.channelUnregistered(TransportResponseHandler.java:104) > at > org.apache.spark.network.server.TransportChannelHandler.channelUnregistered(TransportChannelHandler.java:91) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelUnregistered(AbstractChannelHandlerContext.java:158) > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelUnregistered(AbstractChannelHandlerContext.java:144) > at > io.netty.channel.ChannelInboundHandlerAdapter.channelUnregistered(ChannelInboundHandlerAdapter.java:53) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelUnregistered(AbstractChannelHandlerContext.java:158) > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelUnregistered(AbstractChannelHandlerContext.java:144) > at > io.netty.channel.ChannelInboundHandlerAdapter.channelUnregistered(ChannelInboundHandlerAdapter.java:53) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelUnregistered(AbstractChannelHandlerContext.java:158) > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelUnregistered(AbstractChannelHandlerContext.java:144) > at > io.netty.channel.ChannelInboundHandlerAdapter.channelUnregistered(ChannelInboundHandlerAdapter.java:53) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelUnregistered(AbstractChannelHandlerContext.java:158) > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelUnregistered(AbstractChannelHandlerContext.java:144) > at > 
io.netty.channel.DefaultChannelPipeline.fireChannelUnregistered(DefaultChannelPipeline.java:739) > at > io.netty.channel.AbstractChannel$AbstractUnsafe$8.run(AbstractChannel.java:659) > at > io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357) > at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357) > at > io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111) > at java.lang.Thread.run(Thread.java:745) > 15/10/12 16:31:12 INFO RemoteActorRefProvider$RemotingTerminator: Shutting > down remote daemon. > 15/10/12 16:31:12 INFO RemoteActorRefProvider$RemotingTerminator: Remote > daemon shut down; proceeding with flushing remote transports. > 15/10/12 16:31:12 INFO SparkContext: Successfully stopped SparkContext > 15/10/12 16:31:12 INFO ShutdownHookManager: Shutdown hook called > 15/10/12 16:31:12 INFO ShutdownHookManager: Deleting directory > /tmp/spark-81bc4324-1268-4e54-bdd2-f7a2a36dafd4 > {code} > I gonna investigate about that and I will submit a PR. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands,
[jira] [Commented] (SPARK-11826) Subtract BlockMatrix
[ https://issues.apache.org/jira/browse/SPARK-11826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15020979#comment-15020979 ] Sean Owen commented on SPARK-11826: --- OK, given the existence of add(), this probably makes some sense for completeness. It's minor, so best to keep the implementation light. Can you implement add() and subtract() in terms of one common function that takes an associative operation on matrices in Breeze? > Subtract BlockMatrix > > > Key: SPARK-11826 > URL: https://issues.apache.org/jira/browse/SPARK-11826 > Project: Spark > Issue Type: Improvement > Components: MLlib >Affects Versions: 1.6.0 >Reporter: Ehsan Mohyedin Kermani >Priority: Minor > > It'd be more convenient to have subtract method for BlockMatrices. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
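[Editor's note] Sean's suggestion — implement add() and subtract() through one common function that takes the elementwise operation — can be sketched as follows. This is a toy Python model with dict-of-blocks matrices, not the mllib.linalg.distributed.BlockMatrix API; the `_block_map` helper name and layout check are assumptions about how such a refactor might look:

```python
import operator

class ToyBlockMatrix:
    """Toy block matrix: maps (blockRow, blockCol) -> nested-list sub-matrix."""

    def __init__(self, blocks):
        self.blocks = blocks

    def _block_map(self, other, op):
        # Common driver for elementwise binary ops; assumes both matrices
        # use the same block layout (the real code would validate dimensions
        # and apply op to Breeze matrices per block).
        if self.blocks.keys() != other.blocks.keys():
            raise ValueError("block layouts must match")
        return ToyBlockMatrix({
            idx: [[op(a, b) for a, b in zip(row_a, row_b)]
                  for row_a, row_b in zip(self.blocks[idx], other.blocks[idx])]
            for idx in self.blocks
        })

    def add(self, other):
        return self._block_map(other, operator.add)

    def subtract(self, other):
        return self._block_map(other, operator.sub)
```

Keeping both public methods as one-liners over a shared helper matches the "keep the implementation light" guidance.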
[jira] [Commented] (SPARK-11065) IOException thrown at job submit shutdown
[ https://issues.apache.org/jira/browse/SPARK-11065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15020939#comment-15020939 ] Maciej Bryński commented on SPARK-11065: I have the same error. Job run successfully, but this output is misleading. {code} 15/11/22 11:51:47 ERROR TransportResponseHandler: Still have 1 requests outstanding when connection from XXX:7077 is closed 15/11/22 11:51:48 WARN NettyRpcEnv: Exception when sending RequestMessage(178.33.61.44:39524,NettyRpcEndpointRef(spark://Master@XXX:7077),UnregisterApplication(app-20151122110204-),false) java.io.IOException: Connection from XXX:7077 closed at org.apache.spark.network.client.TransportResponseHandler.channelUnregistered(TransportResponseHandler.java:116) at org.apache.spark.network.server.TransportChannelHandler.channelUnregistered(TransportChannelHandler.java:94) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelUnregistered(AbstractChannelHandlerContext.java:158) at io.netty.channel.AbstractChannelHandlerContext.fireChannelUnregistered(AbstractChannelHandlerContext.java:144) at io.netty.channel.ChannelInboundHandlerAdapter.channelUnregistered(ChannelInboundHandlerAdapter.java:53) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelUnregistered(AbstractChannelHandlerContext.java:158) at io.netty.channel.AbstractChannelHandlerContext.fireChannelUnregistered(AbstractChannelHandlerContext.java:144) at io.netty.channel.ChannelInboundHandlerAdapter.channelUnregistered(ChannelInboundHandlerAdapter.java:53) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelUnregistered(AbstractChannelHandlerContext.java:158) at io.netty.channel.AbstractChannelHandlerContext.fireChannelUnregistered(AbstractChannelHandlerContext.java:144) at io.netty.channel.ChannelInboundHandlerAdapter.channelUnregistered(ChannelInboundHandlerAdapter.java:53) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelUnregistered(AbstractChannelHandlerContext.java:158) at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelUnregistered(AbstractChannelHandlerContext.java:144) at io.netty.channel.DefaultChannelPipeline.fireChannelUnregistered(DefaultChannelPipeline.java:739) at io.netty.channel.AbstractChannel$AbstractUnsafe$8.run(AbstractChannel.java:659) at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357) at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357) at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111) at java.lang.Thread.run(Thread.java:745) {code} > IOException thrown at job submit shutdown > - > > Key: SPARK-11065 > URL: https://issues.apache.org/jira/browse/SPARK-11065 > Project: Spark > Issue Type: Bug > Components: Spark Submit >Affects Versions: 1.6.0 >Reporter: Jean-Baptiste Onofré >Priority: Minor > > When submitted a job (for instance JavaWordCount example), even if the job > works fine, at the end of execution, we can see: > {code} > checkForCorruptJournalFiles="true": 1 > 15/10/12 16:31:12 INFO SparkUI: Stopped Spark web UI at > http://192.168.134.10:4040 > 15/10/12 16:31:12 INFO DAGScheduler: Stopping DAGScheduler > 15/10/12 16:31:12 INFO SparkDeploySchedulerBackend: Shutting down all > executors > 15/10/12 16:31:12 INFO SparkDeploySchedulerBackend: Asking each executor to > shut down > 15/10/12 16:31:12 INFO MapOutputTrackerMasterEndpoint: > MapOutputTrackerMasterEndpoint stopped! > 15/10/12 16:31:12 INFO MemoryStore: MemoryStore cleared > 15/10/12 16:31:12 INFO BlockManager: BlockManager stopped > 15/10/12 16:31:12 INFO BlockManagerMaster: BlockManagerMaster stopped > 15/10/12 16:31:12 INFO > OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: > OutputCommitCoordinator stopped! 
> 15/10/12 16:31:12 ERROR TransportResponseHandler: Still have 1 requests > outstanding when connection from localhost/127.0.0.1:7077 is closed > 15/10/12 16:31:12 ERROR NettyRpcEnv: Exception when sending > RequestMessage(192.168.134.10:40548,NettyRpcEndpointRef(spark://Master@localhost:7077),UnregisterApplication(app-20151012163109-),false) > java.io.IOException: Connection from localhost/127.0.0.1:7077 closed > at > org.apache.spark.network.client.TransportResponseHandler.channelUnregistered(TransportResponseHandler.java:104) > at > org.apache.spark.network.server.TransportChannelHandler.channelUnregistered(TransportChannelHandler.java:91) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelUnregistered(AbstractChannelHandlerContext.java:158) > at >
[jira] [Commented] (SPARK-11065) IOException thrown at job submit shutdown
[ https://issues.apache.org/jira/browse/SPARK-11065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15020977#comment-15020977 ] Sean Owen commented on SPARK-11065: --- Unless it's causing a problem, I'd ignore it. Shutdown is inherently somewhat asynchronous and some components may complain if they lose a connection to another. In that case, the error maybe should be a warning. > IOException thrown at job submit shutdown > - > > Key: SPARK-11065 > URL: https://issues.apache.org/jira/browse/SPARK-11065 > Project: Spark > Issue Type: Bug > Components: Spark Submit >Affects Versions: 1.6.0 >Reporter: Jean-Baptiste Onofré >Priority: Minor > > When submitted a job (for instance JavaWordCount example), even if the job > works fine, at the end of execution, we can see: > {code} > checkForCorruptJournalFiles="true": 1 > 15/10/12 16:31:12 INFO SparkUI: Stopped Spark web UI at > http://192.168.134.10:4040 > 15/10/12 16:31:12 INFO DAGScheduler: Stopping DAGScheduler > 15/10/12 16:31:12 INFO SparkDeploySchedulerBackend: Shutting down all > executors > 15/10/12 16:31:12 INFO SparkDeploySchedulerBackend: Asking each executor to > shut down > 15/10/12 16:31:12 INFO MapOutputTrackerMasterEndpoint: > MapOutputTrackerMasterEndpoint stopped! > 15/10/12 16:31:12 INFO MemoryStore: MemoryStore cleared > 15/10/12 16:31:12 INFO BlockManager: BlockManager stopped > 15/10/12 16:31:12 INFO BlockManagerMaster: BlockManagerMaster stopped > 15/10/12 16:31:12 INFO > OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: > OutputCommitCoordinator stopped! 
> 15/10/12 16:31:12 ERROR TransportResponseHandler: Still have 1 requests > outstanding when connection from localhost/127.0.0.1:7077 is closed > 15/10/12 16:31:12 ERROR NettyRpcEnv: Exception when sending > RequestMessage(192.168.134.10:40548,NettyRpcEndpointRef(spark://Master@localhost:7077),UnregisterApplication(app-20151012163109-),false) > java.io.IOException: Connection from localhost/127.0.0.1:7077 closed > at > org.apache.spark.network.client.TransportResponseHandler.channelUnregistered(TransportResponseHandler.java:104) > at > org.apache.spark.network.server.TransportChannelHandler.channelUnregistered(TransportChannelHandler.java:91) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelUnregistered(AbstractChannelHandlerContext.java:158) > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelUnregistered(AbstractChannelHandlerContext.java:144) > at > io.netty.channel.ChannelInboundHandlerAdapter.channelUnregistered(ChannelInboundHandlerAdapter.java:53) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelUnregistered(AbstractChannelHandlerContext.java:158) > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelUnregistered(AbstractChannelHandlerContext.java:144) > at > io.netty.channel.ChannelInboundHandlerAdapter.channelUnregistered(ChannelInboundHandlerAdapter.java:53) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelUnregistered(AbstractChannelHandlerContext.java:158) > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelUnregistered(AbstractChannelHandlerContext.java:144) > at > io.netty.channel.ChannelInboundHandlerAdapter.channelUnregistered(ChannelInboundHandlerAdapter.java:53) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelUnregistered(AbstractChannelHandlerContext.java:158) > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelUnregistered(AbstractChannelHandlerContext.java:144) > at > 
io.netty.channel.DefaultChannelPipeline.fireChannelUnregistered(DefaultChannelPipeline.java:739) > at > io.netty.channel.AbstractChannel$AbstractUnsafe$8.run(AbstractChannel.java:659) > at > io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357) > at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357) > at > io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111) > at java.lang.Thread.run(Thread.java:745) > 15/10/12 16:31:12 INFO RemoteActorRefProvider$RemotingTerminator: Shutting > down remote daemon. > 15/10/12 16:31:12 INFO RemoteActorRefProvider$RemotingTerminator: Remote > daemon shut down; proceeding with flushing remote transports. > 15/10/12 16:31:12 INFO SparkContext: Successfully stopped SparkContext > 15/10/12 16:31:12 INFO ShutdownHookManager: Shutdown hook called > 15/10/12 16:31:12 INFO ShutdownHookManager: Deleting directory > /tmp/spark-81bc4324-1268-4e54-bdd2-f7a2a36dafd4 > {code} > I gonna investigate about that and I will submit a PR. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (SPARK-11906) Speculation Tasks Cause ProgressBar UI Overflow
[ https://issues.apache.org/jira/browse/SPARK-11906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15020990#comment-15020990 ] Sean Owen commented on SPARK-11906: --- Yes, can you open a PR? Sounds like you already identified the problem. > Speculation Tasks Cause ProgressBar UI Overflow > --- > > Key: SPARK-11906 > URL: https://issues.apache.org/jira/browse/SPARK-11906 > Project: Spark > Issue Type: Bug > Components: Web UI >Reporter: Sen Fang >Priority: Trivial > > When there are speculative tasks in a stage, the started tasks + completed > tasks can be greater than the total number of tasks. This causes the started > progress block to overflow to the next line. Visually, the light blue progress > block is no longer visible when it happens. > The fix should be as trivial as capping the number of started tasks at total - > completed tasks. > https://github.com/apache/spark/blob/branch-1.6/core/src/main/scala/org/apache/spark/ui/UIUtils.scala#L322 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
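[Editor's note] The capping fix described in the issue can be sketched in Python (the real change belongs in the Scala `UIUtils.makeProgressBar`; this helper and its signature are hypothetical):

```python
def progress_widths(started, completed, total):
    """Return (completed_pct, started_pct) segment widths for a progress bar.

    With speculative tasks, started + completed can exceed total, which made
    the started segment overflow to the next line; capping started at
    (total - completed) keeps the combined width at or below 100%.
    """
    completed_pct = completed * 100.0 / total
    # Cap started tasks so the two segments never sum past the bar width.
    started_capped = max(0, min(started, total - completed))
    started_pct = started_capped * 100.0 / total
    return completed_pct, started_pct
```

For example, 5 started and 8 completed out of 10 tasks yields widths of 80% and 20% rather than 80% and 50%.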
[jira] [Commented] (SPARK-11909) Spark Standalone's master URL accepts URLs without port (assuming default 7077)
[ https://issues.apache.org/jira/browse/SPARK-11909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15021153#comment-15021153 ] Jacek Laskowski commented on SPARK-11909: - _"The default is not a well-known port like 80 for HTTP"_ - that's exactly the reason why I filed the issue. Since it's not well-known it's hard to remember it and hence not very easy for people new to Spark. I experienced the mental "pain" today when I started Spark Standalone and had to remember the number to create SparkContext properly. Less to remember => less confusion => more happy users. > Spark Standalone's master URL accepts URLs without port (assuming default > 7077) > --- > > Key: SPARK-11909 > URL: https://issues.apache.org/jira/browse/SPARK-11909 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 1.6.0 >Reporter: Jacek Laskowski >Priority: Trivial > > It's currently impossible to use {{spark://localhost}} URL for Spark > Standalone's master. With the feature supported, it'd be less to know to get > started with the mode (and hence improve user friendliness). > I think no-port master URL should be supported and assume the default port > {{7077}}. 
> {code} > org.apache.spark.SparkException: Invalid master URL: spark://localhost > at > org.apache.spark.util.Utils$.extractHostPortFromSparkUrl(Utils.scala:2088) > at org.apache.spark.rpc.RpcAddress$.fromSparkURL(RpcAddress.scala:47) > at > org.apache.spark.deploy.client.AppClient$$anonfun$1.apply(AppClient.scala:48) > at > org.apache.spark.deploy.client.AppClient$$anonfun$1.apply(AppClient.scala:48) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245) > at > scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33) > at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186) > at scala.collection.TraversableLike$class.map(TraversableLike.scala:245) > at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186) > at org.apache.spark.deploy.client.AppClient.(AppClient.scala:48) > at > org.apache.spark.scheduler.cluster.SparkDeploySchedulerBackend.start(SparkDeploySchedulerBackend.scala:93) > at > org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:144) > at org.apache.spark.SparkContext.(SparkContext.scala:530) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-11909) Spark Standalone's master URL accepts URLs without port (assuming default 7077)
[ https://issues.apache.org/jira/browse/SPARK-11909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15021167#comment-15021167 ] Sean Owen commented on SPARK-11909: --- I think that cuts the other way. You're helping people not think about what port the master they're talking to is running on, which is probably more confusing than explicitly stating the port, especially if you accidentally talk to the wrong one somehow. > Spark Standalone's master URL accepts URLs without port (assuming default > 7077) > --- > > Key: SPARK-11909 > URL: https://issues.apache.org/jira/browse/SPARK-11909 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 1.6.0 >Reporter: Jacek Laskowski >Priority: Trivial > > It's currently impossible to use {{spark://localhost}} URL for Spark > Standalone's master. With the feature supported, it'd be less to know to get > started with the mode (and hence improve user friendliness). > I think no-port master URL should be supported and assume the default port > {{7077}}. 
> {code} > org.apache.spark.SparkException: Invalid master URL: spark://localhost > at > org.apache.spark.util.Utils$.extractHostPortFromSparkUrl(Utils.scala:2088) > at org.apache.spark.rpc.RpcAddress$.fromSparkURL(RpcAddress.scala:47) > at > org.apache.spark.deploy.client.AppClient$$anonfun$1.apply(AppClient.scala:48) > at > org.apache.spark.deploy.client.AppClient$$anonfun$1.apply(AppClient.scala:48) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245) > at > scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33) > at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186) > at scala.collection.TraversableLike$class.map(TraversableLike.scala:245) > at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186) > at org.apache.spark.deploy.client.AppClient.(AppClient.scala:48) > at > org.apache.spark.scheduler.cluster.SparkDeploySchedulerBackend.start(SparkDeploySchedulerBackend.scala:93) > at > org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:144) > at org.apache.spark.SparkContext.(SparkContext.scala:530) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-11783) When deployed against remote Hive metastore, HiveContext.executionHive points to wrong metastore
[ https://issues.apache.org/jira/browse/SPARK-11783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15021082#comment-15021082 ] Apache Spark commented on SPARK-11783: -- User 'liancheng' has created a pull request for this issue: https://github.com/apache/spark/pull/9895 > When deployed against remote Hive metastore, HiveContext.executionHive points > to wrong metastore > > > Key: SPARK-11783 > URL: https://issues.apache.org/jira/browse/SPARK-11783 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.5.1, 1.6.0, 1.7.0 >Reporter: Cheng Lian >Assignee: Cheng Lian >Priority: Critical > > When using remote metastore, execution Hive client somehow is initialized to > point to the actual remote metastore instead of the dummy local Derby > metastore. > To reproduce this issue: > # Configuring {{conf/hive-site.xml}} to point to a remote Hive 1.2.1 > metastore. > # Set {{hive.metastore.uris}} to {{thrift://localhost:9083}}. > # Start metastore service using {{$HIVE_HOME/bin/hive --service metastore}} > # Start Thrift server with remote debugging options > # Attach the debugger to the Thrift server driver process, we can verify that > {{executionHive}} points to the remote metastore rather than the local > execution Derby metastore. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
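Step 2 of the reproduction configures {{conf/hive-site.xml}} to point at the remote metastore. A minimal sketch of that file, using the URI given in the steps (the report itself does not include the file):

```xml
<!-- conf/hive-site.xml: point HiveContext at a remote Hive metastore -->
<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://localhost:9083</value>
  </property>
</configuration>
```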
[jira] [Assigned] (SPARK-11847) Model export/import for spark.ml: LDA
[ https://issues.apache.org/jira/browse/SPARK-11847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-11847: Assignee: yuhao yang (was: Apache Spark) > Model export/import for spark.ml: LDA > - > > Key: SPARK-11847 > URL: https://issues.apache.org/jira/browse/SPARK-11847 > Project: Spark > Issue Type: Sub-task > Components: ML >Reporter: Xiangrui Meng >Assignee: yuhao yang > > Add read/write support to LDA, similar to ALS. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-11783) When deployed against remote Hive metastore, HiveContext.executionHive points to wrong metastore
[ https://issues.apache.org/jira/browse/SPARK-11783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-11783: Assignee: Cheng Lian (was: Apache Spark) > When deployed against remote Hive metastore, HiveContext.executionHive points > to wrong metastore > > > Key: SPARK-11783 > URL: https://issues.apache.org/jira/browse/SPARK-11783 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.5.1, 1.6.0, 1.7.0 >Reporter: Cheng Lian >Assignee: Cheng Lian >Priority: Critical > > When using remote metastore, execution Hive client somehow is initialized to > point to the actual remote metastore instead of the dummy local Derby > metastore. > To reproduce this issue: > # Configuring {{conf/hive-site.xml}} to point to a remote Hive 1.2.1 > metastore. > # Set {{hive.metastore.uris}} to {{thrift://localhost:9083}}. > # Start metastore service using {{$HIVE_HOME/bin/hive --service metastore}} > # Start Thrift server with remote debugging options > # Attach the debugger to the Thrift server driver process, we can verify that > {{executionHive}} points to the remote metastore rather than the local > execution Derby metastore. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-11847) Model export/import for spark.ml: LDA
[ https://issues.apache.org/jira/browse/SPARK-11847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15021083#comment-15021083 ] Apache Spark commented on SPARK-11847: -- User 'hhbyyh' has created a pull request for this issue: https://github.com/apache/spark/pull/9894 > Model export/import for spark.ml: LDA > - > > Key: SPARK-11847 > URL: https://issues.apache.org/jira/browse/SPARK-11847 > Project: Spark > Issue Type: Sub-task > Components: ML >Reporter: Xiangrui Meng >Assignee: yuhao yang > > Add read/write support to LDA, similar to ALS. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-11783) When deployed against remote Hive metastore, HiveContext.executionHive points to wrong metastore
[ https://issues.apache.org/jira/browse/SPARK-11783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-11783: Assignee: Apache Spark (was: Cheng Lian) > When deployed against remote Hive metastore, HiveContext.executionHive points > to wrong metastore > > > Key: SPARK-11783 > URL: https://issues.apache.org/jira/browse/SPARK-11783 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.5.1, 1.6.0, 1.7.0 >Reporter: Cheng Lian >Assignee: Apache Spark >Priority: Critical > > When using remote metastore, execution Hive client somehow is initialized to > point to the actual remote metastore instead of the dummy local Derby > metastore. > To reproduce this issue: > # Configuring {{conf/hive-site.xml}} to point to a remote Hive 1.2.1 > metastore. > # Set {{hive.metastore.uris}} to {{thrift://localhost:9083}}. > # Start metastore service using {{$HIVE_HOME/bin/hive --service metastore}} > # Start Thrift server with remote debugging options > # Attach the debugger to the Thrift server driver process, we can verify that > {{executionHive}} points to the remote metastore rather than the local > execution Derby metastore. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-11847) Model export/import for spark.ml: LDA
[ https://issues.apache.org/jira/browse/SPARK-11847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-11847: Assignee: Apache Spark (was: yuhao yang) > Model export/import for spark.ml: LDA > - > > Key: SPARK-11847 > URL: https://issues.apache.org/jira/browse/SPARK-11847 > Project: Spark > Issue Type: Sub-task > Components: ML >Reporter: Xiangrui Meng >Assignee: Apache Spark > > Add read/write support to LDA, similar to ALS. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-4514) SparkContext localProperties does not inherit property updates across thread reuse
[ https://issues.apache.org/jira/browse/SPARK-4514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15021185#comment-15021185 ] Richard W. Eggert II commented on SPARK-4514: - The unit test attached to this issue fails in master, but passes in https://github.com/apache/spark/pull/9264 > SparkContext localProperties does not inherit property updates across thread > reuse > -- > > Key: SPARK-4514 > URL: https://issues.apache.org/jira/browse/SPARK-4514 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.1.0, 1.1.1, 1.2.0 >Reporter: Erik Erlandson >Assignee: Josh Rosen >Priority: Critical > > The current job group id of a Spark context is stored in the > {{localProperties}} member value. This data structure is designed to be > thread local, and its settings are not preserved when {{ComplexFutureAction}} > instantiates a new {{Future}}. > One consequence of this is that {{takeAsync()}} does not behave in the same > way as other async actions, e.g. {{countAsync()}}. For example, this test > (if copied into StatusTrackerSuite.scala), will fail, because > {{"my-job-group2"}} is not propagated to the Future which actually > instantiates the job: > {code:java} > test("getJobIdsForGroup() with takeAsync()") { > sc = new SparkContext("local", "test", new SparkConf(false)) > sc.setJobGroup("my-job-group2", "description") > sc.statusTracker.getJobIdsForGroup("my-job-group2") should be (Seq.empty) > val firstJobFuture = sc.parallelize(1 to 1000, 1).takeAsync(1) > val firstJobId = eventually(timeout(10 seconds)) { > firstJobFuture.jobIds.head > } > eventually(timeout(10 seconds)) { > sc.statusTracker.getJobIdsForGroup("my-job-group2") should be > (Seq(firstJobId)) > } > } > {code} > It also impacts current PR for SPARK-1021, which involves additional uses of > {{ComplexFutureAction}}. 
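The thread-reuse pitfall behind this issue can be reproduced outside Spark. The sketch below (not Spark code) shows why a value held in an {{InheritableThreadLocal}} — the pattern {{SparkContext.localProperties}} uses — is copied into a child thread only at thread-creation time, so a value set *after* a pool's worker thread exists is never seen by that reused thread:

```scala
import java.util.concurrent.{Callable, Executors}

object LocalPropsDemo {
  // Per-thread storage, inherited by child threads only at creation time.
  private val props = new InheritableThreadLocal[String] {
    override def initialValue(): String = ""
  }

  // Returns what a reused pool thread observes; "" demonstrates the bug.
  def demo(): String = {
    val pool = Executors.newFixedThreadPool(1)
    // Warm the pool: its single worker thread is created here, before set().
    pool.submit(new Runnable { def run(): Unit = () }).get()

    props.set("my-job-group2") // analogous to sc.setJobGroup(...)

    val seen = pool.submit(new Callable[String] {
      def call(): String = props.get() // reused thread: update not inherited
    }).get()
    pool.shutdown()
    seen
  }

  def main(args: Array[String]): Unit =
    println(s"worker saw: '${demo()}'") // prints: worker saw: ''
}
```

This mirrors why {{takeAsync()}}, which runs on reused {{Future}} threads via {{ComplexFutureAction}}, can miss a job group set after those threads were created.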
[jira] [Commented] (SPARK-11909) Spark Standalone's master URL accepts URLs without port (assuming default 7077)
[ https://issues.apache.org/jira/browse/SPARK-11909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15021186#comment-15021186 ] Jacek Laskowski commented on SPARK-11909: - What about a WARN message about the port in use to connect to a Spark Standalone master for users who need less to remember and type like me? It'd be a nice time saver. That would at the _very_ least spare the "recommendation" at http://spark.apache.org/docs/latest/spark-standalone.html#starting-a-cluster-manually which is actually false (as the master doesn't print out the URL to the console once started): _Once started, the master will print out a spark://HOST:PORT URL for itself, which you can use to connect workers to it, or pass as the “master” argument to SparkContext. You can also find this URL on the master’s web UI, which is http://localhost:8080 by default._ > Spark Standalone's master URL accepts URLs without port (assuming default > 7077) > --- > > Key: SPARK-11909 > URL: https://issues.apache.org/jira/browse/SPARK-11909 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 1.6.0 >Reporter: Jacek Laskowski >Priority: Trivial > > It's currently impossible to use {{spark://localhost}} URL for Spark > Standalone's master. With the feature supported, it'd be less to know to get > started with the mode (and hence improve user friendliness). > I think no-port master URL should be supported and assume the default port > {{7077}}. 
> {code} > org.apache.spark.SparkException: Invalid master URL: spark://localhost > at > org.apache.spark.util.Utils$.extractHostPortFromSparkUrl(Utils.scala:2088) > at org.apache.spark.rpc.RpcAddress$.fromSparkURL(RpcAddress.scala:47) > at > org.apache.spark.deploy.client.AppClient$$anonfun$1.apply(AppClient.scala:48) > at > org.apache.spark.deploy.client.AppClient$$anonfun$1.apply(AppClient.scala:48) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245) > at > scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33) > at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186) > at scala.collection.TraversableLike$class.map(TraversableLike.scala:245) > at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186) > at org.apache.spark.deploy.client.AppClient.(AppClient.scala:48) > at > org.apache.spark.scheduler.cluster.SparkDeploySchedulerBackend.start(SparkDeploySchedulerBackend.scala:93) > at > org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:144) > at org.apache.spark.SparkContext.(SparkContext.scala:530) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-11909) Spark Standalone's master URL accepts URLs without port (assuming default 7077)
[ https://issues.apache.org/jira/browse/SPARK-11909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15021186#comment-15021186 ] Jacek Laskowski edited comment on SPARK-11909 at 11/22/15 8:51 PM: --- What about a WARN message about the port in use to connect to a Spark Standalone master for users who need less to remember and type like me? It'd be a nice time saver. That would at the _very_ least spare the "recommendation" at http://spark.apache.org/docs/latest/spark-standalone.html#starting-a-cluster-manually which is actually false (as the master doesn't print out the URL to the console once started): {quote} Once started, the master will print out a spark://HOST:PORT URL for itself, which you can use to connect workers to it, or pass as the “master” argument to SparkContext. You can also find this URL on the master’s web UI, which is http://localhost:8080 by default. {quote} was (Author: jlaskowski): What about a WARN message about the port in use to connect to a Spark Standalone master for users who need less to remember and type like me? It'd be a nice time saver. That would at the _very_ least spare the "recommendation" at http://spark.apache.org/docs/latest/spark-standalone.html#starting-a-cluster-manually which is actually false (as the master doesn't print out the URL to the console once started): _Once started, the master will print out a spark://HOST:PORT URL for itself, which you can use to connect workers to it, or pass as the “master” argument to SparkContext. 
You can also find this URL on the master’s web UI, which is http://localhost:8080 by default._ > Spark Standalone's master URL accepts URLs without port (assuming default > 7077) > --- > > Key: SPARK-11909 > URL: https://issues.apache.org/jira/browse/SPARK-11909 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 1.6.0 >Reporter: Jacek Laskowski >Priority: Trivial > > It's currently impossible to use {{spark://localhost}} URL for Spark > Standalone's master. With the feature supported, it'd be less to know to get > started with the mode (and hence improve user friendliness). > I think no-port master URL should be supported and assume the default port > {{7077}}. > {code} > org.apache.spark.SparkException: Invalid master URL: spark://localhost > at > org.apache.spark.util.Utils$.extractHostPortFromSparkUrl(Utils.scala:2088) > at org.apache.spark.rpc.RpcAddress$.fromSparkURL(RpcAddress.scala:47) > at > org.apache.spark.deploy.client.AppClient$$anonfun$1.apply(AppClient.scala:48) > at > org.apache.spark.deploy.client.AppClient$$anonfun$1.apply(AppClient.scala:48) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245) > at > scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33) > at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186) > at scala.collection.TraversableLike$class.map(TraversableLike.scala:245) > at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186) > at org.apache.spark.deploy.client.AppClient.(AppClient.scala:48) > at > org.apache.spark.scheduler.cluster.SparkDeploySchedulerBackend.start(SparkDeploySchedulerBackend.scala:93) > at > org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:144) > at org.apache.spark.SparkContext.(SparkContext.scala:530) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To 
unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-4514) SparkContext localProperties does not inherit property updates across thread reuse
[ https://issues.apache.org/jira/browse/SPARK-4514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15021185#comment-15021185 ] Richard W. Eggert II edited comment on SPARK-4514 at 11/22/15 8:54 PM: --- The unit test attached to this issue fails in master, but passes in https://github.com/apache/spark/pull/9264 , which is intended to fix SPARK-9026. was (Author: reggert1980): The unit test attached to this issue fails in master, but passes in https://github.com/apache/spark/pull/9264 > SparkContext localProperties does not inherit property updates across thread > reuse > -- > > Key: SPARK-4514 > URL: https://issues.apache.org/jira/browse/SPARK-4514 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.1.0, 1.1.1, 1.2.0 >Reporter: Erik Erlandson >Assignee: Josh Rosen >Priority: Critical > > The current job group id of a Spark context is stored in the > {{localProperties}} member value. This data structure is designed to be > thread local, and its settings are not preserved when {{ComplexFutureAction}} > instantiates a new {{Future}}. > One consequence of this is that {{takeAsync()}} does not behave in the same > way as other async actions, e.g. {{countAsync()}}. 
For example, this test > (if copied into StatusTrackerSuite.scala), will fail, because > {{"my-job-group2"}} is not propagated to the Future which actually > instantiates the job: > {code:java} > test("getJobIdsForGroup() with takeAsync()") { > sc = new SparkContext("local", "test", new SparkConf(false)) > sc.setJobGroup("my-job-group2", "description") > sc.statusTracker.getJobIdsForGroup("my-job-group2") should be (Seq.empty) > val firstJobFuture = sc.parallelize(1 to 1000, 1).takeAsync(1) > val firstJobId = eventually(timeout(10 seconds)) { > firstJobFuture.jobIds.head > } > eventually(timeout(10 seconds)) { > sc.statusTracker.getJobIdsForGroup("my-job-group2") should be > (Seq(firstJobId)) > } > } > {code} > It also impacts current PR for SPARK-1021, which involves additional uses of > {{ComplexFutureAction}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-4514) SparkContext localProperties does not inherit property updates across thread reuse
[ https://issues.apache.org/jira/browse/SPARK-4514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15021187#comment-15021187 ] Richard W. Eggert II commented on SPARK-4514: - This test, however, still fails: {code} test("getJobIdsForGroup() with takeAsync() across multiple partitions") { sc = new SparkContext("local", "test", new SparkConf(false)) sc.setJobGroup("my-job-group2", "description") sc.statusTracker.getJobIdsForGroup("my-job-group2") shouldBe empty val firstJobFuture = sc.parallelize(1 to 1000, 2).takeAsync(999) val firstJobId = eventually(timeout(10 seconds)) { firstJobFuture.jobIds.head } eventually(timeout(10 seconds)) { sc.statusTracker.getJobIdsForGroup("my-job-group2") should have size 2 } } {code} > SparkContext localProperties does not inherit property updates across thread > reuse > -- > > Key: SPARK-4514 > URL: https://issues.apache.org/jira/browse/SPARK-4514 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.1.0, 1.1.1, 1.2.0 >Reporter: Erik Erlandson >Assignee: Josh Rosen >Priority: Critical > > The current job group id of a Spark context is stored in the > {{localProperties}} member value. This data structure is designed to be > thread local, and its settings are not preserved when {{ComplexFutureAction}} > instantiates a new {{Future}}. > One consequence of this is that {{takeAsync()}} does not behave in the same > way as other async actions, e.g. {{countAsync()}}. 
For example, this test > (if copied into StatusTrackerSuite.scala), will fail, because > {{"my-job-group2"}} is not propagated to the Future which actually > instantiates the job: > {code:java} > test("getJobIdsForGroup() with takeAsync()") { > sc = new SparkContext("local", "test", new SparkConf(false)) > sc.setJobGroup("my-job-group2", "description") > sc.statusTracker.getJobIdsForGroup("my-job-group2") should be (Seq.empty) > val firstJobFuture = sc.parallelize(1 to 1000, 1).takeAsync(1) > val firstJobId = eventually(timeout(10 seconds)) { > firstJobFuture.jobIds.head > } > eventually(timeout(10 seconds)) { > sc.statusTracker.getJobIdsForGroup("my-job-group2") should be > (Seq(firstJobId)) > } > } > {code} > It also impacts current PR for SPARK-1021, which involves additional uses of > {{ComplexFutureAction}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-4514) SparkContext localProperties does not inherit property updates across thread reuse
[ https://issues.apache.org/jira/browse/SPARK-4514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15021194#comment-15021194 ] Richard W. Eggert II commented on SPARK-4514: - I implemented a two-line fix that causes this test to now pass in that PR. > SparkContext localProperties does not inherit property updates across thread > reuse > -- > > Key: SPARK-4514 > URL: https://issues.apache.org/jira/browse/SPARK-4514 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.1.0, 1.1.1, 1.2.0 >Reporter: Erik Erlandson >Assignee: Josh Rosen >Priority: Critical > > The current job group id of a Spark context is stored in the > {{localProperties}} member value. This data structure is designed to be > thread local, and its settings are not preserved when {{ComplexFutureAction}} > instantiates a new {{Future}}. > One consequence of this is that {{takeAsync()}} does not behave in the same > way as other async actions, e.g. {{countAsync()}}. For example, this test > (if copied into StatusTrackerSuite.scala), will fail, because > {{"my-job-group2"}} is not propagated to the Future which actually > instantiates the job: > {code:java} > test("getJobIdsForGroup() with takeAsync()") { > sc = new SparkContext("local", "test", new SparkConf(false)) > sc.setJobGroup("my-job-group2", "description") > sc.statusTracker.getJobIdsForGroup("my-job-group2") should be (Seq.empty) > val firstJobFuture = sc.parallelize(1 to 1000, 1).takeAsync(1) > val firstJobId = eventually(timeout(10 seconds)) { > firstJobFuture.jobIds.head > } > eventually(timeout(10 seconds)) { > sc.statusTracker.getJobIdsForGroup("my-job-group2") should be > (Seq(firstJobId)) > } > } > {code} > It also impacts current PR for SPARK-1021, which involves additional uses of > {{ComplexFutureAction}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org