[GitHub] spark pull request: [SPARK-12659] fix NPE in UnsafeExternalSorter ...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/10606#discussion_r48912881

--- Diff: core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeInMemorySorter.java ---

@@ -223,14 +227,9 @@ public void loadNext() {
    * {@code next()} will return the same mutable object.
    */
   public SortedIterator getSortedIterator() {
-    sorter.sort(array, 0, pos / 2, sortComparator);
-    return new SortedIterator(pos / 2);
-  }
-
-  /**
-   * Returns an iterator over record pointers in original order (inserted).
-   */
-  public SortedIterator getIterator() {
+    if (sortComparator != null) {
+      sorter.sort(array, 0, pos / 2, sortComparator);

--- End diff --

No sorting is needed, only spilling is needed.

--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
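The null guard in the diff above can be illustrated with a minimal Python sketch (the function name and signature are invented for illustration, not Spark's actual API): sort only when a comparator is supplied, so a spill-only caller never dereferences a missing comparator.

```python
def sorted_iterator(records, sort_comparator=None):
    """Return an iterator over records, sorted only when a comparator
    (here modeled as a sort key function) is supplied. With no
    comparator, records come back in insertion order, which is all a
    spill-only caller needs."""
    if sort_comparator is not None:
        # Mirrors the diff's `if (sortComparator != null)` guard:
        # sorting with a missing comparator is what raised the NPE.
        return iter(sorted(records, key=sort_comparator))
    return iter(records)
```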
[GitHub] spark pull request: [SPARK-12379][ML][MLLIB] Copy GBT implementati...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10607#issuecomment-169171784

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48795/
[GitHub] spark pull request: [SPARK-3873] [core] Import ordering fixes.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10578#issuecomment-169172279

Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-12630][DOC] Update param descriptions
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/10598#discussion_r48913054

--- Diff: python/pyspark/mllib/classification.py ---

@@ -94,16 +94,18 @@ class LogisticRegressionModel(LinearClassificationModel):

     Classification model trained using Multinomial/Binary Logistic Regression.

-    :param weights: Weights computed for every feature.
-    :param intercept: Intercept computed for this model. (Only used
-        in Binary Logistic Regression. In Multinomial Logistic
-        Regression, the intercepts will not be a single value,
-        so the intercepts will be part of the weights.)
-    :param numFeatures: the dimension of the features.
-    :param numClasses: the number of possible outcomes for k classes
-        classification problem in Multinomial Logistic Regression.
-        By default, it is binary logistic regression so numClasses
-        will be set to 2.
+    :param weights:
+      Weights computed for every feature.
+    :param intercept:
+      Intercept computed for this model. (Only used in Binary Logistic
+      Regression. In Multinomial Logistic Regression, the intercepts will not
+      be a single value, so the intercepts will be part of the weights.)
+    :param numFeatures:
+      the dimension of the features.

--- End diff --

nit: capitalize the first word in the description sentence
[GitHub] spark pull request: [SPARK-12567][SQL] Add aes_{encrypt,decrypt} U...
Github user vectorijk commented on the pull request: https://github.com/apache/spark/pull/10527#issuecomment-169172817

cc @cloud-fan @marmbrus @davies
[GitHub] spark pull request: [SPARK-12593][SQL][WIP] Converts resolved logi...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10541#issuecomment-168982479

**[Test build #48763 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48763/consoleFull)** for PR 10541 at commit [`1e50288`](https://github.com/apache/spark/commit/1e50288d6f956608b53554d31bd394bf919812e0).
[GitHub] spark pull request: [SPARK-12539][SQL] support writing bucketed ta...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10498#issuecomment-168982645

**[Test build #48765 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48765/consoleFull)** for PR 10498 at commit [`3ff968b`](https://github.com/apache/spark/commit/3ff968b29d3852c92952454254ae6e1f7ba6599d).
[GitHub] spark pull request: [SPARK-12573][SPARK-12574][SQL] Move SQL Parse...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10583#issuecomment-168984129

**[Test build #48766 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48766/consoleFull)** for PR 10583 at commit [`e397370`](https://github.com/apache/spark/commit/e39737023920c3916ad8ed6e4d4b46072bfe4f7a).
[GitHub] spark pull request: [SPARK-12618] [CORE] [STREAMING] [SQL] Clean u...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10570#issuecomment-168985619

Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-12618] [CORE] [STREAMING] [SQL] Clean u...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10570#issuecomment-168985593

**[Test build #48762 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48762/consoleFull)** for PR 10570 at commit [`bfaa1fa`](https://github.com/apache/spark/commit/bfaa1fa79430030d7315cd6530f3da86c0eb39e1).

 * This patch **fails MiMa tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-12618] [CORE] [STREAMING] [SQL] Clean u...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10570#issuecomment-168985620

Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48762/
[GitHub] spark pull request: [SPARK-12632][Python][Make Parameter Descripti...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10602#issuecomment-168986043

Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-12632][Python][Make Parameter Descripti...
Github user vijaykiran commented on a diff in the pull request: https://github.com/apache/spark/pull/10602#discussion_r48839934

--- Diff: python/pyspark/mllib/fpm.py ---

@@ -130,15 +133,22 @@ def train(cls, data, minSupport=0.1, maxPatternLength=10, maxLocalProjDBSize=320

         """
         Finds the complete set of frequent sequential patterns in the input sequences of itemsets.

-        :param data: The input data set, each element contains a sequnce of itemsets.
-        :param minSupport: the minimal support level of the sequential pattern, any pattern appears
-            more than (minSupport * size-of-the-dataset) times will be output (default: `0.1`)
-        :param maxPatternLength: the maximal length of the sequential pattern, any pattern appears
-            less than maxPatternLength will be output. (default: `10`)
-        :param maxLocalProjDBSize: The maximum number of items (including delimiters used in
-            the internal storage format) allowed in a projected database before local
-            processing. If a projected database exceeds this size, another
-            iteration of distributed prefix growth is run. (default: `3200`)
+        :param data:
+          The input data set, each element contains a sequnce of itemsets.
+        :param minSupport:
+          The minimal support level of the sequential pattern, any pattern appears
+          more than (minSupport * size-of-the-dataset) times will be output.
+          default: `0.1`)

--- End diff --

I think the format should be (default: `0.1`).
[GitHub] spark pull request: [SPARK-12632][Python][Make Parameter Descripti...
Github user vijaykiran commented on a diff in the pull request: https://github.com/apache/spark/pull/10602#discussion_r48839858

--- Diff: python/pyspark/mllib/fpm.py ---

@@ -68,11 +68,14 @@ def train(cls, data, minSupport=0.3, numPartitions=-1):

         """
         Computes an FP-Growth model that contains frequent itemsets.

-        :param data: The input data set, each element contains a
-            transaction.
-        :param minSupport: The minimal support level (default: `0.3`).
-        :param numPartitions: The number of partitions used by
-            parallel FP-growth (default: same as input data).
+        :param data:
+          The input data set, each element contains a transaction.
+        :param minSupport:
+          The minimal support level
+          (default: `0.3`)
+        :param numPartitions:The number of partitions used by parallel FP-growth

--- End diff --

You missed this one :)
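The docstring convention these review comments converge on (each description indented on its own line, first word capitalized, default noted as `(default: ...)`) can be shown in one hypothetical example; the function below is an illustration of the style, not the actual pyspark code:

```python
def train(data, min_support=0.3, num_partitions=-1):
    """Compute a frequent-itemset model (docstring-style illustration).

    :param data:
      The input data set, where each element contains a transaction.
    :param min_support:
      The minimal support level.
      (default: 0.3)
    :param num_partitions:
      The number of partitions used by the parallel implementation.
      (default: -1, same as input data)
    """
    # Body is a stub; only the docstring layout matters here.
    return min_support
```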
[GitHub] spark pull request: [SPARK-11373] [CORE] Add metrics to the Histor...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9571#issuecomment-168988730

**[Test build #48768 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48768/consoleFull)** for PR 9571 at commit [`d6fa568`](https://github.com/apache/spark/commit/d6fa568fab72a2c4d57ecfcd304d000379534990).
[GitHub] spark pull request: [SPARK-12632][Python][Make Parameter Descripti...
GitHub user somideshmukh opened a pull request: https://github.com/apache/spark/pull/10602

[SPARK-12632][Python][Make Parameter Descriptions Consistent for PySpark MLlib FPM and Recommendation]

Made changes in the FPM file; the Recommendation file does not contain param changes.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/somideshmukh/spark Branch12632-2

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/10602.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #10602

commit 5b53e88794ecb7c9a8a7f8b68aa8a3fb7c3ac7e3
Author: somideshmukh
Date: 2016-01-05T12:18:51Z

    [SPARK-12632][Python][Make Parameter Descriptions Consistent for PySpark MLlib FPM and Recommendation]
[GitHub] spark pull request: [SPARK-11315] [YARN] Add YARN extension servic...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8744#issuecomment-168988728

**[Test build #48769 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48769/consoleFull)** for PR 8744 at commit [`90a91c9`](https://github.com/apache/spark/commit/90a91c987bbeeb36bd0af36f743871eeb05fa5e4).
[GitHub] spark pull request: [SPARK-1537] [YARN] Add history provider for Y...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10545#issuecomment-168990626

**[Test build #48767 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48767/consoleFull)** for PR 10545 at commit [`8d781db`](https://github.com/apache/spark/commit/8d781dbb4871383d43cd4d03776da5c617c6b0da).
[GitHub] spark pull request: [STREAMING][DOCS][EXAMPLES] Minor fixes
GitHub user jaceklaskowski opened a pull request: https://github.com/apache/spark/pull/10603

[STREAMING][DOCS][EXAMPLES] Minor fixes

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jaceklaskowski/spark streaming-actor-custom-receiver

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/10603.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #10603

commit 5629feb2f43df706f0664b67c098d11a3c0b7185
Author: Jacek Laskowski
Date: 2016-01-05T12:55:25Z

    [STREAMING][DOCS][EXAMPLES] Minor fixes
[GitHub] spark pull request: [SPARK-11315] [YARN] Add YARN extension servic...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8744#issuecomment-168991865

**[Test build #48769 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48769/consoleFull)** for PR 8744 at commit [`90a91c9`](https://github.com/apache/spark/commit/90a91c987bbeeb36bd0af36f743871eeb05fa5e4).

 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-11315] [YARN] Add YARN extension servic...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8744#issuecomment-168991988

Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-12504][SQL] Masking credentials in the ...
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/10452#issuecomment-169192724

Thanks, merging to master.
[GitHub] spark pull request: [SPARK-3873] [sql] Import ordering fixes.
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/10573
[GitHub] spark pull request: [SPARK-12504][SQL] Masking credentials in the ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/10452
[GitHub] spark pull request: [SPARK-12581][SQL] Support case-sensitive tabl...
Github user maropu commented on the pull request: https://github.com/apache/spark/pull/10523#issuecomment-169193276

@yhuai Yes, quoted tables in postgres are always case-sensitive. Do we need to support case-insensitive table names? Table names in sparksql (`DataFrame#registerTempTable`) and in typical databases such as oracle and mysql are also case-sensitive, so IMO we need to comply with the rule.
[GitHub] spark pull request: [SPARK-11696] [ML, MLlib] Optimization: Extend...
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/9667#issuecomment-169193588

I see. Adding this seems reasonable since some spark.ml algorithms depend on these APIs. However, I want to avoid breaking the public optimization APIs in spark.mllib. (That should also let you make fewer corrections to the test suites and callers of the methods.) I'll make a few suggestions for that.
[GitHub] spark pull request: [SPARK-11696] [ML, MLlib] Optimization: Extend...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/9667#discussion_r48920477

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/optimization/GradientDescent.scala ---

@@ -133,7 +133,7 @@ class GradientDescent private[spark] (private var gradient: Gradient, private va
       miniBatchFraction, initialWeights, convergenceTol)
-    weights
+    (weights, lossHistory.last, iter)

--- End diff --

It'd be nice to return the whole loss history.
[GitHub] spark pull request: [SPARK-11696] [ML, MLlib] Optimization: Extend...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/9667#discussion_r48920506

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/optimization/GradientDescent.scala ---

@@ -178,7 +178,7 @@ object GradientDescent extends Logging {
       regParam: Double,
       miniBatchFraction: Double,
       initialWeights: Vector,
-      convergenceTol: Double): (Vector, Array[Double]) = {
+      convergenceTol: Double): (Vector, Array[Double], Integer) = {

--- End diff --

Do you have to change the API here? The loss history should have length = num iterations, right?
[GitHub] spark pull request: [SPARK-11696] [ML, MLlib] Optimization: Extend...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/9667#discussion_r48920464

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/optimization/GradientDescent.scala ---

@@ -122,8 +122,8 @@ class GradientDescent private[spark] (private var gradient: Gradient, private va
    * @return solution vector
    */
   @DeveloperApi
-  def optimize(data: RDD[(Double, Vector)], initialWeights: Vector): Vector = {
-    val (weights, _) = GradientDescent.runMiniBatchSGD(
+  def optimize(data: RDD[(Double, Vector)], initialWeights: Vector): (Vector, Double, Integer) = {

--- End diff --

This API should not be changed. You could add a new method (```optimizeWithStats```?) which returns the 3 values, and then share the implementation.
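The compatibility pattern suggested in this review (leave `optimize` returning just the solution vector, and add a stats-returning sibling that shares one implementation) can be sketched in Python; the class name, method names, and the stand-in loss computation below are invented for illustration, not Spark's code:

```python
class GradientDescentSketch:
    """Sketch of keeping a public API stable while exposing extra stats."""

    def optimize(self, data, initial_weights):
        # Public signature unchanged: return only the weights,
        # discarding the extra statistics.
        weights, _loss_history = self.optimize_with_stats(data, initial_weights)
        return weights

    def optimize_with_stats(self, data, initial_weights):
        # Single shared implementation. Returning the full loss history
        # lets callers read both the final loss and the iteration count
        # (len(loss_history)) without adding a third tuple element.
        weights = list(initial_weights)
        loss_history = [float(len(data) - i) for i in range(len(data))]
        return weights, loss_history
```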
[GitHub] spark pull request: [SPARK-12644][SQL] Update parquet reader to be...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/10593#discussion_r48920875 --- Diff: core/src/test/scala/org/apache/spark/Benchmark.scala --- @@ -0,0 +1,102 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark + +import scala.collection.mutable + +import org.apache.commons.lang3.SystemUtils +import org.apache.spark.util.Utils + +/** + * Utility class to benchmark components. An example of how to use this is: + * val benchmark = new Benchmark("My Benchmark", valuesPerIteration) + * benchmark.addCase("V1", ") + * benchmark.addCase("V2", ") + * benchmark.run + * This will output the average time to run each function and the rate of each function. + * + * The benchmark function takes one argument that is the iteration that's being run + */ +class Benchmark(name: String, valuesPerIteration: Long, iters: Int = 5) { + val benchmarks = mutable.ArrayBuffer.empty[Benchmark.Case] + + def addCase(name: String, f: Int => Unit): Unit = { +benchmarks += Benchmark.Case(name, f) + } + + /** + * Runs the benchmark and outputs the results to stdout. This should be copied and added as + * a comment with the benchmark. 
Although the results vary from machine to machine, it should + * provide some baseline. + */ + def run(): Unit = { +require(benchmarks.nonEmpty) +val results = benchmarks.map { c => + Benchmark.measure(valuesPerIteration, c.fn, iters) +} +val firstRate = results.head.avgRate +// scalastyle:off +// The results are going to be processor specific so it is useful to include that. +println(Benchmark.getProcessorName()) +printf("%-30s %16s %16s %14s\n", name + ":", "Avg Time(ms)", "Avg Rate(M/s)", "Relative Rate") + println("---") +results.zip(benchmarks).foreach { r => + printf("%-30s %16s %16s %14s\n", r._2.name, r._1.avgMs.toString, "%10.2f" format r._1.avgRate, +"%6.2f X" format (r._1.avgRate / firstRate)) +} +println +// scalastyle:on + } +} + +object Benchmark { + case class Case(name: String, fn: Int => Unit) + case class Result(avgMs: Double, avgRate: Double) + + /** + * This should return a user helpful processor information. Getting at this depends on the OS. + * This should return something like "Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz" + */ + def getProcessorName(): String = { +if (SystemUtils.IS_OS_MAC_OSX) { + Utils.executeAndGetOutput(Seq("/usr/sbin/sysctl", "-n", "machdep.cpu.brand_string")) +} else if (SystemUtils.IS_OS_LINUX) { + Utils.executeAndGetOutput(Seq("/usr/bin/grep", "-m", "1", "\"model name\"", "/proc/cpuinfo")) +} else { + System.getenv("PROCESSOR_IDENTIFIER") +} + } + + /** + * Runs a single function `f` for iters, returning the average time the function took and + * the rate of the function. + */ + def measure(num: Long, f: Int => Unit, iters: Int): Result = { +var totalTime = 0L +for (i <- 0 until iters + 1) { + val start = System.currentTimeMillis() --- End diff -- How about calling System.nanoTime() for short-running benchmarks instead of System.currentTimeMillis()? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. 
If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
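The review above suggests System.nanoTime() because System.currentTimeMillis() has millisecond granularity and can report near-zero times for short-running cases. A minimal standalone sketch of that suggestion (not Spark's code; the class and method names here are hypothetical), including the extra warmup pass that a measurement loop typically excludes from the total:

```java
// Minimal sketch: timing a short-running task with System.nanoTime(),
// which has sub-millisecond resolution, instead of
// System.currentTimeMillis(), whose granularity can swallow short runs.
public class NanoTiming {
    // Runs `task` for `iters` timed iterations after one untimed warmup
    // pass, and returns the average time per iteration in milliseconds.
    public static double avgMillis(Runnable task, int iters) {
        task.run(); // warmup iteration, excluded from the total
        long totalNanos = 0L;
        for (int i = 0; i < iters; i++) {
            long start = System.nanoTime();
            task.run();
            totalNanos += System.nanoTime() - start;
        }
        return totalNanos / (double) iters / 1_000_000.0;
    }

    public static void main(String[] args) {
        double ms = avgMillis(() -> {
            long sum = 0;
            for (int i = 0; i < 1_000_000; i++) sum += i;
        }, 5);
        System.out.println(ms >= 0.0); // prints: true
    }
}
```

Note that nanoTime() is a monotonic clock suitable only for measuring elapsed intervals, which is exactly what a benchmark loop needs; currentTimeMillis() is wall-clock time and can even jump backwards under NTP adjustment.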
[GitHub] spark pull request: [SPARK-12573][SPARK-12574][SQL] Move SQL Parse...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/10583#issuecomment-169196055 cc @cloud-fan can you take a look at this? Thanks.
[GitHub] spark pull request: [SPARK-12591][Streaming]Register OpenHashMapBa...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10609#issuecomment-169196126 **[Test build #48808 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48808/consoleFull)** for PR 10609 at commit [`0228eef`](https://github.com/apache/spark/commit/0228eef185e379e80cd3622194e785187f673bce).
[GitHub] spark pull request: [SPARK-11531] [ML] : SparseVector error Msg
Github user rekhajoshm commented on the pull request: https://github.com/apache/spark/pull/9525#issuecomment-169196705 Thanks @jkbradley, I might have missed it or thought it was under discussion. Updated, thanks.
[GitHub] spark pull request: [SPARK-12640][SQL] Add simple benchmarking uti...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/10589#discussion_r48921242

--- Diff: core/src/test/scala/org/apache/spark/Benchmark.scala ---
@@ -0,0 +1,102 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark
+
+import scala.collection.mutable
+
+import org.apache.commons.lang3.SystemUtils
+import org.apache.spark.util.Utils
+
+/**
+ * Utility class to benchmark components. An example of how to use this is:
+ *   val benchmark = new Benchmark("My Benchmark", valuesPerIteration)
+ *   benchmark.addCase("V1", <function>)
+ *   benchmark.addCase("V2", <function>)
+ *   benchmark.run
+ * This will output the average time to run each function and the rate of each function.
+ *
+ * The benchmark function takes one argument that is the iteration that's being run.
+ */
+class Benchmark(name: String, valuesPerIteration: Long, iters: Int = 5) {
+  val benchmarks = mutable.ArrayBuffer.empty[Benchmark.Case]
+
+  def addCase(name: String, f: Int => Unit): Unit = {
+    benchmarks += Benchmark.Case(name, f)
+  }
+
+  /**
+   * Runs the benchmark and outputs the results to stdout. This should be copied and added as
+   * a comment with the benchmark. Although the results vary from machine to machine, it should
+   * provide some baseline.
+   */
+  def run(): Unit = {
+    require(benchmarks.nonEmpty)
+    val results = benchmarks.map { c =>
+      Benchmark.measure(valuesPerIteration, c.fn, iters)
+    }
+    val firstRate = results.head.avgRate
+    // scalastyle:off
+    // The results are going to be processor specific so it is useful to include that.
+    println(Benchmark.getProcessorName())
+    printf("%-24s %16s %16s %14s\n", name + ":", "Avg Time(ms)", "Avg Rate(M/s)", "Relative Rate")
+    println("-")
+    results.zip(benchmarks).foreach { r =>
+      printf("%-24s %16s %16s %14s\n", r._2.name, r._1.avgMs.toString, "%10.2f" format r._1.avgRate,
+        "%6.2f X" format (r._1.avgRate / firstRate))
+    }
+    println
+    // scalastyle:on
+  }
+}
+
+object Benchmark {
+  case class Case(name: String, fn: Int => Unit)
+  case class Result(avgMs: Double, avgRate: Double)
+
+  /**
+   * This should return a user helpful processor information. Getting at this depends on the OS.
+   * This should return something like "Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz"
+   */
+  def getProcessorName(): String = {
+    if (SystemUtils.IS_OS_MAC_OSX) {
+      Utils.executeAndGetOutput(Seq("/usr/sbin/sysctl", "-n", "machdep.cpu.brand_string"))
+    } else if (SystemUtils.IS_OS_LINUX) {
+      Utils.executeAndGetOutput(Seq("/usr/bin/grep", "-m", "1", "\"model name\"", "/proc/cpuinfo"))
+    } else {
+      System.getenv("PROCESSOR_IDENTIFIER")
+    }
+  }
+
+  /**
+   * Runs a single function `f` for iters, returning the average time the function took and
+   * the rate of the function.
+   */
+  def measure(num: Long, f: Int => Unit, iters: Int): Result = {
+    var totalTime = 0L
+    for (i <- 0 until iters + 1) {
+      val start = System.currentTimeMillis()
--- End diff --

How about calling System.nanoTime() for short-running benchmarks instead of System.currentTimeMillis()?
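The Benchmark output above reports an "Avg Rate(M/s)" and a "Relative Rate" per case. The arithmetic behind those two columns can be sketched as follows (a standalone illustration with hypothetical helper names, not Spark's code): a case that processes `num` values in `avgMs` milliseconds has a rate of `num / (avgMs * 1000)` millions of values per second, and each case's relative rate is its rate divided by the first case's rate.

```java
// Minimal sketch of the rate arithmetic in the Benchmark output above.
public class RelativeRate {
    // `valuesPerIteration` items processed in `avgMs` milliseconds:
    // values/sec = num * 1000 / avgMs, so in millions per second
    // that is num / (avgMs * 1000).
    public static double rateMPerSec(long valuesPerIteration, double avgMs) {
        return valuesPerIteration / (avgMs * 1000.0);
    }

    // The "Relative Rate" column: this case's rate over the first case's.
    public static double relative(double rate, double firstRate) {
        return rate / firstRate;
    }

    public static void main(String[] args) {
        double first = rateMPerSec(10_000_000L, 100.0);  // 100.0 M/s
        double second = rateMPerSec(10_000_000L, 200.0); //  50.0 M/s
        System.out.println(relative(second, first));     // prints: 0.5
    }
}
```

This is why the first case always shows a relative rate of 1.00 X: it is the baseline the other cases are compared against.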
[GitHub] spark pull request: [SPARK-12573][SPARK-12574][SQL] Move SQL Parse...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/10583#discussion_r48921238

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -587,6 +586,13 @@ class Analyzer(
          case other => other
        }
      }
+      case u @ UnresolvedGenerator(name, children) =>
--- End diff --

Do we need to add `UnresolvedGenerator`?
[GitHub] spark pull request: [SPARK-3873] [core] Import ordering fixes.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10578#issuecomment-169197671 **[Test build #48799 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48799/consoleFull)** for PR 10578 at commit [`c7bee0a`](https://github.com/apache/spark/commit/c7bee0a3ba32adb4c348bbada71d163fc6770384).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3873] [core] Import ordering fixes.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10578#issuecomment-169198124 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-3873] [core] Import ordering fixes.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10578#issuecomment-169198126 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48799/
[GitHub] spark pull request: [SPARK-12400][Shuffle] Avoid generating temp s...
Github user jerryshao commented on the pull request: https://github.com/apache/spark/pull/10376#issuecomment-169198730 Jenkins, retest this please.
[GitHub] spark pull request: [SPARK-12539][SQL] support writing bucketed ta...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/10498#discussion_r48921611

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/bucket.scala ---
@@ -0,0 +1,83 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources
+
+import org.apache.hadoop.mapreduce.{Job, TaskAttemptContext}
+import org.apache.spark.sql.SQLContext
+import org.apache.spark.sql.catalyst.expressions.Attribute
+import org.apache.spark.sql.sources.{OutputWriter, OutputWriterFactory, HadoopFsRelationProvider, HadoopFsRelation}
+import org.apache.spark.sql.types.StructType
+
+/**
+ * A container for bucketing information.
+ * Bucketing is a technology for decomposing data sets into more manageable parts, and the number
+ * of buckets is fixed so it does not fluctuate with data.
+ *
+ * @param numBuckets number of buckets.
+ * @param bucketColumnNames the names of the columns that used to generate the bucket id.
+ * @param sortColumnNames the names of the columns that used to sort data in each bucket.
+ */
+private[sql] case class BucketSpec(
+    numBuckets: Int,
+    bucketColumnNames: Seq[String],
+    sortColumnNames: Seq[String])
+
+private[sql] trait BucketedHadoopFsRelationProvider extends HadoopFsRelationProvider {
--- End diff --

should we expose the bucket API to users so that they can implement data source supporting bucketing? cc @rxin @nongli
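The `BucketSpec` doc above says the bucket columns "generate the bucket id" against a fixed number of buckets. A minimal standalone sketch of that idea (hypothetical code, not Spark's actual hashing, which uses its own hash function over the row's bucket-column values): hash the bucket columns, then reduce the hash to a non-negative id in `[0, numBuckets)`.

```java
import java.util.Arrays;

// Minimal sketch: mapping a row's bucket-column values to a bucket id.
public class BucketId {
    // The double-modulo keeps the id in [0, numBuckets) even when the
    // hash is negative, which Java's % alone would not guarantee.
    public static int bucketId(Object[] bucketColumnValues, int numBuckets) {
        int hash = Arrays.hashCode(bucketColumnValues);
        return ((hash % numBuckets) + numBuckets) % numBuckets;
    }

    public static void main(String[] args) {
        // Rows with equal bucket-column values always land in the same
        // bucket, which is what makes bucketed joins and aggregations
        // avoid a shuffle.
        int id = bucketId(new Object[] {"user42", 2016}, 8);
        System.out.println(id >= 0 && id < 8); // prints: true
    }
}
```

Because the number of buckets is fixed at write time, two tables bucketed the same way on the same columns place matching keys in matching buckets, which is the property the sort columns then exploit within each bucket.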
[GitHub] spark pull request: [SPARK-12400][Shuffle] Avoid generating temp s...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10376#issuecomment-169200095 **[Test build #48809 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48809/consoleFull)** for PR 10376 at commit [`7837b06`](https://github.com/apache/spark/commit/7837b0601299da5ba42d45e5279b9c1449a7d619).