[GitHub] spark pull request #18068: [SPARK-20764][ML][PySpark][FOLLOWUP]Fix visibilit...
Github user mpjlu commented on a diff in the pull request:
https://github.com/apache/spark/pull/18068#discussion_r117912054
--- Diff: python/pyspark/ml/tests.py ---
@@ -1075,7 +1076,8 @@ def test_linear_regression_summary(self):
         pValues = s.pValues
         self.assertTrue(isinstance(pValues, list) and isinstance(pValues[0], float))
         # test evaluation (with training dataset) produces a summary with same values
-        # one check is enough to verify a summary is returned, Scala version runs full test
+        # one check is enough to verify a summary is returned
+        # The child class LinearRegressionTrainingSummary runs full test
--- End diff --
I think this is not because the Scala version runs the full test. Even if the Scala version runs the full test, we still need the function call test. If a child class has already done the function call test, we don't need to test the parent class again.
--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18068: [SPARK-20764][ML][PySpark][FOLLOWUP]Fix visibilit...
Github user MLnick commented on a diff in the pull request:
https://github.com/apache/spark/pull/18068#discussion_r117912043
--- Diff: python/pyspark/ml/tests.py ---
@@ -1075,7 +1076,8 @@ def test_linear_regression_summary(self):
         pValues = s.pValues
         self.assertTrue(isinstance(pValues, list) and isinstance(pValues[0], float))
         # test evaluation (with training dataset) produces a summary with same values
-        # one check is enough to verify a summary is returned, Scala version runs full test
+        # one check is enough to verify a summary is returned
+        # The child class LinearRegressionTrainingSummary runs full test
--- End diff --
I'm not sure what this comment means?
[GitHub] spark issue #18068: [SPARK-20764][ML][PySpark][FOLLOWUP]Fix visibility discr...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18068 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77229/ Test PASSed.
[GitHub] spark issue #18068: [SPARK-20764][ML][PySpark][FOLLOWUP]Fix visibility discr...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18068 Merged build finished. Test PASSed.
[GitHub] spark issue #18068: [SPARK-20764][ML][PySpark][FOLLOWUP]Fix visibility discr...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18068 **[Test build #77229 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77229/testReport)** for PR 18068 at commit [`7bbfe3a`](https://github.com/apache/spark/commit/7bbfe3a860964d166f67c3b099b00c8b11a73f9d). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * ` class DecisionTreeClassifierWrapperWriter(instance: DecisionTreeClassifierWrapper)` * ` class DecisionTreeClassifierWrapperReader extends MLReader[DecisionTreeClassifierWrapper] ` * ` class DecisionTreeRegressorWrapperWriter(instance: DecisionTreeRegressorWrapper)` * ` class DecisionTreeRegressorWrapperReader extends MLReader[DecisionTreeRegressorWrapper] `
[GitHub] spark issue #18068: [SPARK-20764][ML][PySpark][FOLLOWUP]Fix visibility discr...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18068 **[Test build #77231 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77231/testReport)** for PR 18068 at commit [`013adc4`](https://github.com/apache/spark/commit/013adc4460c588e1e06a66d23ce66d864803554e).
[GitHub] spark issue #18058: [SPARK-20768][PYSPARK][ML] Expose numPartitions (expert)...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18058 Merged build finished. Test PASSed.
[GitHub] spark issue #18058: [SPARK-20768][PYSPARK][ML] Expose numPartitions (expert)...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18058 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77228/ Test PASSed.
[GitHub] spark issue #18058: [SPARK-20768][PYSPARK][ML] Expose numPartitions (expert)...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18058 **[Test build #77228 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77228/testReport)** for PR 18058 at commit [`85882ae`](https://github.com/apache/spark/commit/85882aeda99e9407fed82fe7fef79adcb886). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class HasNumPartitions(Params):`
[GitHub] spark issue #18025: [WIP][SparkR] Update doc and examples for sql functions
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/18025 Great point. - For a method that is defined in one class and belongs in a group, like `cov`, we can document it in its own Rd and add a link to it in the `SeeAlso` section of the group doc. In this case, the `\alias{cov}` will be in `cov.Rd`. - For a method that is defined for multiple classes but whose meanings are drastically different: I think we can still document them in one Rd, and add a `details` section to describe the method for each class.
[GitHub] spark issue #18067: [SPARK-20849][DOC][SPARKR] Document R DecisionTree
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18067 **[Test build #77230 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77230/testReport)** for PR 18067 at commit [`65cf494`](https://github.com/apache/spark/commit/65cf494a0f432c23ea83bc532942bb9c84febaaa).
[GitHub] spark issue #18068: [SPARK-20764][ML][PySpark][FOLLOWUP]Fix visibility discr...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18068 **[Test build #77229 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77229/testReport)** for PR 18068 at commit [`7bbfe3a`](https://github.com/apache/spark/commit/7bbfe3a860964d166f67c3b099b00c8b11a73f9d).
[GitHub] spark issue #18059: [SPARK-20834][SQL]TypeCoercion:loss of precision when wi...
Github user wzhfy commented on the issue: https://github.com/apache/spark/pull/18059 If the user wants a precise result, why not use decimal? float and double are both imprecise.
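The point about float and double both being imprecise can be illustrated with a small sketch. This is plain Python rather than Spark SQL, and the helper name `widen_float32` is hypothetical; it only assumes that Python floats are 64-bit doubles and that `struct` format `"f"` rounds to 32-bit precision.

```python
import struct
from decimal import Decimal


def widen_float32(value: float) -> float:
    """Round value to 32-bit float precision, then return it as a 64-bit float.

    Hypothetical helper for illustration; it mimics storing a value as a
    float and later reading it back as a double.
    """
    return struct.unpack("f", struct.pack("f", value))[0]


# 0.1 has no exact binary representation, so the 32-bit rounding error
# becomes visible once the value is widened to a double.
widened = widen_float32(0.1)

# Doubles are imprecise too: this sum is not exactly 0.3.
float_sum = 0.1 + 0.2

# Decimal arithmetic on decimal literals stays exact.
decimal_sum = Decimal("0.1") + Decimal("0.2")

print(widened == 0.1)
print(float_sum == 0.3)
print(decimal_sum == Decimal("0.3"))
```

The first two comparisons come out false and the last one true, which is the trade-off the comment refers to: widening a float to a double cannot restore precision that was never there, while a decimal type keeps decimal literals exact.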
[GitHub] spark issue #18035: [MINOR][SPARKR][ML] Joint coefficients with intercept fo...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/18035 let's ignore the appveyor intermittent error - since it passed before simple typo changes
[GitHub] spark issue #18025: [WIP][SparkR] Update doc and examples for sql functions
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/18025 Thanks for summarizing. I think they make sense. To be clear though, we should also talk about: - what if a method is defined in one class and belongs in a group, but is also defined for another class (e.g. sql function: `cov`) - what if it is defined for multiple classes but the meanings are drastically different (e.g. coalesce(DF) and coalesce(col) in my example above)
[GitHub] spark pull request #17967: [SPARK-14659][ML] RFormula consistent with R when...
Github user actuaryzhang commented on a diff in the pull request:
https://github.com/apache/spark/pull/17967#discussion_r117909338
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/RFormula.scala ---
@@ -37,6 +37,42 @@ import org.apache.spark.sql.types._
  */
 private[feature] trait RFormulaBase extends HasFeaturesCol with HasLabelCol {
 
+  /**
+   * Param for how to order categories of a string FEATURE column used by `StringIndexer`.
+   * The last category after ordering is dropped when encoding strings.
+   * Supported options: 'frequencyDesc', 'frequencyAsc', 'alphabetDesc', 'alphabetAsc'.
+   * The default value is 'frequencyDesc'. When the ordering is set to 'alphabetDesc', `RFormula`
+   * drops the same category as R when encoding strings.
+   *
+   * The options are explained using an example `'b', 'a', 'b', 'a', 'c', 'b'`:
+   * {{{
+   * +-+---+--+
--- End diff --
@felixcheung @HyukjinKwon The scaladoc compiled, but the javadoc failed... Not sure if there is additional config for java? ![image](https://cloud.githubusercontent.com/assets/11082368/26341144/048b8d6e-3f47-11e7-8600-c111643a0295.png)
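The four ordering options described in the quoted scaladoc can be worked through on its own example values. The following is a rough Python sketch, not Spark's implementation: the function names are hypothetical, and the alphabetical tie-break for equal frequencies is an assumption made only to keep the sketch deterministic (the example has no ties).

```python
from collections import Counter


def order_categories(values, order_type):
    """Return the distinct categories of values in the requested order.

    Hypothetical helper; the option names come from the quoted scaladoc.
    """
    counts = Counter(values)
    if order_type == "frequencyDesc":
        # Most frequent first; ties broken alphabetically (assumption).
        return sorted(counts, key=lambda c: (-counts[c], c))
    if order_type == "frequencyAsc":
        # Least frequent first; ties broken alphabetically (assumption).
        return sorted(counts, key=lambda c: (counts[c], c))
    if order_type == "alphabetDesc":
        return sorted(counts, reverse=True)
    if order_type == "alphabetAsc":
        return sorted(counts)
    raise ValueError(f"unsupported order type: {order_type}")


def dropped_category(values, order_type):
    """The scaladoc says the last category after ordering is dropped."""
    return order_categories(values, order_type)[-1]


example = ["b", "a", "b", "a", "c", "b"]
for option in ("frequencyDesc", "frequencyAsc", "alphabetDesc", "alphabetAsc"):
    print(option, order_categories(example, option), dropped_category(example, option))
```

With counts b=3, a=2, c=1, 'alphabetDesc' orders the categories c, b, a and drops 'a', the alphabetically first level, which matches the scaladoc's statement that this option drops the same category as R.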
[GitHub] spark issue #18058: [SPARK-20768][PYSPARK][ML] Expose numPartitions (expert)...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18058 **[Test build #77228 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77228/testReport)** for PR 18058 at commit [`85882ae`](https://github.com/apache/spark/commit/85882aeda99e9407fed82fe7fef79adcb886).
[GitHub] spark pull request #18068: [SPARK-20764][ML][PySpark][FOLLOWUP]Fix visibilit...
GitHub user mpjlu reopened a pull request: https://github.com/apache/spark/pull/18068 [SPARK-20764][ML][PySpark][FOLLOWUP]Fix visibility discrepancy with numInstances and degreesOfFreedom in LR and GLR - Python version ## What changes were proposed in this pull request? Add test cases for PR-18062 ## How was this patch tested? The existing UT You can merge this pull request into a Git repository by running: $ git pull https://github.com/mpjlu/spark moreTest Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/18068.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #18068 commit 6b31ec7dda73c155fc94c5ccf53709099f8033dd Author: Peng Date: 2017-05-22T11:37:50Z fix visibility of numInstances and degreesOfFreedom in LR and GLR - Python version commit a8b407f877269f235611e5dc5bb338c421206a57 Author: Peng Date: 2017-05-23T05:52:29Z follow up of SPARK-20764 commit 7bbfe3a860964d166f67c3b099b00c8b11a73f9d Author: Peng Date: 2017-05-23T05:58:52Z Merge remote-tracking branch 'origin/master' into moreTest
[GitHub] spark issue #18067: [SPARK-20849][DOC][SPARKR] Document R DecisionTree
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18067 Merged build finished. Test FAILed.
[GitHub] spark issue #18067: [SPARK-20849][DOC][SPARKR] Document R DecisionTree
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18067 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77226/ Test FAILed.
[GitHub] spark issue #18067: [SPARK-20849][DOC][SPARKR] Document R DecisionTree
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18067 **[Test build #77226 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77226/testReport)** for PR 18067 at commit [`f43ebe0`](https://github.com/apache/spark/commit/f43ebe03115b0b22ed01b76925312dfbc7a2c8c0). * This patch **fails SparkR unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #18068: [SPARK-20764][ML][PySpark][FOLLOWUP]Fix visibilit...
Github user mpjlu closed the pull request at: https://github.com/apache/spark/pull/18068
[GitHub] spark pull request #18068: [SPARK-20764][ML][PySpark][FOLLOWUP]Fix visibilit...
GitHub user mpjlu opened a pull request: https://github.com/apache/spark/pull/18068 [SPARK-20764][ML][PySpark][FOLLOWUP]Fix visibility discrepancy with numInstances and degreesOfFreedom in LR and GLR - Python version ## What changes were proposed in this pull request? Add test cases for PR-18062 ## How was this patch tested? The existing UT You can merge this pull request into a Git repository by running: $ git pull https://github.com/mpjlu/spark moreTest Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/18068.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #18068 commit 6b31ec7dda73c155fc94c5ccf53709099f8033dd Author: Peng Date: 2017-05-22T11:37:50Z fix visibility of numInstances and degreesOfFreedom in LR and GLR - Python version commit a8b407f877269f235611e5dc5bb338c421206a57 Author: Peng Date: 2017-05-23T05:52:29Z follow up of SPARK-20764
[GitHub] spark issue #18058: [SPARK-20768][PYSPARK][ML] Expose numPartitions (expert)...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/18058 Jenkins, ok to test
[GitHub] spark issue #18046: [SPARK-20749][SQL] Built-in SQL Function Support - all v...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18046 Merged build finished. Test PASSed.
[GitHub] spark issue #18046: [SPARK-20749][SQL] Built-in SQL Function Support - all v...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18046 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77221/ Test PASSed.
[GitHub] spark issue #18046: [SPARK-20749][SQL] Built-in SQL Function Support - all v...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18046 **[Test build #77221 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77221/testReport)** for PR 18046 at commit [`e9acb63`](https://github.com/apache/spark/commit/e9acb63e1e695ddab4d80ed74844f2244c3f0e05). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18058: [SPARK-20768][PYSPARK][ML] Expose numPartitions (expert)...
Github user facaiy commented on the issue: https://github.com/apache/spark/pull/18058 There seems to be something wrong with CI. I have seen the same non-response/delay from CI again since last month.
[GitHub] spark pull request #12646: [SPARK-14878][SQL] Trim characters string functio...
Github user wzhfy commented on a diff in the pull request:
https://github.com/apache/spark/pull/12646#discussion_r117907961
--- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java ---
@@ -510,6 +510,69 @@ public UTF8String trim() {
     }
   }
 
+  /**
+   * Removes the given trim string from both ends of a string
+   * @param trimString the trim character string
+   */
+  public UTF8String trim(UTF8String trimString) {
+    // This method searches for each character in the source string, removes the character if it is found
+    // in the trim string, stops at the first not found. It starts from left end, then right end.
+    // It returns a new string in which both ends trim characters have been removed.
+    int s = 0; // the searching byte position of the input string
+    int i = 0; // the first beginning byte position of a non-matching character
+    int e = 0; // the last byte position
+    int numChars = 0; // number of characters from the input string
+    int[] stringCharLen = new int[numBytes]; // array of character length for the input string
+    int[] stringCharPos = new int[numBytes]; // array of the first byte position for each character in the input string
+    int searchCharBytes;
+
+    while (s < this.numBytes) {
+      UTF8String searchChar = copyUTF8String(s, s + numBytesForFirstByte(this.getByte(s)) - 1);
+      searchCharBytes = searchChar.numBytes;
+      // try to find the matching for the searchChar in the trimString set
+      if (trimString.find(searchChar, 0) >= 0) {
+        i += searchCharBytes;
+      } else {
+        // no matching, exit the search
+        break;
+      }
+      s += searchCharBytes;
+    }
+
+    if (i >= this.numBytes) {
+      // empty string
+      return UTF8String.EMPTY_UTF8;
+    } else {
+      //build the position and length array
+      s = 0;
+      while (s < numBytes) {
+        stringCharPos[numChars] = s;
+        stringCharLen[numChars]= numBytesForFirstByte(getByte(s));
--- End diff --
> I was thinking that these two arrays are only used by trimRight, in the case trimLeft trim all the source string, then we don't need to do the trimRight, so it will save some performance.
Yeah I agree with you. I just think `numBytesForFirstByte` is called twice for beginning matched chars. But it seems easier to extract methods based on current implementation. Let's keep `stringCharPos` and `stringCharLen` only in "trimRight" part.
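The behaviour described in the quoted method comment (drop leading characters found in the trim string, stop at the first miss, then do the same from the right end) can be sketched in a few lines. This is an illustrative Python version, not the `UTF8String` code under review: `trim_chars` is a hypothetical name, and because Python strings are indexed by code point it needs none of the byte-position arrays discussed in the thread.

```python
def trim_chars(source: str, trim_string: str) -> str:
    """Remove characters that occur in trim_string from both ends of source.

    Hypothetical helper mirroring the described behaviour, not Spark's API.
    """
    trim_set = set(trim_string)

    # Left end: advance past every leading character found in the trim set.
    start = 0
    while start < len(source) and source[start] in trim_set:
        start += 1

    # Everything was trimmed from the left, so the right-hand scan can be
    # skipped entirely (the saving mentioned in the quoted reply).
    if start == len(source):
        return ""

    # Right end: step back over every trailing character in the trim set.
    end = len(source)
    while end > start and source[end - 1] in trim_set:
        end -= 1

    return source[start:end]


print(trim_chars("xyhelloyx", "xy"))
```

For plain Python strings this gives the same result as the built-in `str.strip(chars)`; the explicit loops are kept only to show the left-then-right structure and the early return.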
[GitHub] spark issue #17308: [SPARK-19968][SPARK-20737][SS] Use a cached instance of ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17308 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77227/ Test PASSed.
[GitHub] spark issue #17308: [SPARK-19968][SPARK-20737][SS] Use a cached instance of ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17308 Merged build finished. Test PASSed.
[GitHub] spark issue #17308: [SPARK-19968][SPARK-20737][SS] Use a cached instance of ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17308 **[Test build #77227 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77227/testReport)** for PR 17308 at commit [`15dfc80`](https://github.com/apache/spark/commit/15dfc80a8a35208f5f9df150de7c4bd9a015e2d8). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18058: [SPARK-20768][PYSPARK][ML] Expose numPartitions (expert)...
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/18058 @srowen @MLnick Could you help add @facaiy to the whitelist? It seems we can't trigger this job currently. Thanks.
[GitHub] spark pull request #17967: [SPARK-14659][ML] RFormula consistent with R when...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/17967#discussion_r117906884 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/RFormula.scala --- @@ -37,6 +37,42 @@ import org.apache.spark.sql.types._ */ private[feature] trait RFormulaBase extends HasFeaturesCol with HasLabelCol { + /** + * Param for how to order categories of a string FEATURE column used by `StringIndexer`. + * The last category after ordering is dropped when encoding strings. + * Supported options: 'frequencyDesc', 'frequencyAsc', 'alphabetDesc', 'alphabetAsc'. + * The default value is 'frequencyDesc'. When the ordering is set to 'alphabetDesc', `RFormula` + * drops the same category as R when encoding strings. + * + * The options are explained using an example `'b', 'a', 'b', 'a', 'c', 'b'`: + * {{{ + * +-+---+--+ --- End diff -- according to this, table is https://wiki.scala-lang.org/display/SW/Syntax
[GitHub] spark pull request #17967: [SPARK-14659][ML] RFormula consistent with R when...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/17967#discussion_r117906723 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/RFormula.scala --- @@ -37,6 +37,42 @@ import org.apache.spark.sql.types._ */ private[feature] trait RFormulaBase extends HasFeaturesCol with HasLabelCol { + /** + * Param for how to order categories of a string FEATURE column used by `StringIndexer`. + * The last category after ordering is dropped when encoding strings. + * Supported options: 'frequencyDesc', 'frequencyAsc', 'alphabetDesc', 'alphabetAsc'. + * The default value is 'frequencyDesc'. When the ordering is set to 'alphabetDesc', `RFormula` + * drops the same category as R when encoding strings. + * + * The options are explained using an example `'b', 'a', 'b', 'a', 'c', 'b'`: + * {{{ + * +-+---+--+ --- End diff -- it's supposed to work with raw html tag? I'm not sure why `` works but `` doesn't...
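For readers following the doc comment quoted in the diff above, here is a minimal plain-Python sketch of the four ordering options applied to the example `'b', 'a', 'b', 'a', 'c', 'b'`. It is not Spark's implementation: the helper names `order_categories` and `dropped_category` are hypothetical, and the alphabetical tie-break for equal frequencies is an assumption (the example has no ties, so it does not affect the results here).

```python
from collections import Counter


def order_categories(values, order_type="frequencyDesc"):
    """Return the distinct categories sorted according to order_type."""
    counts = Counter(values)
    if order_type == "frequencyDesc":
        # Most frequent first; alphabetical tie-break is an assumption.
        return sorted(counts, key=lambda c: (-counts[c], c))
    if order_type == "frequencyAsc":
        # Least frequent first; alphabetical tie-break is an assumption.
        return sorted(counts, key=lambda c: (counts[c], c))
    if order_type == "alphabetDesc":
        return sorted(counts, reverse=True)
    if order_type == "alphabetAsc":
        return sorted(counts)
    raise ValueError("unsupported order type: " + order_type)


def dropped_category(values, order_type="frequencyDesc"):
    """The last category after ordering is the one dropped when encoding."""
    return order_categories(values, order_type)[-1]


sample = ["b", "a", "b", "a", "c", "b"]
for option in ("frequencyDesc", "frequencyAsc", "alphabetDesc", "alphabetAsc"):
    print(option, order_categories(sample, option), dropped_category(sample, option))
```

With `alphabetDesc` the dropped category is `'a'`, the alphabetically first level, which is the case the doc comment describes as matching R.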
[GitHub] spark pull request #18067: [SPARK-20849][DOC][SPARKR] Document R DecisionTre...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/18067#discussion_r117906523 --- Diff: R/pkg/vignettes/sparkr-vignettes.Rmd --- @@ -776,6 +778,19 @@ newDF <- createDataFrame(data.frame(x = c(1.5, 3.2))) head(predict(isoregModel, newDF)) ``` + Decision Tree + +`spark.decisionTree` fits a [decision tree](https://en.wikipedia.org/wiki/Decision_tree_learning) classification or regression model on a `SparkDataFrame`. +Users can call `summary` to get a summary of the fitted model, `predict` to make predictions, and `write.ml`/`read.ml` to save/load fitted models. + +We use the `longley` dataset to train a decision tree and make predictions: + +```{r, warning=FALSE} +df <- createDataFrame(longley) --- End diff -- I'd say try to use a data set without `.` in column names if you can. It would probably be confusing if the examples cause warnings when users run them.
[GitHub] spark pull request #12646: [SPARK-14878][SQL] Trim characters string functio...
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/12646#discussion_r117906408 --- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java --- @@ -510,6 +510,69 @@ public UTF8String trim() { } } + /** + * Removes the given trim string from both ends of a string + * @param trimString the trim character string + */ + public UTF8String trim(UTF8String trimString) { +// This method searches for each character in the source string, removes the character if it is found +// in the trim string, stops at the first not found. It starts from left end, then right end. +// It returns a new string in which both ends trim characters have been removed. +int s = 0; // the searching byte position of the input string +int i = 0; // the first beginning byte position of a non-matching character +int e = 0; // the last byte position +int numChars = 0; // number of characters from the input string +int[] stringCharLen = new int[numBytes]; // array of character length for the input string +int[] stringCharPos = new int[numBytes]; // array of the first byte position for each character in the input string +int searchCharBytes; + +while (s < this.numBytes) { + UTF8String searchChar = copyUTF8String(s, s + numBytesForFirstByte(this.getByte(s)) - 1); + searchCharBytes = searchChar.numBytes; + // try to find the matching for the searchChar in the trimString set + if (trimString.find(searchChar, 0) >= 0) { +i += searchCharBytes; + } else { +// no matching, exit the search +break; + } + s += searchCharBytes; +} + +if (i >= this.numBytes) { + // empty string + return UTF8String.EMPTY_UTF8; +} else { + //build the position and length array + s = 0; + while (s < numBytes) { +stringCharPos[numChars] = s; +stringCharLen[numChars]= numBytesForFirstByte(getByte(s)); +s += stringCharLen[numChars]; --- End diff -- um, I'm also thinking about the performance difference. Let's keep it unchanged for now. 
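As a point of comparison for the `trim(UTF8String trimString)` method discussed in the diff above, here is a minimal Python sketch of the same idea: strip from both ends every character that appears in the trim string, stopping at the first character that does not. The function name `trim_chars` is hypothetical. Python strings are indexed by code point, so the byte-position and character-length arrays of the Java version are not needed here; this illustrates the semantics only and says nothing about the performance question raised in the review.

```python
def trim_chars(source, trim_string):
    """Remove characters found in trim_string from both ends of source."""
    trim_set = set(trim_string)
    start = 0
    end = len(source)
    # Scan from the left until a character outside the trim set is found.
    while start < end and source[start] in trim_set:
        start += 1
    # Scan from the right in the same way, never crossing the left cursor.
    while end > start and source[end - 1] in trim_set:
        end -= 1
    return source[start:end]


print(trim_chars("xxhixx", "x"))
print(trim_chars("abcHELLOcba", "abc"))
```

For a non-empty trim string this matches the built-in `str.strip(chars)`, which treats its argument as a set of characters in the same way.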
[GitHub] spark pull request #17966: [SPARK-20727] Skip tests that use Hadoop utils on...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/17966
[GitHub] spark issue #17966: [SPARK-20727] Skip tests that use Hadoop utils on CRAN W...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/17966 Merged to master/2.2. I think we should still check win-builder. Also, it's a bit hard to tell whether the skipped tests are actually skipped; we might want to follow up with a trace.
[GitHub] spark issue #17864: [SPARK-20604][ML] Allow imputer to handle numeric types
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/17864 Ping folks for comments/review. Many thanks. @viirya @MLnick @jkbradley @hhbyyh @yanboliang @BenFradet
[GitHub] spark issue #18048: [SPARK-20399][SQL][Follow-up] Add a config to fallback s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18048 Merged build finished. Test PASSed.
[GitHub] spark issue #18048: [SPARK-20399][SQL][Follow-up] Add a config to fallback s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18048 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77218/ Test PASSed.
[GitHub] spark issue #18048: [SPARK-20399][SQL][Follow-up] Add a config to fallback s...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18048 **[Test build #77218 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77218/testReport)** for PR 18048 at commit [`9af9caf`](https://github.com/apache/spark/commit/9af9caf20f46674eabee2c0ece5ae828d2426a5d). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17308: [SPARK-19968][SPARK-20737][SS] Use a cached instance of ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17308 **[Test build #77227 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77227/testReport)** for PR 17308 at commit [`15dfc80`](https://github.com/apache/spark/commit/15dfc80a8a35208f5f9df150de7c4bd9a015e2d8).
[GitHub] spark issue #18025: [WIP][SparkR] Update doc and examples for sql functions
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/18025 @felixcheung I think we may want to distinguish a few cases: 1. For methods that are mainly defined by only one class, e.g., most function methods for Column, it makes sense to group and document them together. For example, most aggregate functions of Column go into one single Rd, since they are not defined for other classes. In this case, `avg` will go to this doc since it is not used by other classes. 2. For methods that are defined by multiple classes, e.g., the `show` method defined for SparkDataFrame, GroupedData, Column and StreamingQuery, we can still document them in `show.Rd`. In this case, `show` will go to this doc, which shows the help for all classes that define a `show` method. 3. When it makes sense, we can also combine 1 & 2 above. For example, `gapply` and `gapplyCollect` are defined for both SparkDataFrame and GroupedData, but we can still document them together and create shared examples. Let me know if this makes sense.
[GitHub] spark issue #18067: [SPARK-20849][DOC][SPARKR] Document R DecisionTree
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18067 **[Test build #77226 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77226/testReport)** for PR 18067 at commit [`f43ebe0`](https://github.com/apache/spark/commit/f43ebe03115b0b22ed01b76925312dfbc7a2c8c0).
[GitHub] spark pull request #18067: [SPARK-20849][DOC][SPARKR] Document R DecisionTre...
GitHub user zhengruifeng opened a pull request: https://github.com/apache/spark/pull/18067 [SPARK-20849][DOC][SPARKR] Document R DecisionTree ## What changes were proposed in this pull request? 1, add an example for sparkr `decisionTree` 2, document it in user guide ## How was this patch tested? local submit You can merge this pull request into a Git repository by running: $ git pull https://github.com/zhengruifeng/spark dt_example Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/18067.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #18067 commit 3d8172f98f0994fec9ff359dfca4e6fcddd85863 Author: Zheng RuiFeng Date: 2017-05-23T03:56:20Z create pr commit def3ef4635094955c20c7e9511ce681378794d34 Author: Zheng RuiFeng Date: 2017-05-23T04:33:33Z update vignettes commit f43ebe03115b0b22ed01b76925312dfbc7a2c8c0 Author: Zheng RuiFeng Date: 2017-05-23T05:44:44Z update sparkr.md
[GitHub] spark issue #17308: [SPARK-19968][SPARK-20737][SS] Use a cached instance of ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17308 **[Test build #77225 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77225/testReport)** for PR 17308 at commit [`ef2d6cd`](https://github.com/apache/spark/commit/ef2d6cd4275d93518ec27d4b08916575a3e597d7).
[GitHub] spark issue #18064: [SPARK-20213][SQL] Fix DataFrameWriter operations in SQL...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18064 Merged build finished. Test FAILed.
[GitHub] spark issue #18064: [SPARK-20213][SQL] Fix DataFrameWriter operations in SQL...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18064 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77220/ Test FAILed.
[GitHub] spark issue #18064: [SPARK-20213][SQL] Fix DataFrameWriter operations in SQL...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18064 **[Test build #77220 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77220/testReport)** for PR 18064 at commit [`b355c6d`](https://github.com/apache/spark/commit/b355c6d034c6aefcf8f74757353afce870e9bf1d). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `trait Command extends LogicalPlan ` * `case class ExecutedCommandExec(cmd: RunnableCommand, children: Seq[SparkPlan]) extends SparkPlan `
[GitHub] spark issue #16989: [WIP][SPARK-19659] Fetch big blocks to disk when shuffle...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16989 **[Test build #77224 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77224/testReport)** for PR 16989 at commit [`e022b6d`](https://github.com/apache/spark/commit/e022b6d4ccab0f7fc7b47a468b23046a11576311).
[GitHub] spark issue #17698: [SPARK-20403][SQL]Modify the instructions of some functi...
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/17698 @10110346 Hi, you can use the command @gatorsmile mentioned above to generate the result file.
[GitHub] spark issue #18066: [SPARK-20822][SQL] Generate code to build table cache us...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18066 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77222/ Test FAILed.
[GitHub] spark issue #18066: [SPARK-20822][SQL] Generate code to build table cache us...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18066 **[Test build #77222 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77222/testReport)** for PR 18066 at commit [`6ed3d3f`](https://github.com/apache/spark/commit/6ed3d3fa51cd9b09e2f137bda87dcb16e5a9fb1a). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class GenerateColumnAccessor(useColumnarBatch: Boolean)` * `class GenerateColumnarBatch( ` * ` class GeneratedColumnarBatchIterator extends $`
[GitHub] spark issue #18066: [SPARK-20822][SQL] Generate code to build table cache us...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18066 Merged build finished. Test FAILed.
[GitHub] spark issue #18064: [SPARK-20213][SQL] Fix DataFrameWriter operations in SQL...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18064 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77219/ Test FAILed.
[GitHub] spark issue #18064: [SPARK-20213][SQL] Fix DataFrameWriter operations in SQL...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18064 **[Test build #77219 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77219/testReport)** for PR 18064 at commit [`9507f19`](https://github.com/apache/spark/commit/9507f1938f894b2884b024c8472084a3a531e20d). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `trait Command extends LogicalPlan ` * `case class ExecutedCommandExec(cmd: RunnableCommand, children: Seq[SparkPlan]) extends SparkPlan `
[GitHub] spark issue #18064: [SPARK-20213][SQL] Fix DataFrameWriter operations in SQL...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18064 Merged build finished. Test FAILed.
[GitHub] spark issue #16989: [WIP][SPARK-19659] Fetch big blocks to disk when shuffle...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16989 **[Test build #77223 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77223/testReport)** for PR 16989 at commit [`9b733ec`](https://github.com/apache/spark/commit/9b733ec0fbc4bad8fc7f2413af1be5c6f718d9c1).
[GitHub] spark issue #18066: [SPARK-20822][SQL] Generate code to build table cache us...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18066 **[Test build #77222 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77222/testReport)** for PR 18066 at commit [`6ed3d3f`](https://github.com/apache/spark/commit/6ed3d3fa51cd9b09e2f137bda87dcb16e5a9fb1a).
[GitHub] spark issue #14957: [SPARK-4502][SQL]Support parquet nested struct pruning a...
Github user Gauravshah commented on the issue: https://github.com/apache/spark/pull/14957 @saulshanabrook looks like #16578 is a superset, trying to invest in that pull request.
[GitHub] spark pull request #18066: [SPARK-20822][SQL] Generate code to build table c...
GitHub user kiszk opened a pull request: https://github.com/apache/spark/pull/18066 [SPARK-20822][SQL] Generate code to build table cache using ColumnarBatch and to get value from ColumnVector ## What changes were proposed in this pull request? This PR generates the following Java code: 1. Build each in-memory table cache using `ColumnarBatch` with `ColumnVector` instead of using `CachedBatch` with `Array[Byte]`. 2. Get a value for a column in `ColumnVector` without using an iterator. As the first step, for ease of review, I supported only integer and double data types with whole-stage codegen. Another PR will address an execution path without whole-stage codegen. This PR implements the following: 1. Keep an in-memory table cache using `ColumnarBatch` with `ColumnVector`. To support both the new and the conventional cache data structures, this PR declares `CachedBatch` as a trait, and declares `CachedColumnarBatch` and `CachedBatchBytes` as actual implementations. 2. Generate Java code to build an in-memory table cache. 3. Generate Java code to directly get a value from `ColumnVector`. This PR improves runtime performance by 1. building the in-memory table cache while eliminating lots of virtual calls and a complicated data path. 2. eliminating the data copy from column-oriented storage to `InternalRow` in a `SpecificColumnarIterator` iterator. **Options** A ColumnVector for all primitive data types in ColumnarBatch can be compressed. Currently, there are two ways to enable compression: 1. Set the property `spark.sql.inMemoryColumnarStorage.compressed` to true (the default is true), or 2. Call `DataFrame.persist(st)`, where st is `MEMORY_ONLY_SER`, `MEMORY_ONLY_SER_2`, `MEMORY_AND_DISK_SER`, or `MEMORY_AND_DISK_SER_2`. 
**an example program**

```scala
val df = sparkContext.parallelize((1 to 10), 1).map(i => (i, i.toDouble)).toDF("i", "d").cache
df.filter("i < 8 and 4.0 < d").show
```

**Generated code for building an in-memory table cache**

```java
import scala.collection.Iterator;
import org.apache.spark.sql.types.DataType;
import org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder;
import org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter;
import org.apache.spark.sql.execution.columnar.MutableUnsafeRow;
import org.apache.spark.sql.execution.vectorized.ColumnVector;

public SpecificColumnarIterator generate(Object[] references) {
  return new SpecificColumnarIterator(references);
}

class SpecificColumnarIterator extends org.apache.spark.sql.execution.columnar.ColumnarIterator {
  private ColumnVector[] colInstances;
  private UnsafeRow unsafeRow = new UnsafeRow(0);
  private BufferHolder bufferHolder = new BufferHolder(unsafeRow);
  private UnsafeRowWriter rowWriter = new UnsafeRowWriter(bufferHolder, 0);
  private MutableUnsafeRow mutableRow = null;

  private int rowIdx = 0;
  private int numRowsInBatch = 0;

  private scala.collection.Iterator input = null;
  private DataType[] columnTypes = null;
  private int[] columnIndexes = null;

  public SpecificColumnarIterator(Object[] references) {

    this.mutableRow = new MutableUnsafeRow(rowWriter);
  }

  public void initialize(Iterator input, DataType[] columnTypes, int[] columnIndexes) {
    this.input = input;
    this.columnTypes = columnTypes;
    this.columnIndexes = columnIndexes;
  }

  public boolean hasNext() {
    if (rowIdx < numRowsInBatch) {
      return true;
    }
    if (!input.hasNext()) {
      return false;
    }

    org.apache.spark.sql.execution.columnar.CachedColumnarBatch cachedBatch =
      (org.apache.spark.sql.execution.columnar.CachedColumnarBatch) input.next();
    org.apache.spark.sql.execution.vectorized.ColumnarBatch batch = cachedBatch.columnarBatch();
    rowIdx = 0;
    numRowsInBatch = cachedBatch.getNumRows();
    colInstances = new ColumnVector[columnIndexes.length];
    for (int i = 0; i < columnIndexes.length; i ++) {
      colInstances[i] = batch.column(columnIndexes[i]);
    }

    return hasNext();
  }

  public InternalRo
```
[GitHub] spark issue #18058: [SPARK-20768][PYSPARK][ML] Expose numPartitions (expert)...
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/18058 Jenkins, test this please.
[GitHub] spark issue #18051: [SPARK-18825][SPARKR][DOCS][WIP] Eliminate duplicate lin...
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/18051 Maybe I'm missing something completely, but I still don't get the point why we are removing the `xx-method` link since we are defining methods as S4 using `setMethod`. Lots of packages have these entries in the index. Below is a snapshot from the `sp` package. You can find a lot more there. ![image](https://cloud.githubusercontent.com/assets/11082368/26338918/e8bdd65e-3f38-11e7-83ef-c3293bc267a0.png) Even for S3 methods, they tend to repeat as well. Below is a snapshot of the `gamm4` package. ![image](https://cloud.githubusercontent.com/assets/11082368/26338937/10432bac-3f39-11e7-9b91-5774e33ff7f8.png)
[GitHub] spark issue #18033: [SPARK-20807][SQL] Add compression/decompression of colu...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/18033 @hvanhovell would it be possible to review this or let us know the appropriate persons for this review? cc @sameeragarwal
[GitHub] spark issue #17698: [SPARK-20403][SQL]Modify the instructions of some functi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17698 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77217/ Test FAILed.
[GitHub] spark issue #17698: [SPARK-20403][SQL]Modify the instructions of some functi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17698 Merged build finished. Test FAILed.
[GitHub] spark issue #17698: [SPARK-20403][SQL]Modify the instructions of some functi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17698 **[Test build #77217 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77217/testReport)** for PR 17698 at commit [`10be7eb`](https://github.com/apache/spark/commit/10be7eb586dcf992af2982ba94aa446408ad1e25). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #18040: [SPARK-20815] [SPARKR] NullPointerException in RP...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/18040
[GitHub] spark issue #18040: [SPARK-20815] [SPARKR] NullPointerException in RPackageU...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/18040 merged to master/2.2, thanks!
[GitHub] spark issue #18051: [SPARK-18825][SPARKR][DOCS][WIP] Eliminate duplicate lin...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/18051 @actuaryzhang - we were just talking about this in the other PR. What do you think? @zero323 - right, I do agree `?abs-method` is kind of a big problem...
[GitHub] spark issue #18046: [SPARK-20749][SQL] Built-in SQL Function Support - all v...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18046 **[Test build #77221 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77221/testReport)** for PR 18046 at commit [`e9acb63`](https://github.com/apache/spark/commit/e9acb63e1e695ddab4d80ed74844f2244c3f0e05).
[GitHub] spark issue #18064: [SPARK-20213][SQL] Fix DataFrameWriter operations in SQL...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18064 **[Test build #77220 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77220/testReport)** for PR 18064 at commit [`b355c6d`](https://github.com/apache/spark/commit/b355c6d034c6aefcf8f74757353afce870e9bf1d).
[GitHub] spark issue #18064: [SPARK-20213][SQL] Fix DataFrameWriter operations in SQL...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18064 **[Test build #77219 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77219/testReport)** for PR 18064 at commit [`9507f19`](https://github.com/apache/spark/commit/9507f1938f894b2884b024c8472084a3a531e20d).
[GitHub] spark pull request #17967: [SPARK-14659][ML] RFormula consistent with R when...
Github user actuaryzhang commented on a diff in the pull request: https://github.com/apache/spark/pull/17967#discussion_r117892629

--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/RFormula.scala ---
@@ -37,6 +37,42 @@ import org.apache.spark.sql.types._
  */
 private[feature] trait RFormulaBase extends HasFeaturesCol with HasLabelCol {
+  /**
+   * Param for how to order categories of a string FEATURE column used by `StringIndexer`.
+   * The last category after ordering is dropped when encoding strings.
+   * Supported options: 'frequencyDesc', 'frequencyAsc', 'alphabetDesc', 'alphabetAsc'.
+   * The default value is 'frequencyDesc'. When the ordering is set to 'alphabetDesc', `RFormula`
+   * drops the same category as R when encoding strings.
+   *
+   * The options are explained using an example `'b', 'a', 'b', 'a', 'c', 'b'`:
+   * {{{
+   * +-+---+--+
--- End diff --

@HyukjinKwon Thanks for the clarification. I don't think `list` paints a clear picture here. Would rather keep the table structure.
[GitHub] spark issue #17967: [SPARK-14659][ML] RFormula consistent with R when handli...
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/17967 @yanboliang I updated the example in the param doc. I hope it is clear now that it is `alphabetDesc` that drops the same category as R. That is, RFormula with `alphabetDesc` drops the first alphabetic category in string encoding.
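As a rough illustration of the four ordering options discussed in this thread, here is a minimal pure-Python sketch applied to the example values `'b', 'a', 'b', 'a', 'c', 'b'` from the param doc. It is not Spark's implementation: the helper names `order_categories` and `dropped_category` are hypothetical, and the handling of frequency ties (left to Python's stable sort) is an assumption that the example does not exercise.

```python
from collections import Counter


def order_categories(values, order_type):
    """Return the distinct categories in the order given by order_type.

    Hypothetical helper for illustration only; it mirrors the option names
    from the param doc but is not Spark's StringIndexer code.
    """
    counts = Counter(values)
    if order_type == "frequencyDesc":
        # Most frequent category first (ties keep first-seen order: an assumption).
        return sorted(counts, key=lambda c: -counts[c])
    if order_type == "frequencyAsc":
        # Least frequent category first.
        return sorted(counts, key=lambda c: counts[c])
    if order_type == "alphabetDesc":
        return sorted(counts, reverse=True)
    if order_type == "alphabetAsc":
        return sorted(counts)
    raise ValueError("unsupported order type: " + order_type)


def dropped_category(values, order_type):
    """The last category after ordering is the one dropped when encoding."""
    return order_categories(values, order_type)[-1]


example = ["b", "a", "b", "a", "c", "b"]
for option in ("frequencyDesc", "frequencyAsc", "alphabetDesc", "alphabetAsc"):
    ordered = order_categories(example, option)
    print(option, "index 0:", ordered[0], "dropped:", dropped_category(example, option))
```

With `alphabetDesc`, the ordering is `c, b, a`, so the dropped (last) category is `a`, the first alphabetical category, which is the behaviour the comment above describes as matching R.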
[GitHub] spark issue #18048: [SPARK-20399][SQL][Follow-up] Add a config to fallback s...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18048 **[Test build #77218 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77218/testReport)** for PR 18048 at commit [`9af9caf`](https://github.com/apache/spark/commit/9af9caf20f46674eabee2c0ece5ae828d2426a5d).
[GitHub] spark issue #18048: [SPARK-20399][SQL][Follow-up] Add a config to fallback s...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/18048 retest this please
[GitHub] spark issue #18064: [SPARK-20213][SQL] Fix DataFrameWriter operations in SQL...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18064 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77215/ Test FAILed.
[GitHub] spark issue #18064: [SPARK-20213][SQL] Fix DataFrameWriter operations in SQL...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18064 Merged build finished. Test FAILed.
[GitHub] spark issue #18064: [SPARK-20213][SQL] Fix DataFrameWriter operations in SQL...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18064 **[Test build #77215 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77215/testReport)** for PR 18064 at commit [`5486950`](https://github.com/apache/spark/commit/5486950edada8ae87d2586f3f6d1e2d82027b015). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `trait Command extends LogicalPlan ` * `case class ExecutedCommandExec(cmd: RunnableCommand, children: Seq[SparkPlan]) extends SparkPlan `
[GitHub] spark pull request #17762: [SPARK-9103][WIP] Track Netty memory usage - take...
Github user jsoltren closed the pull request at: https://github.com/apache/spark/pull/17762
[GitHub] spark issue #17762: [SPARK-9103][WIP] Track Netty memory usage - take two
Github user jsoltren commented on the issue: https://github.com/apache/spark/pull/17762 To close the loop here: I'm going to rework these ideas into a new JIRA that I'll file, to track *total* memory usage in the UI.
[GitHub] spark issue #17698: [SPARK-20403][SQL]Modify the instructions of some functi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17698 **[Test build #77217 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77217/testReport)** for PR 17698 at commit [`10be7eb`](https://github.com/apache/spark/commit/10be7eb586dcf992af2982ba94aa446408ad1e25).
[GitHub] spark issue #18058: [SPARK-20768][PYSPARK][ML] Expose numPartitions (expert)...
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/18058 ok to test
[GitHub] spark issue #18058: [SPARK-20768][PYSPARK][ML] Expose numPartitions (expert)...
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/18058 add to whitelist
[GitHub] spark issue #17698: [SPARK-20403][SQL]Modify the instructions of some functi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17698 Merged build finished. Test FAILed.
[GitHub] spark issue #17698: [SPARK-20403][SQL]Modify the instructions of some functi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17698 **[Test build #77214 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77214/testReport)** for PR 17698 at commit [`7edfed5`](https://github.com/apache/spark/commit/7edfed5577e8610b4ba42f64979c4168fce829d5). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17698: [SPARK-20403][SQL]Modify the instructions of some functi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17698 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77214/ Test FAILed.
[GitHub] spark issue #18035: [MINOR][SPARKR][ML] Joint coefficients with intercept fo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18035 Merged build finished. Test PASSed.
[GitHub] spark issue #18035: [MINOR][SPARKR][ML] Joint coefficients with intercept fo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18035 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77216/ Test PASSed.
[GitHub] spark issue #18035: [MINOR][SPARKR][ML] Joint coefficients with intercept fo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18035 **[Test build #77216 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77216/testReport)** for PR 18035 at commit [`5d9afe0`](https://github.com/apache/spark/commit/5d9afe06b665464b06705d618a18a8032255fe1d). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18058: [SPARK-20768][PYSPARK][ML] Expose numPartitions (expert)...
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/18058 Jenkins, test this please.
[GitHub] spark issue #18058: [SPARK-20768][PYSPARK][ML] Expose numPartitions (expert)...
Github user facaiy commented on the issue: https://github.com/apache/spark/pull/18058 Thanks, @yanboliang. Do you have any suggestions about testing the parameter?
[GitHub] spark issue #17698: [SPARK-20403][SQL]Modify the instructions of some functi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17698 Merged build finished. Test FAILed.
[GitHub] spark issue #17698: [SPARK-20403][SQL]Modify the instructions of some functi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17698 **[Test build #77213 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77213/testReport)** for PR 17698 at commit [`6ce4220`](https://github.com/apache/spark/commit/6ce4220bf861f4a64f3126f1f14043dcb666a056). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17698: [SPARK-20403][SQL]Modify the instructions of some functi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17698 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77213/ Test FAILed.
[GitHub] spark issue #18048: [SPARK-20399][SQL][Follow-up] Add a config to fallback s...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18048 ping @cloud-fan
[GitHub] spark issue #18035: [MINOR][SPARKR][ML] Joint coefficients with intercept fo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18035 **[Test build #77216 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77216/testReport)** for PR 18035 at commit [`5d9afe0`](https://github.com/apache/spark/commit/5d9afe06b665464b06705d618a18a8032255fe1d).