[GitHub] spark issue #17912: [SPARK-20670] [ML] Simplify FPGrowth transform
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17912 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76635/
[GitHub] spark issue #17912: [SPARK-20670] [ML] Simplify FPGrowth transform
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17912 Merged build finished. Test PASSed.
[GitHub] spark issue #17912: [SPARK-20670] [ML] Simplify FPGrowth transform
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17912

**[Test build #76635 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76635/testReport)** for PR 17912 at commit [`b9e3e47`](https://github.com/apache/spark/commit/b9e3e47706af2b9b09fa73101487d31a00779dc3).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #17885: [SPARK-20627][PYSPARK] Drop the hadoop distirbution name...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/17885 Could you post the changes you made in the PR description and explain why they resolve the PEP-0440 issue? It would help more people understand the impact of this PR by reading the PR description. Thanks!
[GitHub] spark issue #17910: [SPARK-20669][ML] LogisticRegression family should be ca...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17910 Merged build finished. Test PASSed.
[GitHub] spark issue #17910: [SPARK-20669][ML] LogisticRegression family should be ca...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17910 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76633/
[GitHub] spark issue #17910: [SPARK-20669][ML] LogisticRegression family should be ca...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17910

**[Test build #76633 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76633/testReport)** for PR 17910 at commit [`33c0f9e`](https://github.com/apache/spark/commit/33c0f9e52c239a6067a535be9c0ce19772d32aef).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #17910: [SPARK-20669][ML] LogisticRegression family shoul...
Github user zhengruifeng commented on a diff in the pull request: https://github.com/apache/spark/pull/17910#discussion_r115418289

```
--- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala ---
@@ -526,7 +526,7 @@ class LogisticRegression @Since("1.2.0") (
       case None => histogram.length
     }
-    val isMultinomial = $(family) match {
+    val isMultinomial = $(family).toLowerCase(Locale.ROOT) match {
--- End diff --
```

I followed the style in `GeneralizedLinearRegression`. Lowercasing the param in the setter would simplify the code, but it would also change the output of the corresponding getter. What is your opinion? @yanboliang
[GitHub] spark pull request #17910: [SPARK-20669][ML] LogisticRegression family shoul...
Github user zhengruifeng commented on a diff in the pull request: https://github.com/apache/spark/pull/17910#discussion_r115418315

```
--- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala ---
@@ -890,7 +890,7 @@ object LogisticRegression extends DefaultParamsReadable[LogisticRegression] {
   override def load(path: String): LogisticRegression = super.load(path)
   private[classification] val supportedFamilyNames =
-    Array("auto", "binomial", "multinomial").map(_.toLowerCase(Locale.ROOT))
+    Array("auto", "binomial", "multinomial")
--- End diff --
```

I am not sure about this. If we keep `toLowerCase` here, we should also apply it in `GeneralizedLinearRegression` and the others.
[GitHub] spark pull request #17869: [SPARK-20609][CORE]Run the SortShuffleSuite unit ...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17869#discussion_r115417588

```
--- Diff: mllib/src/test/scala/org/apache/spark/ml/recommendation/ALSSuite.scala ---
@@ -774,6 +774,7 @@ class ALSCleanerSuite extends SparkFunSuite {
     } finally {
       Utils.deleteRecursively(localDir)
       Utils.deleteRecursively(checkpointDir)
+      Utils.clearLocalRootDirs()
--- End diff --
```

Could we add before/after hooks for each test likewise?
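A minimal sketch of what such per-test before/after hooks could look like, assuming ScalaTest's `BeforeAndAfterEach` trait; the exact wiring into `ALSCleanerSuite` is illustrative, not the merged change:

```scala
import org.scalatest.BeforeAndAfterEach

import org.apache.spark.SparkFunSuite
import org.apache.spark.util.Utils

class ALSCleanerSuite extends SparkFunSuite with BeforeAndAfterEach {
  override def beforeEach(): Unit = {
    super.beforeEach()
    // start each test from a clean slate of local root dirs
    Utils.clearLocalRootDirs()
  }

  override def afterEach(): Unit = {
    try {
      // clear the cached local root dirs so later suites do not inherit them
      Utils.clearLocalRootDirs()
    } finally {
      super.afterEach()
    }
  }
}
```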
[GitHub] spark issue #17885: [SPARK-20627][PYSPARK] Drop the hadoop distirbution name...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/17885 Are you referring to https://www.python.org/dev/peps/pep-0440/ ?
[GitHub] spark pull request #17896: [SPARK-20373][SQL][SS] Batch queries with 'Datase...
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/17896#discussion_r115418094

```
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -2457,6 +2457,19 @@ object CleanupAliases extends Rule[LogicalPlan] {
 }

 /**
+ * Ignore event time watermark in batch query, which is only supported in Structured Streaming.
+ * TODO: add this rule into analyzer rule list.
+ */
+object CheckEventTimeWatermark extends Rule[LogicalPlan] {
--- End diff --
```

I see. The current approach looks good to me then. Could you rename it, please?
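For context, a minimal sketch of what the renamed rule might look like; the name `EliminateEventTimeWatermark` and the body are inferred from the surrounding discussion, not the final merged code:

```scala
import org.apache.spark.sql.catalyst.plans.logical.{EventTimeWatermark, LogicalPlan}
import org.apache.spark.sql.catalyst.rules.Rule

/**
 * Drops the event-time watermark operator from batch queries, where it has no
 * effect; watermarks are only meaningful in Structured Streaming.
 */
object EliminateEventTimeWatermark extends Rule[LogicalPlan] {
  override def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    // keep the child and discard the watermark node when the input is not streaming
    case EventTimeWatermark(_, _, child) if !child.isStreaming => child
  }
}
```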
[GitHub] spark issue #15259: [SPARK-17685][SQL] Make SortMergeJoinExec's currentVars ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15259 **[Test build #76645 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76645/testReport)** for PR 15259 at commit [`2bb54b5`](https://github.com/apache/spark/commit/2bb54b569fcaf3c431bf792f594c485064d3cd37).
[GitHub] spark issue #15259: [SPARK-17685][SQL] Make SortMergeJoinExec's currentVars ...
Github user wangyum commented on the issue: https://github.com/apache/spark/pull/15259 Jenkins, retest this please
[GitHub] spark issue #17885: [SPARK-20627][PYSPARK] Drop the hadoop distirbution name...
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/17885 If there are no other comments I'm going to merge this tomorrow.
[GitHub] spark issue #17303: [SPARK-19112][CORE] add codec for ZStandard
Github user maropu commented on the issue: https://github.com/apache/spark/pull/17303 @Cyan4973 I quickly checked again;

```
scaleFactor: 4
AWS instance: c4.4xlarge

// In this bench, I used `local-cluster` (`local` was used in the benchmark above)
./bin/spark-shell --master local-cluster[4,4,7500] \
  --conf spark.driver.memory=1g \
  --conf spark.executor.memory=7g \
  --conf spark.io.compression.codec=xxx

--- zstd (level=3)
Running execution q4-v1.4 iteration: 1, StandardRun=true
Execution time: 36.517211838s
Running execution q4-v1.4 iteration: 2, StandardRun=true
Execution time: 25.026869575s
Running execution q4-v1.4 iteration: 3, StandardRun=true
Execution time: 24.370711575s

--- zstd (level=1)
Running execution q4-v1.4 iteration: 1, StandardRun=true
Execution time: 29.654705815s
Running execution q4-v1.4 iteration: 2, StandardRun=true
Execution time: 20.638918335s
Running execution q4-v1.4 iteration: 3, StandardRun=true
Execution time: 19.92873075897s

--- lz4
Running execution q4-v1.4 iteration: 1, StandardRun=true
Execution time: 27.422360631s
Running execution q4-v1.4 iteration: 2, StandardRun=true
Execution time: 17.38519278s
Running execution q4-v1.4 iteration: 3, StandardRun=true
Execution time: 15.779084563s

--- snappy
Running execution q4-v1.4 iteration: 1, StandardRun=true
Execution time: 27.47656952102s
Running execution q4-v1.4 iteration: 2, StandardRun=true
Execution time: 16.438640631s
Running execution q4-v1.4 iteration: 3, StandardRun=true
Execution time: 14.949329456s

--- lzf
Running execution q4-v1.4 iteration: 1, StandardRun=true
Execution time: 27.853010073s
Running execution q4-v1.4 iteration: 2, StandardRun=true
Execution time: 17.43123253203s
Running execution q4-v1.4 iteration: 3, StandardRun=true
Execution time: 15.91656989699s
```

`zstd` was still worse than the others. I am not sure, though; there might be cases where `zstd` overtakes the others on larger data sets.
[GitHub] spark issue #17858: [SPARK-20594][SQL]The staging directory should be a chil...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17858 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76644/
[GitHub] spark issue #17858: [SPARK-20594][SQL]The staging directory should be a chil...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17858

**[Test build #76644 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76644/testReport)** for PR 17858 at commit [`6b1b153`](https://github.com/apache/spark/commit/6b1b153e1ee9ec3e7830158d8f8eb274970929ae).
* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #17858: [SPARK-20594][SQL]The staging directory should be a chil...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17858 Merged build finished. Test FAILed.
[GitHub] spark issue #17879: [SPARK-20619][ML] StringIndexer supports multiple ways t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17879 Merged build finished. Test PASSed.
[GitHub] spark issue #17879: [SPARK-20619][ML] StringIndexer supports multiple ways t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17879 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76628/
[GitHub] spark issue #17858: [SPARK-20594][SQL]The staging directory should be a chil...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17858 **[Test build #76644 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76644/testReport)** for PR 17858 at commit [`6b1b153`](https://github.com/apache/spark/commit/6b1b153e1ee9ec3e7830158d8f8eb274970929ae).
[GitHub] spark issue #17879: [SPARK-20619][ML] StringIndexer supports multiple ways t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17879

**[Test build #76628 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76628/testReport)** for PR 17879 at commit [`53381ea`](https://github.com/apache/spark/commit/53381ea6ba41cc26ed89a6fc42252f7126198d9f).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #17910: [SPARK-20669][ML] LogisticRegression family shoul...
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/17910#discussion_r115416085

```
--- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala ---
@@ -526,7 +526,7 @@ class LogisticRegression @Since("1.2.0") (
       case None => histogram.length
     }
-    val isMultinomial = $(family) match {
+    val isMultinomial = $(family).toLowerCase(Locale.ROOT) match {
--- End diff --
```

As a general practice, I would recommend moving the `.toLowerCase(Locale.ROOT)` into the setter. Then we don't need to invoke `.toLowerCase(Locale.ROOT)` multiple times in the code (here it happens to be once), and we can always assume `$(family)` has predictable values.
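A minimal, self-contained sketch of the setter-normalization idea; `FamilyParams` is a stand-in for Spark's `Params` machinery, not real Spark code:

```scala
import java.util.Locale

// minimal stand-in for an ML Params class with a String "family" param
class FamilyParams {
  private var familyValue: String = "auto"

  // normalize once in the setter, so every later read sees a lowercase value
  def setFamily(value: String): this.type = {
    familyValue = value.toLowerCase(Locale.ROOT)
    this
  }

  def getFamily: String = familyValue

  // downstream code can then match plain lowercase literals without re-normalizing
  def isMultinomial(numClasses: Int): Boolean = familyValue match {
    case "multinomial" => true
    case "binomial" => false
    case "auto" => numClasses > 2
  }
}
```

Note that `getFamily` then returns the lowercased value, which is exactly the getter-output change raised in the earlier comment.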
[GitHub] spark pull request #17910: [SPARK-20669][ML] LogisticRegression family shoul...
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/17910#discussion_r115416204

```
--- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala ---
@@ -890,7 +890,7 @@ object LogisticRegression extends DefaultParamsReadable[LogisticRegression] {
   override def load(path: String): LogisticRegression = super.load(path)
   private[classification] val supportedFamilyNames =
-    Array("auto", "binomial", "multinomial").map(_.toLowerCase(Locale.ROOT))
+    Array("auto", "binomial", "multinomial")
--- End diff --
```

We may need to be careful about removing the map, since `Locale.ROOT` exists to handle some special cases.
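A small, self-contained example of the special case `Locale.ROOT` guards against; the Turkish locale is the classic offender, where case mapping of "I" differs from ASCII expectations:

```scala
import java.util.Locale

object LocaleDemo extends App {
  // Turkish case mapping turns "I" into dotless "ı", so lowercasing under a
  // Turkish default locale silently breaks comparisons with ASCII literals
  println("BINOMIAL".toLowerCase(new Locale("tr")))  // bınomıal
  println("BINOMIAL".toLowerCase(Locale.ROOT))       // binomial
}
```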
[GitHub] spark issue #17876: [SPARK-20569][SQL] RuntimeReplaceable functions should n...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17876 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76616/
[GitHub] spark issue #17876: [SPARK-20569][SQL] RuntimeReplaceable functions should n...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17876 Merged build finished. Test FAILed.
[GitHub] spark issue #17876: [SPARK-20569][SQL] RuntimeReplaceable functions should n...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17876

**[Test build #76616 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76616/testReport)** for PR 17876 at commit [`601e988`](https://github.com/apache/spark/commit/601e98813f59b98e6a0f10aeea5bfc0e1e6571a1).
* This patch **fails SparkR unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #17913: [SPARK-20672][SS] Keep the `isStreaming` property in tri...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17913 **[Test build #76643 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76643/testReport)** for PR 17913 at commit [`8cee88e`](https://github.com/apache/spark/commit/8cee88e36092ee568c61a68c5a9ce97cda58839c).
[GitHub] spark issue #17915: [SPARK-20674][SQL] Support registering UserDefinedFuncti...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17915 **[Test build #76642 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76642/testReport)** for PR 17915 at commit [`55421ea`](https://github.com/apache/spark/commit/55421ea99a97c6820169a22b1a5bfc00318ac66b).
[GitHub] spark issue #17894: [SPARK-17134][ML] Use level 2 BLAS operations in Logisti...
Github user VinceShieh commented on the issue: https://github.com/apache/spark/pull/17894 @hhbyyh performance testing is ongoing, thanks!
[GitHub] spark pull request #17894: [SPARK-17134][ML] Use level 2 BLAS operations in ...
Github user VinceShieh commented on a diff in the pull request: https://github.com/apache/spark/pull/17894#discussion_r115415823

```
--- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala ---
@@ -1722,25 +1723,22 @@ private class LogisticAggregator(
     var maxMargin = Double.NegativeInfinity
     val margins = new Array[Double](numClasses)
+    val featureStdArray = new Array[Double](features.size)
--- End diff --
```

Agreed. Still, we will try to benchmark on a sparse dataset; if the change hurts performance for sparse data, we will bypass it in that case.
[GitHub] spark pull request #17896: [SPARK-20373][SQL][SS] Batch queries with 'Datase...
Github user uncleGen commented on a diff in the pull request: https://github.com/apache/spark/pull/17896#discussion_r115415803

```
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -2457,6 +2457,19 @@ object CleanupAliases extends Rule[LogicalPlan] {
 }

 /**
+ * Ignore event time watermark in batch query, which is only supported in Structured Streaming.
+ * TODO: add this rule into analyzer rule list.
+ */
+object CheckEventTimeWatermark extends Rule[LogicalPlan] {
+  override def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+    case EventTimeWatermark(_, _, child) if !child.isStreaming =>
+      logWarning("EventTime watermark is only supported in Structured Streaming but found " +
--- End diff --
```

Got it.
[GitHub] spark pull request #17865: [SPARK-20456][Docs] Add examples for functions co...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17865#discussion_r115415748

```
--- Diff: python/pyspark/sql/functions.py ---
@@ -1120,12 +1159,12 @@ def from_utc_timestamp(timestamp, tz):
 @since(1.5)
 def to_utc_timestamp(timestamp, tz):
     """
-    Given a timestamp, which corresponds to a certain time of day in the given timezone, returns
-    another timestamp that corresponds to the same time of day in UTC.
+    Given a `timestamp`, which corresponds to a time of day in the timezone `tz`,
--- End diff --
```

No, I don't think we have a rule about this, to my knowledge. Thank you for the pointers and for looking into this. Let's follow the majority for now.
[GitHub] spark pull request #17896: [SPARK-20373][SQL][SS] Batch queries with 'Datase...
Github user uncleGen commented on a diff in the pull request: https://github.com/apache/spark/pull/17896#discussion_r115415668

```
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -2457,6 +2457,19 @@ object CleanupAliases extends Rule[LogicalPlan] {
 }

 /**
+ * Ignore event time watermark in batch query, which is only supported in Structured Streaming.
+ * TODO: add this rule into analyzer rule list.
+ */
+object CheckEventTimeWatermark extends Rule[LogicalPlan] {
--- End diff --
```

@zsxwing This PR does some preparatory work before we add `EliminateEventTimeWatermark` into `Analyzer.batches`. Could you please review it?
[GitHub] spark pull request #17915: [SPARK-20674][SQL] Support registering UserDefine...
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/17915

[SPARK-20674][SQL] Support registering UserDefinedFunction as named UDF

## What changes were proposed in this pull request?

For some reason we don't have an API to register a UserDefinedFunction as a named UDF. It is a no-brainer to add one, in addition to the existing register functions we have.

## How was this patch tested?

Added a test case in UDFSuite for the new API.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/rxin/spark SPARK-20674

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17915.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #17915
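A sketch of how the proposed API would presumably be used; the overload that registers an existing `UserDefinedFunction` under a name is inferred from the PR title, so treat the exact signature as an assumption:

```scala
// in spark-shell, where `spark` is available
import org.apache.spark.sql.functions.udf

// an unnamed UserDefinedFunction, so far usable only through the DataFrame API
val strLen = udf((s: String) => s.length)

// the proposed addition: register the existing UDF under a name for SQL use
spark.udf.register("strLen", strLen)
spark.sql("SELECT strLen('spark')").show()
```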
[GitHub] spark pull request #17894: [SPARK-17134][ML] Use level 2 BLAS operations in ...
Github user VinceShieh commented on a diff in the pull request: https://github.com/apache/spark/pull/17894#discussion_r115415580

```
--- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala ---
@@ -23,6 +23,7 @@ import scala.collection.mutable

 import breeze.linalg.{DenseVector => BDV}
 import breeze.optimize.{CachedDiffFunction, DiffFunction, LBFGS => BreezeLBFGS, LBFGSB => BreezeLBFGSB, OWLQN => BreezeOWLQN}
+import com.github.fommil.netlib.BLAS.{getInstance => blas}
--- End diff --
```

MLlib BLAS doesn't have `ger` support; we might, of course, add API support for it in MLlib BLAS to address this.
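For reference, a small self-contained sketch of a rank-1 update (A := alpha * x * yᵀ + A) via netlib-java's `dger`, which the added import exposes; the dimensions and values are made up for illustration:

```scala
import com.github.fommil.netlib.BLAS.{getInstance => blas}

object GerDemo extends App {
  val m = 2                        // rows of A, length of x
  val n = 3                        // cols of A, length of y
  val x = Array(1.0, 2.0)
  val y = Array(1.0, 0.5, 2.0)
  val a = new Array[Double](m * n) // A in column-major order, initially zero

  // A := 1.0 * x * y^T + A
  blas.dger(m, n, 1.0, x, 1, y, 1, a, m)

  println(a.mkString(", "))        // 1.0, 2.0, 0.5, 1.0, 2.0, 4.0
}
```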
[GitHub] spark pull request #17858: [SPARK-20594][SQL]The staging directory should be...
Github user zuotingbing commented on a diff in the pull request: https://github.com/apache/spark/pull/17858#discussion_r115415586

```
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala ---
@@ -97,12 +97,23 @@ case class InsertIntoHiveTable(
     val inputPathUri: URI = inputPath.toUri
     val inputPathName: String = inputPathUri.getPath
     val fs: FileSystem = inputPath.getFileSystem(hadoopConf)
-    val stagingPathName: String =
+    var stagingPathName: String =
       if (inputPathName.indexOf(stagingDir) == -1) {
         new Path(inputPathName, stagingDir).toString
       } else {
         inputPathName.substring(0, inputPathName.indexOf(stagingDir) + stagingDir.length)
       }
+
+    // SPARK-20594: The staging directory should be a child directory starts with "." to avoid
+    // being deleted if we set hive.exec.stagingdir under the table directory.
+    if (FileUtils.isSubDir(new Path(stagingPathName), inputPath, fs)
+      && !stagingPathName.stripPrefix(inputPathName).startsWith(".")) {
--- End diff --
```

Sorry, I do not follow your logic. Correct me if I'm wrong, but wasn't the logic of dropping the created staging directory already there before, with `fs.deleteOnExit(dir)`? As @cloud-fan said, this patch seems a valid workaround in Spark SQL for this case.
[GitHub] spark issue #17914: [SPARK-20673][ML] LDA `optimizer` do not really support ...
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/17914 @yanboliang
[GitHub] spark pull request #17913: [SPARK-20672][SS] Keep the `isStreaming` property...
Github user uncleGen commented on a diff in the pull request: https://github.com/apache/spark/pull/17913#discussion_r115415483

```
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamingRelation.scala ---
@@ -64,8 +64,20 @@ case class StreamingRelationExec(sourceName: String, output: Seq[Attribute]) ext
   }
 }

-object StreamingExecutionRelation {
-  def apply(source: Source): StreamingExecutionRelation = {
-    StreamingExecutionRelation(source, source.schema.toAttributes)
+case class StreamingRelationWrapper(child: LogicalPlan) extends UnaryNode {
+  override def isStreaming: Boolean = true
+  override def output: Seq[Attribute] = child.output
+}
+
--- End diff --
```

This adds a new `StreamingRelationWrapper` relation to wrap the internal relation in each trigger; it keeps the `isStreaming` property.
[GitHub] spark issue #17896: [SPARK-20373][SQL][SS] Batch queries with 'Dataset/DataF...
Github user uncleGen commented on the issue: https://github.com/apache/spark/pull/17896 Depends upon: [SPARK-20672](https://issues.apache.org/jira/browse/SPARK-20672)
[GitHub] spark pull request #17865: [SPARK-20456][Docs] Add examples for functions co...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17865#discussion_r115415128

```
--- Diff: python/pyspark/sql/functions.py ---
@@ -153,7 +173,7 @@ def _():
 # math functions that take two arguments as input
 _binary_mathfunctions = {
     'atan2': 'Returns the angle theta from the conversion of rectangular coordinates (x, y) to' +
-             'polar coordinates (r, theta).',
+             'polar coordinates (r, theta). Units in radians.',
--- End diff --
```

I see. What do you think about adding this in `:param`?
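A quick self-contained check of the radians claim, using plain `math.atan2`, which follows the same convention as the SQL function being documented:

```scala
object Atan2Demo extends App {
  // atan2 returns radians: the point (1, 1) sits at 45 degrees, i.e. pi/4
  println(math.atan2(1.0, 1.0))  // 0.7853981633974483
  println(math.Pi / 4)           // 0.7853981633974483
}
```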
[GitHub] spark issue #17913: [SPARK-20672][SS] Keep the `isStreaming` property in tri...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17913 **[Test build #76640 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76640/testReport)** for PR 17913 at commit [`20648d9`](https://github.com/apache/spark/commit/20648d99b1b95ea074be56708f13901bba2ee10d).
[GitHub] spark issue #17644: [SPARK-17729] [SQL] Enable creating hive bucketed tables
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17644 Merged build finished. Test FAILed.
[GitHub] spark issue #17644: [SPARK-17729] [SQL] Enable creating hive bucketed tables
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17644 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76613/
[GitHub] spark issue #17901: [SPARK-20639][SQL] Add single argument support for to_ti...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17901 **[Test build #76641 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76641/testReport)** for PR 17901 at commit [`fc02460`](https://github.com/apache/spark/commit/fc02460c5d014c573631f3b62cd6b62f5a46c261).
[GitHub] spark issue #17914: [SPARK-20673][ML] LDA `optimizer` do not really support ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17914 **[Test build #76639 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76639/testReport)** for PR 17914 at commit [`b48f760`](https://github.com/apache/spark/commit/b48f7601408a005e773216bc67935c73f7f59324).
[GitHub] spark issue #17644: [SPARK-17729] [SQL] Enable creating hive bucketed tables
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17644

**[Test build #76613 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76613/testReport)** for PR 17644 at commit [`49040e8`](https://github.com/apache/spark/commit/49040e83217a787f7a995f9da941617885e10821).
* This patch **fails SparkR unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #17914: [SPARK-20673][ML] LDA `optimizer` do not really s...
GitHub user zhengruifeng opened a pull request: https://github.com/apache/spark/pull/17914

[SPARK-20673][ML] LDA `optimizer` do not really support case insensitive

## What changes were proposed in this pull request?

Cast to lower case in `getOptimizer`.

## How was this patch tested?

Updated tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/zhengruifeng/spark lda_optimizer_case

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17914.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #17914

commit b48f7601408a005e773216bc67935c73f7f59324
Author: Zheng RuiFeng
Date: 2017-05-09T06:17:51Z

    create pr
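A minimal, self-contained sketch of the normalization idea; `resolveOptimizer` is a stand-in for the real `getOptimizer`, and the returned strings are placeholders for the actual optimizer objects:

```scala
import java.util.Locale

object OptimizerCase extends App {
  // normalize once so "EM", "em" and "Em" all resolve to the same optimizer
  def resolveOptimizer(name: String): String = name.toLowerCase(Locale.ROOT) match {
    case "online" => "OnlineLDAOptimizer"
    case "em" => "EMLDAOptimizer"
    case other => throw new IllegalArgumentException(s"Unknown optimizer: $other")
  }

  println(resolveOptimizer("EM"))      // EMLDAOptimizer
  println(resolveOptimizer("Online"))  // OnlineLDAOptimizer
}
```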
[GitHub] spark issue #17666: [SPARK-20311][SQL] Support aliases for table value funct...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17666 Merged build finished. Test FAILed.
[GitHub] spark issue #17666: [SPARK-20311][SQL] Support aliases for table value funct...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17666 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76615/
[GitHub] spark issue #17666: [SPARK-20311][SQL] Support aliases for table value funct...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17666

**[Test build #76615 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76615/testReport)** for PR 17666 at commit [`81bef3b`](https://github.com/apache/spark/commit/81bef3ba21cb0c3e4b36f3fc492d9ab3a3124829).
* This patch **fails SparkR unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #17913: [SPARK-20672][SS] Keep the `isStreaming` property...
GitHub user uncleGen opened a pull request: https://github.com/apache/spark/pull/17913

[SPARK-20672][SS] Keep the `isStreaming` property in triggerLogicalPlan in Structured Streaming

## What changes were proposed in this pull request?

In Structured Streaming, the `isStreaming` property is eliminated in each triggerLogicalPlan, so some rules are mistakenly applied to the triggerLogicalPlan. We should refactor the existing code to better execute batch queries and Structured Streaming queries.

## How was this patch tested?

Existing unit tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/uncleGen/spark SPARK-20672

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17913.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #17913

commit d1c4cbf0fa369db993855ef3f63b05561cf6662a
Author: uncleGen
Date: 2017-05-09T06:01:51Z

    Keep the `streaming` property in triggerLogicalPlan in Structured Streaming

commit 20648d99b1b95ea074be56708f13901bba2ee10d
Author: uncleGen
Date: 2017-05-09T06:18:50Z

    update
[GitHub] spark pull request #17901: [SPARK-20639][SQL] Add single argument support fo...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17901#discussion_r115414505

```
--- Diff: R/pkg/R/functions.R ---
@@ -1757,7 +1757,8 @@
 #' \url{http://docs.oracle.com/javase/tutorial/i18n/format/simpleDateFormat.html}.
 #' If the string cannot be parsed according to the specified format (or default),
 #' the value of the column will be null.
-#' The default format is 'yyyy-MM-dd'.
+#' By default, it follows casting rules to a DateType if the format is omitted
+#' (equivalent with \code{cast(df$x, "date")}).
--- End diff --
```

@felixcheung, I added an example here. Would this be enough?
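The same equivalence, sketched in Scala for clarity (the diff above documents the analogous R behavior; the DataFrame here is made up):

```scala
// in spark-shell, where `spark` and its implicits are available
import org.apache.spark.sql.functions.{col, to_date}
import spark.implicits._

val df = Seq("2017-05-09").toDF("x")

// with the format omitted, to_date follows the casting rules to DateType...
df.select(to_date(col("x"))).show()
// ...which is equivalent to a plain cast
df.select(col("x").cast("date")).show()
```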
[GitHub] spark pull request #17879: [SPARK-20619][ML] StringIndexer supports multiple...
Github user actuaryzhang commented on a diff in the pull request: https://github.com/apache/spark/pull/17879#discussion_r115414436

```
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/StringIndexer.scala ---
@@ -59,6 +59,29 @@ private[feature] trait StringIndexerBase extends Params with HasInputCol with Ha
   @Since("1.6.0")
   def getHandleInvalid: String = $(handleInvalid)

+  /**
+   * Param for how to order labels of string column. The first label after ordering is assigned
+   * an index of 0.
+   * Options are:
+   *   - 'frequencyDesc': descending order by label frequency (most frequent label assigned 0)
+   *   - 'frequencyAsc': ascending order by label frequency (least frequent label assigned 0)
+   *   - 'alphabetDesc': descending alphabetical order
+   *   - 'alphabetAsc': ascending alphabetical order
+   * Default is 'frequencyDesc'.
+   *
+   * @group param
+   */
+  @Since("2.3.0")
+  final val stringOrderType: Param[String] = new Param(this, "stringOrderType",
+    "how to order labels of string column. " +
+    "The first label after ordering is assigned an index of 0. " +
+    s"Supported options: ${StringIndexer.supportedStringOrderType.mkString(", ")}.",
+    ParamValidators.inArray(StringIndexer.supportedStringOrderType))
--- End diff --
```

@felixcheung Right. It does not quite make sense to be case-sensitive now, given that we use camel case.
[GitHub] spark issue #17901: [SPARK-20639][SQL] Add single argument support for to_ti...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17901 **[Test build #76638 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76638/testReport)** for PR 17901 at commit [`b6f867c`](https://github.com/apache/spark/commit/b6f867cd87e46ca2daf74eabce14b735a962c9a4).
[GitHub] spark pull request #17879: [SPARK-20619][ML] StringIndexer supports multiple...
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17879#discussion_r115414165

    --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/StringIndexer.scala ---
    @@ -59,6 +59,29 @@ private[feature] trait StringIndexerBase extends Params with HasInputCol with Ha
       @Since("1.6.0")
       def getHandleInvalid: String = $(handleInvalid)
     
    +  /**
    +   * Param for how to order labels of string column. The first label after ordering is assigned
    +   * an index of 0.
    +   * Options are:
    +   *   - 'frequencyDesc': descending order by label frequency (most frequent label assigned 0)
    +   *   - 'frequencyAsc': ascending order by label frequency (least frequent label assigned 0)
    +   *   - 'alphabetDesc': descending alphabetical order
    +   *   - 'alphabetAsc': ascending alphabetical order
    +   * Default is 'frequencyDesc'.
    +   *
    +   * @group param
    +   */
    +  @Since("2.3.0")
    +  final val stringOrderType: Param[String] = new Param(this, "stringOrderType",
    +    "how to order labels of string column. " +
    +    "The first label after ordering is assigned an index of 0. " +
    +    s"Supported options: ${StringIndexer.supportedStringOrderType.mkString(", ")}.",
    +    ParamValidators.inArray(StringIndexer.supportedStringOrderType))
    --- End diff --

    so we are going to be case-sensitive then?
[GitHub] spark pull request #17879: [SPARK-20619][ML] StringIndexer supports multiple...
Github user actuaryzhang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17879#discussion_r115413770

    --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/StringIndexer.scala ---
    @@ -131,6 +167,12 @@ object StringIndexer extends DefaultParamsReadable[StringIndexer] {
       private[feature] val KEEP_INVALID: String = "keep"
       private[feature] val supportedHandleInvalids: Array[String] =
         Array(SKIP_INVALID, ERROR_INVALID, KEEP_INVALID)
    +  private[feature] val FREQ_DESC: String = "frequency_desc"
    +  private[feature] val FREQ_ASC: String = "frequency_asc"
    +  private[feature] val ALPHABET_DESC: String = "alphabet_desc"
    +  private[feature] val ALPHABET_ASC: String = "alphabet_asc"
    --- End diff --

    @gatorsmile Thanks much for the suggestion. Changed them to lowerCamelCase. @felixcheung Any additional suggestions?
[GitHub] spark issue #17901: [SPARK-20639][SQL] Add single argument support for to_ti...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17901 **[Test build #76636 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76636/testReport)** for PR 17901 at commit [`497a229`](https://github.com/apache/spark/commit/497a22965af3a74e89c73b60667ab19fecb0af39).
[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS as optimizer for LinearSV...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17862 **[Test build #76637 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76637/testReport)** for PR 17862 at commit [`8a7c10f`](https://github.com/apache/spark/commit/8a7c10f5bc0d7234ed6e156c98f04bddb7a37204).
[GitHub] spark issue #17879: [SPARK-20619][ML] StringIndexer supports multiple ways t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17879 Merged build finished. Test PASSed.
[GitHub] spark issue #17879: [SPARK-20619][ML] StringIndexer supports multiple ways t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17879 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76624/ Test PASSed.
[GitHub] spark issue #17879: [SPARK-20619][ML] StringIndexer supports multiple ways t...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/17879 LGTM
[GitHub] spark issue #17879: [SPARK-20619][ML] StringIndexer supports multiple ways t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17879

    **[Test build #76624 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76624/testReport)** for PR 17879 at commit [`07198d9`](https://github.com/apache/spark/commit/07198d9bb45a54d3c257ad37e772cc31154ffcb6).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.
[GitHub] spark pull request #16989: [SPARK-19659] Fetch big blocks to disk when shuff...
Github user mridulm commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16989#discussion_r115409386

    --- Diff: core/src/main/scala/org/apache/spark/memory/MemoryManager.scala ---
    @@ -54,7 +54,8 @@ private[spark] abstract class MemoryManager(
       onHeapStorageMemoryPool.incrementPoolSize(onHeapStorageMemory)
       onHeapExecutionMemoryPool.incrementPoolSize(onHeapExecutionMemory)
     
    -  protected[this] val maxOffHeapMemory = conf.getSizeAsBytes("spark.memory.offHeap.size", 0)
    +  protected[this] val maxOffHeapMemory =
    +    conf.getSizeAsBytes("spark.memory.offHeap.size", 384 * 1024 * 1024)
    --- End diff --

    Maybe I missed the discussion, why is this changed?
[GitHub] spark pull request #16989: [SPARK-19659] Fetch big blocks to disk when shuff...
Github user mridulm commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16989#discussion_r115411681

    --- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala ---
    @@ -154,15 +164,24 @@ final class ShuffleBlockFetcherIterator(
         while (iter.hasNext) {
           val result = iter.next()
           result match {
    -        case SuccessFetchResult(_, address, _, buf, _) =>
    +        case SuccessFetchResult(_, address, size, buf, _) =>
               if (address != blockManager.blockManagerId) {
                 shuffleMetrics.incRemoteBytesRead(buf.size)
                 shuffleMetrics.incRemoteBlocksFetched(1)
               }
               buf.release()
    +          freeMemory(size)
             case _ =>
           }
         }
    +    shuffleFiles.foreach { shuffleFile =>
    +      try {
    +        shuffleFile.delete()
    +      } catch {
    +        case ioe: IOException =>
    +          logError(s"Failed to cleanup ${shuffleFile.getAbsolutePath}.", ioe)
    --- End diff --

    `IOException` is not thrown by delete - but it can return `false` to indicate delete failure. The log message (INFO would do btw) should be on `delete()` returning `false`.
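A sketch of the fix mridulm is suggesting: check the boolean returned by `delete()` instead of catching an exception it never throws.

```scala
shuffleFiles.foreach { shuffleFile =>
  // File.delete() signals failure via its return value, not via IOException.
  if (!shuffleFile.delete()) {
    logInfo(s"Failed to cleanup ${shuffleFile.getAbsolutePath}.")
  }
}
```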
[GitHub] spark pull request #16989: [SPARK-19659] Fetch big blocks to disk when shuff...
Github user mridulm commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16989#discussion_r115410895

    --- Diff: core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala ---
    @@ -193,8 +206,18 @@ private[spark] object HighlyCompressedMapStatus {
         } else {
           0
         }
    +    val hugeBlockSizes = ArrayBuffer[Tuple2[Int, Byte]]()
    +    if (numNonEmptyBlocks > 0) {
    +      uncompressedSizes.zipWithIndex.foreach {
    +        case (size, reduceId) =>
    +          if (size > 2 * avgSize) {
    --- End diff --

    This should be configurable in two respects:
    * minimum size before we consider something a large block: if the average is 10kb and some blocks are > 20kb, spilling them to disk would be highly suboptimal (unless I missed that check somewhere else).
    * the factor '2' should also be configurable - some deployments might be ok with high memory usage (machines provisioned accordingly) while others might need it to be more aggressive and lower.
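A sketch of the two knobs being requested; the config names below are invented for illustration and are not actual Spark settings.

```scala
// Hypothetical settings: a floor below which nothing counts as huge,
// and a configurable multiple of the average block size.
val minHugeBlockSize = conf.getSizeAsBytes("spark.shuffle.hugeBlockSize.min", "20m")
val hugeBlockFactor = conf.getDouble("spark.shuffle.hugeBlockSize.factor", 2.0)

uncompressedSizes.zipWithIndex.foreach { case (size, reduceId) =>
  if (size > math.max(minHugeBlockSize, (hugeBlockFactor * avgSize).toLong)) {
    // record (reduceId, size) as a huge block
  }
}
```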
[GitHub] spark pull request #16989: [SPARK-19659] Fetch big blocks to disk when shuff...
Github user mridulm commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16989#discussion_r115409772

    --- Diff: core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala ---
    @@ -128,41 +130,52 @@ private[spark] class CompressedMapStatus(
      * @param numNonEmptyBlocks the number of non-empty blocks
      * @param emptyBlocks a bitmap tracking which blocks are empty
      * @param avgSize average size of the non-empty blocks
    + * @param hugeBlockSizesArray sizes of huge blocks by their reduceId.
      */
     private[spark] class HighlyCompressedMapStatus private (
         private[this] var loc: BlockManagerId,
         private[this] var numNonEmptyBlocks: Int,
         private[this] var emptyBlocks: RoaringBitmap,
    -    private[this] var avgSize: Long)
    +    private[this] var avgSize: Long,
    +    private[this] var hugeBlockSizesArray: Array[Tuple2[Int, Byte]])
       extends MapStatus with Externalizable {
     
    +  @transient var hugeBlockSizes: Map[Int, Byte] =
    --- End diff --

    `private`?
[GitHub] spark pull request #16989: [SPARK-19659] Fetch big blocks to disk when shuff...
Github user mridulm commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16989#discussion_r115410407

    --- Diff: core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala ---
    @@ -128,41 +130,52 @@ private[spark] class CompressedMapStatus(
      * @param numNonEmptyBlocks the number of non-empty blocks
      * @param emptyBlocks a bitmap tracking which blocks are empty
      * @param avgSize average size of the non-empty blocks
    + * @param hugeBlockSizesArray sizes of huge blocks by their reduceId.
      */
     private[spark] class HighlyCompressedMapStatus private (
         private[this] var loc: BlockManagerId,
         private[this] var numNonEmptyBlocks: Int,
         private[this] var emptyBlocks: RoaringBitmap,
    -    private[this] var avgSize: Long)
    +    private[this] var avgSize: Long,
    +    private[this] var hugeBlockSizesArray: Array[Tuple2[Int, Byte]])
       extends MapStatus with Externalizable {
     
    +  @transient var hugeBlockSizes: Map[Int, Byte] =
    +    if (hugeBlockSizesArray == null) null else hugeBlockSizesArray.toMap
    +
       // loc could be null when the default constructor is called during deserialization
       require(loc == null || avgSize > 0 || numNonEmptyBlocks == 0,
         "Average size can only be zero for map stages that produced no output")
     
    -  protected def this() = this(null, -1, null, -1)  // For deserialization only
    +  protected def this() = this(null, -1, null, -1, null)  // For deserialization only
     
       override def location: BlockManagerId = loc
     
       override def getSizeForBlock(reduceId: Int): Long = {
         if (emptyBlocks.contains(reduceId)) {
           0
         } else {
    -      avgSize
    +      hugeBlockSizes.get(reduceId) match {
    --- End diff --

    NPE - `hugeBlockSizes` can be null here (it is initialized to null when `hugeBlockSizesArray` is null), so `hugeBlockSizes.get(reduceId)` can throw.
[GitHub] spark pull request #16989: [SPARK-19659] Fetch big blocks to disk when shuff...
Github user mridulm commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16989#discussion_r115410381

    --- Diff: core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala ---
    @@ -128,41 +130,52 @@ private[spark] class CompressedMapStatus(
      * @param numNonEmptyBlocks the number of non-empty blocks
      * @param emptyBlocks a bitmap tracking which blocks are empty
      * @param avgSize average size of the non-empty blocks
    + * @param hugeBlockSizesArray sizes of huge blocks by their reduceId.
      */
     private[spark] class HighlyCompressedMapStatus private (
         private[this] var loc: BlockManagerId,
         private[this] var numNonEmptyBlocks: Int,
         private[this] var emptyBlocks: RoaringBitmap,
    -    private[this] var avgSize: Long)
    +    private[this] var avgSize: Long,
    +    private[this] var hugeBlockSizesArray: Array[Tuple2[Int, Byte]])
       extends MapStatus with Externalizable {
     
    +  @transient var hugeBlockSizes: Map[Int, Byte] =
    +    if (hugeBlockSizesArray == null) null else hugeBlockSizesArray.toMap
    +
       // loc could be null when the default constructor is called during deserialization
       require(loc == null || avgSize > 0 || numNonEmptyBlocks == 0,
         "Average size can only be zero for map stages that produced no output")
     
    -  protected def this() = this(null, -1, null, -1)  // For deserialization only
    +  protected def this() = this(null, -1, null, -1, null)  // For deserialization only
     
       override def location: BlockManagerId = loc
     
       override def getSizeForBlock(reduceId: Int): Long = {
         if (emptyBlocks.contains(reduceId)) {
           0
         } else {
    -      avgSize
    +      hugeBlockSizes.get(reduceId) match {
    +        case Some(size) => MapStatus.decompressSize(size)
    +        case None => avgSize
    +      }
         }
       }
     
       override def writeExternal(out: ObjectOutput): Unit = Utils.tryOrIOException {
         loc.writeExternal(out)
         emptyBlocks.writeExternal(out)
         out.writeLong(avgSize)
    +    out.writeObject(hugeBlockSizesArray)
       }
     
       override def readExternal(in: ObjectInput): Unit = Utils.tryOrIOException {
         loc = BlockManagerId(in)
         emptyBlocks = new RoaringBitmap()
         emptyBlocks.readExternal(in)
         avgSize = in.readLong()
    +    hugeBlockSizesArray = in.readObject().asInstanceOf[Array[Tuple2[Int, Byte]]]
    --- End diff --

    This can be null, and so needs to be handled appropriately below.
[GitHub] spark pull request #16989: [SPARK-19659] Fetch big blocks to disk when shuff...
Github user mridulm commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16989#discussion_r115412242

    --- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala ---
    @@ -137,6 +146,7 @@ final class ShuffleBlockFetcherIterator(
         // Release the current buffer if necessary
         if (currentResult != null) {
           currentResult.buf.release()
    +      freeMemory(currentResult.size)
    --- End diff --

    Only if it's in memory and not on disk?
[GitHub] spark pull request #16989: [SPARK-19659] Fetch big blocks to disk when shuff...
Github user mridulm commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16989#discussion_r115412049

    --- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala ---
    @@ -154,15 +164,24 @@ final class ShuffleBlockFetcherIterator(
         while (iter.hasNext) {
           val result = iter.next()
           result match {
    -        case SuccessFetchResult(_, address, _, buf, _) =>
    +        case SuccessFetchResult(_, address, size, buf, _) =>
               if (address != blockManager.blockManagerId) {
                 shuffleMetrics.incRemoteBytesRead(buf.size)
                 shuffleMetrics.incRemoteBlocksFetched(1)
               }
               buf.release()
    +          freeMemory(size)
    --- End diff --

    Only if it was *not* fetched to disk - if it was spilled to disk, we did not acquire memory.
[GitHub] spark pull request #16989: [SPARK-19659] Fetch big blocks to disk when shuff...
Github user mridulm commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16989#discussion_r115409268

    --- Diff: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/OneForOneBlockFetcher.java ---
    @@ -126,4 +151,39 @@ private void failRemainingBlocks(String[] failedBlockIds, Throwable e) {
           }
         }
       }
    +
    +  private class DownloadCallback implements StreamCallback {
    +
    +    private WritableByteChannel channel = null;
    +    private File targetFile = null;
    +    private int chunkIndex;
    +
    +    public DownloadCallback(File targetFile, int chunkIndex) throws IOException {
    +      this.targetFile = targetFile;
    +      this.channel = Channels.newChannel(new FileOutputStream(targetFile));
    +      this.chunkIndex = chunkIndex;
    +    }
    +
    +    @Override
    +    public void onData(String streamId, ByteBuffer buf) throws IOException {
    +      channel.write(buf);
    +    }
    +
    +    @Override
    +    public void onComplete(String streamId) throws IOException {
    +      channel.close();
    +      ManagedBuffer buffer = new FileSegmentManagedBuffer(
    --- End diff --

    After each ManagedBuffer is consumed, we should make an attempt to remove the corresponding file: should be fairly straightforward, no? (override release?)
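One possible shape for the "override release" idea - purely illustrative, and with the caveat that deleting on the first `release()` call ignores any outstanding `retain()`:

```scala
import java.io.{File, InputStream}
import java.nio.ByteBuffer

import org.apache.spark.network.buffer.ManagedBuffer

// Hypothetical wrapper: delegates to the fetched buffer and best-effort
// deletes the backing temp file once the buffer is released.
class DeleteOnReleaseBuffer(underlying: ManagedBuffer, file: File) extends ManagedBuffer {
  override def size(): Long = underlying.size()
  override def nioByteBuffer(): ByteBuffer = underlying.nioByteBuffer()
  override def createInputStream(): InputStream = underlying.createInputStream()
  override def retain(): ManagedBuffer = { underlying.retain(); this }
  override def release(): ManagedBuffer = {
    underlying.release()
    file.delete()  // best effort; failure just leaves the temp file behind
    this
  }
  override def convertToNetty(): AnyRef = underlying.convertToNetty()
}
```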
[GitHub] spark pull request #16989: [SPARK-19659] Fetch big blocks to disk when shuff...
Github user mridulm commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16989#discussion_r115407998

    --- Diff: common/network-common/src/main/java/org/apache/spark/network/server/OneForOneStreamManager.java ---
    @@ -95,6 +95,14 @@ public ManagedBuffer getChunk(long streamId, int chunkIndex) {
       }
     
       @Override
    +  public ManagedBuffer openStream(String streamChunkId) {
    +    String[] array = streamChunkId.split("_");
    --- End diff --

    Instead of spreading the parsing logic, it is better to externalize this into a pair of methods - one to create a streamChunkId given a streamId and chunkIndex, and another to parse it back out. If we have to change the delimiter or add other logic, it will be easier to manage the change.
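The refactor being suggested would look roughly like this (names invented for illustration):

```scala
// Keep the encode/decode logic for stream chunk ids in one place so a
// delimiter change only touches these two methods.
object StreamChunkIds {
  private val Delimiter = "_"

  def toStreamChunkId(streamId: Long, chunkIndex: Int): String =
    s"$streamId$Delimiter$chunkIndex"

  def fromStreamChunkId(streamChunkId: String): (Long, Int) = {
    val Array(streamId, chunkIndex) = streamChunkId.split(Delimiter)
    (streamId.toLong, chunkIndex.toInt)
  }
}
```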
[GitHub] spark pull request #16989: [SPARK-19659] Fetch big blocks to disk when shuff...
Github user mridulm commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16989#discussion_r115409001

    --- Diff: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/OneForOneBlockFetcher.java ---
    @@ -126,4 +149,38 @@ private void failRemainingBlocks(String[] failedBlockIds, Throwable e) {
           }
         }
       }
    +
    +  private class DownloadCallback implements StreamCallback {
    +
    +    private WritableByteChannel channel = null;
    +    private File targetFile = null;
    +    private int chunkIndex;
    +
    +    public DownloadCallback(File targetFile, int chunkIndex) throws IOException {
    +      this.targetFile = targetFile;
    +      this.channel = Channels.newChannel(new FileOutputStream(targetFile));
    +      this.chunkIndex = chunkIndex;
    +    }
    +
    +    @Override
    +    public void onData(String streamId, ByteBuffer buf) throws IOException {
    +      channel.write(buf);
    --- End diff --

    As an impl detail (since the channel wraps a FileOutputStream), this will work - but in general, channel.write() need not write all of buf.remaining(), which actually breaks Spark code iirc - since it expects onData to completely consume the data.
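A defensive version of `onData` that does not assume a single `write()` consumes the whole buffer (a Scala rendering of the Java callback, for illustration):

```scala
override def onData(streamId: String, buf: ByteBuffer): Unit = {
  // WritableByteChannel.write() may write fewer bytes than buf.remaining(),
  // so loop until the buffer is drained.
  while (buf.hasRemaining) {
    channel.write(buf)
  }
}
```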
[GitHub] spark pull request #16989: [SPARK-19659] Fetch big blocks to disk when shuff...
Github user mridulm commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16989#discussion_r115409843

    --- Diff: core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala ---
    @@ -128,41 +130,52 @@ private[spark] class CompressedMapStatus(
      * @param numNonEmptyBlocks the number of non-empty blocks
      * @param emptyBlocks a bitmap tracking which blocks are empty
      * @param avgSize average size of the non-empty blocks
    + * @param hugeBlockSizesArray sizes of huge blocks by their reduceId.
      */
     private[spark] class HighlyCompressedMapStatus private (
         private[this] var loc: BlockManagerId,
         private[this] var numNonEmptyBlocks: Int,
         private[this] var emptyBlocks: RoaringBitmap,
    -    private[this] var avgSize: Long)
    +    private[this] var avgSize: Long,
    +    private[this] var hugeBlockSizesArray: Array[Tuple2[Int, Byte]])
    --- End diff --

    Why does hugeBlockSizesArray exist? Is it for efficient serialization? If yes, then converting it into (Array[Int], Array[Byte]) would be better (with each array written directly, not as Tuple2 objects) - more below though.
[GitHub] spark pull request #16989: [SPARK-19659] Fetch big blocks to disk when shuff...
Github user mridulm commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16989#discussion_r115410136

    --- Diff: core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala ---
    @@ -128,41 +130,52 @@ private[spark] class CompressedMapStatus(
      * @param numNonEmptyBlocks the number of non-empty blocks
      * @param emptyBlocks a bitmap tracking which blocks are empty
      * @param avgSize average size of the non-empty blocks
    + * @param hugeBlockSizesArray sizes of huge blocks by their reduceId.
      */
     private[spark] class HighlyCompressedMapStatus private (
         private[this] var loc: BlockManagerId,
         private[this] var numNonEmptyBlocks: Int,
         private[this] var emptyBlocks: RoaringBitmap,
    -    private[this] var avgSize: Long)
    +    private[this] var avgSize: Long,
    +    private[this] var hugeBlockSizesArray: Array[Tuple2[Int, Byte]])
       extends MapStatus with Externalizable {
     
    +  @transient var hugeBlockSizes: Map[Int, Byte] =
    +    if (hugeBlockSizesArray == null) null else hugeBlockSizesArray.toMap
    +
       // loc could be null when the default constructor is called during deserialization
       require(loc == null || avgSize > 0 || numNonEmptyBlocks == 0,
         "Average size can only be zero for map stages that produced no output")
     
    -  protected def this() = this(null, -1, null, -1)  // For deserialization only
    +  def this() = this(null, -1, null, -1, null)  // For deserialization only
     
       override def location: BlockManagerId = loc
     
       override def getSizeForBlock(reduceId: Int): Long = {
         if (emptyBlocks.contains(reduceId)) {
           0
         } else {
    -      avgSize
    +      hugeBlockSizes.get(reduceId) match {
    +        case Some(size) => MapStatus.decompressSize(size)
    +        case None => avgSize
    +      }
         }
       }
     
       override def writeExternal(out: ObjectOutput): Unit = Utils.tryOrIOException {
         loc.writeExternal(out)
         emptyBlocks.writeExternal(out)
         out.writeLong(avgSize)
    +    out.writeObject(hugeBlockSizesArray)
       }
     
       override def readExternal(in: ObjectInput): Unit = Utils.tryOrIOException {
         loc = BlockManagerId(in)
         emptyBlocks = new RoaringBitmap()
         emptyBlocks.readExternal(in)
         avgSize = in.readLong()
    +    hugeBlockSizesArray = in.readObject().asInstanceOf[Array[Tuple2[Int, Byte]]]
    +    hugeBlockSizes = hugeBlockSizesArray.toMap
    --- End diff --

    Object creation (this()) has already happened - readExternal is restoring the state from the stream. So we need to keep this @cloud-fan

    One issue I have here is that we are duplicating the information between hugeBlockSizesArray and hugeBlockSizes. I would prefer if we dropped hugeBlockSizesArray entirely (other than as the constructor param we initialize state from). This will actually result in more efficient serde at the cost of manually doing the serde for hugeBlockSizes, and it handles all the corner cases (avoiding the need for any null check, etc.). For serialization: write length, loop - write key as int, write value as byte; for deserialization, the reverse.
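A sketch of the manual serde being described, assuming the `hugeBlockSizesArray` field is dropped and only `hugeBlockSizes: Map[Int, Byte]` is kept (a reconstruction of the suggestion, not the actual patch):

```scala
import scala.collection.mutable

override def writeExternal(out: ObjectOutput): Unit = Utils.tryOrIOException {
  loc.writeExternal(out)
  emptyBlocks.writeExternal(out)
  out.writeLong(avgSize)
  // write length, then each (reduceId, compressed size) pair directly
  out.writeInt(hugeBlockSizes.size)
  hugeBlockSizes.foreach { case (reduceId, size) =>
    out.writeInt(reduceId)
    out.writeByte(size)
  }
}

override def readExternal(in: ObjectInput): Unit = Utils.tryOrIOException {
  loc = BlockManagerId(in)
  emptyBlocks = new RoaringBitmap()
  emptyBlocks.readExternal(in)
  avgSize = in.readLong()
  // the reverse: read length, then each pair; no null checks needed
  val count = in.readInt()
  val sizes = mutable.Map.empty[Int, Byte]
  var i = 0
  while (i < count) {
    sizes(in.readInt()) = in.readByte()
    i += 1
  }
  hugeBlockSizes = sizes.toMap
}
```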
[GitHub] spark issue #17912: [SPARK-20670] [ML] Simplify FPGrowth transform
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17912 **[Test build #76635 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76635/testReport)** for PR 17912 at commit [`b9e3e47`](https://github.com/apache/spark/commit/b9e3e47706af2b9b09fa73101487d31a00779dc3).
[GitHub] spark issue #17879: [SPARK-20619][ML] StringIndexer supports multiple ways t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17879 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76621/ Test PASSed.
[GitHub] spark issue #17879: [SPARK-20619][ML] StringIndexer supports multiple ways t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17879 Merged build finished. Test PASSed.
[GitHub] spark issue #17879: [SPARK-20619][ML] StringIndexer supports multiple ways t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17879

    **[Test build #76621 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76621/testReport)** for PR 17879 at commit [`ff9b1d6`](https://github.com/apache/spark/commit/ff9b1d66873eb8cad1a4a13f323555da2706a849).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.
[GitHub] spark issue #17911: [SPARK-20668][SQL] Modify ScalaUDF to handle nullability...
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/17911 cc @gatorsmile
[GitHub] spark issue #17912: [SPARK-20670] [ML] Simplify FPGrowth transform
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/17912 cc @srowen @jkbradley @felixcheung
[GitHub] spark issue #17858: [SPARK-20594][SQL]The staging directory should be a chil...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17858 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76617/ Test FAILed.
[GitHub] spark issue #17858: [SPARK-20594][SQL]The staging directory should be a chil...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17858 Merged build finished. Test FAILed.
[GitHub] spark issue #17858: [SPARK-20594][SQL]The staging directory should be a chil...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17858

    **[Test build #76617 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76617/testReport)** for PR 17858 at commit [`6b22d3e`](https://github.com/apache/spark/commit/6b22d3ea694c4133965ddface73c52c3566cd156).
     * This patch **fails SparkR unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.
[GitHub] spark pull request #17912: [SPARK-20670] [ML] Simplify FPGrowth transform
GitHub user hhbyyh opened a pull request:

    https://github.com/apache/spark/pull/17912

    [SPARK-20670] [ML] Simplify FPGrowth transform

    ## What changes were proposed in this pull request?

    As suggested by Sean Owen in https://github.com/apache/spark/pull/17130, the transform code in FPGrowthModel can be simplified. As I tested on some public datasets (http://fimi.ua.ac.be/data/), the performance of the new transform code is on par with or better than the old implementation.

    ## How was this patch tested?

    Existing unit test.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/hhbyyh/spark fpgrowthTransform

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17912.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #17912
[GitHub] spark issue #16985: [SPARK-19122][SQL] Unnecessary shuffle+sort added if joi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16985 Merged build finished. Test FAILed.
[GitHub] spark issue #16985: [SPARK-19122][SQL] Unnecessary shuffle+sort added if joi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16985 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76614/ Test FAILed.
[GitHub] spark issue #17905: [SPARK-20661][SPARKR][TEST][FOLLOWUP] SparkR tableNames(...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17905 **[Test build #76634 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76634/testReport)** for PR 17905 at commit [`b37a760`](https://github.com/apache/spark/commit/b37a760417ea5f9b958a7329dbccd110478821ff).
[GitHub] spark pull request #17905: [SPARK-20661][SPARKR][TEST][FOLLOWUP] SparkR tabl...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/17905
[GitHub] spark issue #17910: [SPARK-20669][ML] LogisticRegression family should be ca...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17910 **[Test build #76633 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76633/testReport)** for PR 17910 at commit [`33c0f9e`](https://github.com/apache/spark/commit/33c0f9e52c239a6067a535be9c0ce19772d32aef).
[GitHub] spark issue #16985: [SPARK-19122][SQL] Unnecessary shuffle+sort added if joi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16985

    **[Test build #76614 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76614/testReport)** for PR 16985 at commit [`e202ac1`](https://github.com/apache/spark/commit/e202ac1eda5fd1be3e466eea8975a1b0af54129f).
     * This patch **fails SparkR unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.
[GitHub] spark issue #17911: [SPARK-20668][SQL] Modify ScalaUDF to handle nullability...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17911 **[Test build #76632 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76632/testReport)** for PR 17911 at commit [`120c862`](https://github.com/apache/spark/commit/120c862bada2e8a574f29ea4eb4434a528d59b3b).
[GitHub] spark pull request #17911: [SPARK-20668][SQL] Modify ScalaUDF to handle null...
GitHub user ueshin opened a pull request:

    https://github.com/apache/spark/pull/17911

    [SPARK-20668][SQL] Modify ScalaUDF to handle nullability.

    ## What changes were proposed in this pull request?

    When registering a Scala UDF, we can know whether the UDF will return a nullable value or not. `ScalaUDF` and related classes should handle the nullability.

    ## How was this patch tested?

    Existing tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ueshin/apache-spark issues/SPARK-20668

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17911.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #17911

commit 120c862bada2e8a574f29ea4eb4434a528d59b3b
Author: Takuya UESHIN
Date: 2017-05-05T04:17:18Z

    Modify ScalaUDF to handle nullability.
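For intuition, the information the patch wants to exploit (illustrative only, not code from the PR): the Scala return type captured at registration time already says whether null is possible.

```scala
import org.apache.spark.sql.functions.udf

val inc = udf((x: Int) => x + 1)            // Int can never be null => non-nullable output
val half = udf((x: Int) =>                  // Option[Int] can be None => nullable output
  if (x % 2 == 0) Some(x / 2) else None)
```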
[GitHub] spark issue #17905: [SPARK-20661][SPARKR][TEST][FOLLOWUP] SparkR tableNames(...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/17905 merged to master/2.2
[GitHub] spark pull request #17910: [SPARK-20669][ML] LogisticRegression family shoul...
GitHub user zhengruifeng opened a pull request:

    https://github.com/apache/spark/pull/17910

    [SPARK-20669][ML] LogisticRegression family should be case insensitive

    ## What changes were proposed in this pull request?

    Make param `family` case-insensitive.

    ## How was this patch tested?

    Updated tests.

    @yanboliang

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/zhengruifeng/spark lr_family_lowercase

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17910.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #17910

commit 33c0f9e52c239a6067a535be9c0ce19772d32aef
Author: Zheng RuiFeng
Date: 2017-05-09T05:43:13Z

    create pr
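One common way to implement this - shown as a sketch, which may differ from the actual patch - is to validate with a lowercased comparison instead of an exact `ParamValidators.inArray` match:

```scala
import java.util.Locale

// Hypothetical validator: accept any casing of the supported family names.
val supportedFamilyNames = Array("auto", "binomial", "multinomial")
val familyValidator = (value: String) =>
  supportedFamilyNames.contains(value.toLowerCase(Locale.ROOT))
```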
[GitHub] spark issue #17905: [SPARK-20661][SPARKR][TEST][FOLLOWUP] SparkR tableNames(...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/17905 ok, Jenkins passes. I'm going to merge this in since there are a bunch of PRs failing because of this, even when they say they're up-to-date with master. I'm going to investigate further though.
[GitHub] spark issue #15435: [SPARK-17139][ML] Add model summary for MultinomialLogis...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15435 **[Test build #76631 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76631/testReport)** for PR 15435 at commit [`449782a`](https://github.com/apache/spark/commit/449782a36ed139919bec6b114938590a383eaf43).