[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/5645#discussion_r29026826

--- Diff: streaming/src/main/scala/org/apache/spark/streaming/rdd/WriteAheadLogBackedBlockRDD.scala ---

    @@ -96,9 +99,27 @@ class WriteAheadLogBackedBlockRDD[T: ClassTag](
             logDebug(s"Read partition data of $this from block manager, block $blockId")
             iterator
           case None => // Data not found in Block Manager, grab it from write ahead log file
    -        val reader = new WriteAheadLogRandomReader(partition.segment.path, hadoopConf)
    -        val dataRead = reader.read(partition.segment)
    -        reader.close()
    +        var dataRead: ByteBuffer = null
    +        var writeAheadLog: WriteAheadLog = null
    +        try {
    +          val dummyDirectory = FileUtils.getTempDirectoryPath()

--- End diff --

Why do we need to use `dummyDirectory` here? Since the WAL may not be file-based, I'm not sure what the purpose of this is.

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
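The diff above replaces a direct `WriteAheadLogRandomReader` with a generic `WriteAheadLog` handle plus try/finally cleanup. The control flow it is reaching for — fast store first, durable log as a fallback, with the fallback handle always closed — can be sketched as follows (hypothetical class and field names, not Spark's actual implementation):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: read a block from a fast in-memory store if present,
// otherwise fall back to a durable log, making sure the log reader is
// always closed even if the read throws.
class FallbackReader {
    final Map<String, String> blockManager = new HashMap<>();
    final Map<String, String> writeAheadLog = new HashMap<>();

    String read(String blockId) {
        String cached = blockManager.get(blockId);
        if (cached != null) {
            return cached; // fast path: found in the block manager
        }
        LogHandle reader = null;
        try {
            reader = new LogHandle(writeAheadLog); // fallback: open the log
            return reader.read(blockId);
        } finally {
            if (reader != null) {
                reader.close(); // mirrors the try/finally added in the diff
            }
        }
    }

    static class LogHandle {
        private final Map<String, String> log;
        boolean closed = false;
        LogHandle(Map<String, String> log) { this.log = log; }
        String read(String id) { return log.get(id); }
        void close() { closed = true; }
    }
}
```

The point of the pattern is that the cleanup runs on every exit path, which matters once the WAL implementation is pluggable and the handle may hold external resources.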
[GitHub] spark pull request: [SPARK-6891] Fix the bug that ExecutorAllocati...
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/5676#issuecomment-95822131

This looks like a duplicate of SPARK-6954 (PR #5536).
[GitHub] spark pull request: [SPARK-3468][WebUI] Timeline-View feature
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2342#issuecomment-95821459

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30916/
[GitHub] spark pull request: [SPARK-3468][WebUI] Timeline-View feature
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2342#issuecomment-95821427

[Test build #30916 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30916/consoleFull) for PR 2342 at commit [`d3c63c8`](https://github.com/apache/spark/commit/d3c63c84a56041756841dd0706d87c8c808e84d3).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `case class ExecutorUIData(`
* This patch does not change any dependencies.
[GitHub] spark pull request: [SPARK-6122][Core] Upgrade tachyon-client vers...
Github user aniketbhatnagar commented on the pull request: https://github.com/apache/spark/pull/5354#issuecomment-95819955

+1 from my side. Having a consistent httpclient version would be so much better!
[GitHub] spark pull request: [SPARK-7112][Streaming] Add a DirectStreamTrac...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5680#issuecomment-95819308

[Test build #30920 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30920/consoleFull) for PR 5680 at commit [`28d668f`](https://github.com/apache/spark/commit/28d668faf51495e779aa1f874ceb03a64bccf410).
[GitHub] spark pull request: [SPARK-7112][Streaming] Add a DirectStreamTrac...
GitHub user jerryshao opened a pull request: https://github.com/apache/spark/pull/5680

[SPARK-7112][Streaming] Add a DirectStreamTracker to track the direct streams

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jerryshao/apache-spark SPARK-7111

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/5680.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #5680

commit 28d668faf51495e779aa1f874ceb03a64bccf410
Author: jerryshao
Date: 2015-04-24T06:07:54Z

    Add DirectStreamTracker to track the direct streams
[GitHub] spark pull request: [SPARK-7097][SQL]: Partitioned tables should o...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5668#issuecomment-95817298

[Test build #30919 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30919/consoleFull) for PR 5668 at commit [`b4651fd`](https://github.com/apache/spark/commit/b4651fd80a55f016093d84cf3b00ad6c91333cef).
[GitHub] spark pull request: [SPARK-3468][WebUI] Timeline-View feature
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2342#issuecomment-95813823

[Test build #30918 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30918/consoleFull) for PR 2342 at commit [`b09d0c5`](https://github.com/apache/spark/commit/b09d0c5f76aa1eb2912ef625c4bd0ffa2c729d64).
[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5645#issuecomment-95811201

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30914/
[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5645#issuecomment-95811180

[Test build #30914 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30914/consoleFull) for PR 5645 at commit [`d7cd15b`](https://github.com/apache/spark/commit/d7cd15b5cef64766a432918e54cca4750d13745b).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds no public classes.
* This patch does not change any dependencies.
[GitHub] spark pull request: SPARK-7103: Fix crash with SparkContext.union ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5679#issuecomment-95810737

Can one of the admins verify this patch?
[GitHub] spark pull request: SPARK-7103: Fix crash with SparkContext.union ...
GitHub user stevencanopy opened a pull request: https://github.com/apache/spark/pull/5679

SPARK-7103: Fix crash with SparkContext.union when RDD has no partitioner

Added a check to `SparkContext.union` to ensure that a partitioner is defined on all RDDs before instantiating a `PartitionerAwareUnionRDD`.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/stevencanopy/spark SPARK-7103

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/5679.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #5679

commit 5a3d84649b46df9fd670e951941e809e1e6d98a7
Author: Steven She
Date: 2015-04-24T05:55:25Z

    SPARK-7103: Fix crash with SparkContext.union when at least one RDD has no partitioner
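The fix described above boils down to a guard: only take the `PartitionerAwareUnionRDD` path when every input RDD actually has a partitioner defined (and they agree). A minimal sketch of that guard, using hypothetical stand-in types rather than Spark's own classes:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical model of an RDD that may or may not have a partitioner.
class SimpleRdd {
    final Optional<String> partitioner;
    SimpleRdd(Optional<String> partitioner) { this.partitioner = partitioner; }
}

class UnionPlanner {
    // Choose the partitioner-aware union only when all RDDs define the same
    // partitioner; otherwise fall back to a plain union. The crash in
    // SPARK-7103 came from taking the aware path without this check.
    static String chooseUnion(List<SimpleRdd> rdds) {
        boolean allDefined = rdds.stream().allMatch(r -> r.partitioner.isPresent());
        boolean allSame = allDefined && rdds.stream()
                .map(r -> r.partitioner.get()).distinct().count() == 1;
        return allSame ? "PartitionerAwareUnionRDD" : "UnionRDD";
    }
}
```

With this guard, a union over one partitioned and one unpartitioned RDD quietly degrades to the plain union instead of crashing.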
[GitHub] spark pull request: [SPARK-5894][ML] Add polynomial mapper
Github user yinxusen commented on the pull request: https://github.com/apache/spark/pull/5245#issuecomment-95810160

@mengxr I ran some tests on these two versions; here is the result log. (You can see my code [here](https://github.com/yinxusen/spark/blob/PerformanceTest-5894/mllib/src/main/scala/org/apache/spark/ml/feature/PolynomialMapper.scala).)

```bash
sbt "mllib/run-main org.apache.spark.ml.feature.PolynomialMapper" 2>&1>test.log
```

> [info] Testing number of data 1024
> [info] Testing dataset degree: 2 mapper: PolynomialMapper-V1 name: denseData
> [info] Elapsed time: 48.591317ms
> [info] Testing dataset degree: 2 mapper: PolynomialMapper-V1 name: sparseData
> [info] Elapsed time: 43.113877ms
> [info] Testing dataset degree: 2 mapper: PolynomialMapper-V2 name: denseData
> [info] Elapsed time: 38.518744ms
> [info] Testing dataset degree: 2 mapper: PolynomialMapper-V2 name: sparseData
> [info] Elapsed time: 36.946037ms
> [info] Testing dataset degree: 3 mapper: PolynomialMapper-V1 name: denseData
> [info] Elapsed time: 34.615637ms
> [info] Testing dataset degree: 3 mapper: PolynomialMapper-V1 name: sparseData
> [info] Elapsed time: 39.327571ms
> [info] Testing dataset degree: 3 mapper: PolynomialMapper-V2 name: denseData
> [info] Elapsed time: 35.640954ms
> [info] Testing dataset degree: 3 mapper: PolynomialMapper-V2 name: sparseData
> [info] Elapsed time: 38.740797ms
> [info] Testing dataset degree: 5 mapper: PolynomialMapper-V1 name: denseData
> [info] Elapsed time: 37.757011ms
> [info] Testing dataset degree: 5 mapper: PolynomialMapper-V1 name: sparseData
> [info] Elapsed time: 39.291329ms
> [info] Testing dataset degree: 5 mapper: PolynomialMapper-V2 name: denseData
> [info] Elapsed time: 34.665687ms
> [info] Testing dataset degree: 5 mapper: PolynomialMapper-V2 name: sparseData
> [info] Elapsed time: 37.758357ms
> [info] Testing dataset degree: 10 mapper: PolynomialMapper-V1 name: denseData
> [info] Elapsed time: 33.307436ms
> [info] Testing dataset degree: 10 mapper: PolynomialMapper-V1 name: sparseData
> [info] Elapsed time: 37.231837ms
> [info] Testing dataset degree: 10 mapper: PolynomialMapper-V2 name: denseData
> [info] Elapsed time: 34.794309ms
> [info] Testing dataset degree: 10 mapper: PolynomialMapper-V2 name: sparseData
> [info] Elapsed time: 37.112773ms

> [info] Testing number of data 10240
> [info] Testing dataset degree: 2 mapper: PolynomialMapper-V1 name: denseData
> [info] Elapsed time: 76.447725ms
> [info] Testing dataset degree: 2 mapper: PolynomialMapper-V1 name: sparseData
> [info] Elapsed time: 98.351862ms
> [info] Testing dataset degree: 2 mapper: PolynomialMapper-V2 name: denseData
> [info] Elapsed time: 76.17611ms
> [info] Testing dataset degree: 2 mapper: PolynomialMapper-V2 name: sparseData
> [info] Elapsed time: 99.099883ms
> [info] Testing dataset degree: 3 mapper: PolynomialMapper-V1 name: denseData
> [info] Elapsed time: 76.661511ms
> [info] Testing dataset degree: 3 mapper: PolynomialMapper-V1 name: sparseData
> [info] Elapsed time: 99.442798ms
> [info] Testing dataset degree: 3 mapper: PolynomialMapper-V2 name: denseData
> [info] Elapsed time: 76.607076ms
> [info] Testing dataset degree: 3 mapper: PolynomialMapper-V2 name: sparseData
> [info] Elapsed time: 99.722276ms
> [info] Testing dataset degree: 5 mapper: PolynomialMapper-V1 name: denseData
> [info] Elapsed time: 76.337466ms
> [info] Testing dataset degree: 5 mapper: PolynomialMapper-V1 name: sparseData
> [info] Elapsed time: 99.550001ms
> [info] Testing dataset degree: 5 mapper: PolynomialMapper-V2 name: denseData
> [info] Elapsed time: 76.633637ms
> [info] Testing dataset degree: 5 mapper: PolynomialMapper-V2 name: sparseData
> [info] Elapsed time: 98.995122ms
> [info] Testing dataset degree: 10 mapper: PolynomialMapper-V1 name: denseData
> [info] Elapsed time: 77.281723ms
> [info] Testing dataset degree: 10 mapper: PolynomialMapper-V1 name: sparseData
> [info] Elapsed time: 100.623104ms
> [in
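The "Elapsed time" lines above come from a simple wall-clock harness wrapped around each mapper run; the linked Scala file is the actual benchmark. A generic sketch of such a harness (hypothetical, not the code under test):

```java
import java.util.function.Supplier;

class ElapsedTimer {
    // Run a task once and report elapsed wall-clock time in milliseconds,
    // in the spirit of the "[info] Elapsed time: ..." lines above.
    // Note: a single timed run like this is only a rough comparison; real
    // benchmarks need warm-up iterations and repeated measurements.
    static <T> double time(Supplier<T> task, String label) {
        long start = System.nanoTime();
        task.get();
        double elapsedMs = (System.nanoTime() - start) / 1e6;
        System.out.println("Testing " + label + " Elapsed time: " + elapsedMs + "ms");
        return elapsedMs;
    }
}
```

The small, noisy differences between V1 and V2 in the log are exactly what such single-shot timings produce, which is why the runs are repeated across degrees and dataset sizes.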
[GitHub] spark pull request: [SPARK-1442][SQL][WIP] Window Function Support...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5604#issuecomment-95809888

Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30917/
[GitHub] spark pull request: [SPARK-1442][SQL][WIP] Window Function Support...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5604#issuecomment-95809882

[Test build #30917 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30917/consoleFull) for PR 5604 at commit [`5b96e2a`](https://github.com/apache/spark/commit/5b96e2aa3e6da2a836171e4783c8199d21daed20).
* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `case class WindowExpression(child: Expression, windowSpec: WindowSpec) extends UnaryExpression`
  * `case class WindowSpec(windowPartition: WindowPartition, windowFrame: Option[WindowFrame])`
  * `case class WindowPartition(partitionBy: Seq[Expression], sortBy: Seq[SortOrder])`
  * `case class WindowFrame(frameType: FrameType, preceding: Int, following: Int)`
  * `case class WindowAggregate(`
  * `case class WindowAggregate(`
  * `case class ComputedWindow(`
* This patch does not change any dependencies.
[GitHub] spark pull request: [SPARK-1442][SQL][WIP] Window Function Support...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5604#issuecomment-95809470

[Test build #30917 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30917/consoleFull) for PR 5604 at commit [`5b96e2a`](https://github.com/apache/spark/commit/5b96e2aa3e6da2a836171e4783c8199d21daed20).
[GitHub] spark pull request: [SPARK-6113] [ml] Tree ensembles for Pipelines...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/5626#issuecomment-95806853

LGTM except some minor inline comments.
[GitHub] spark pull request: [SPARK-6113] [ml] Tree ensembles for Pipelines...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/5626#discussion_r29024766

--- Diff: examples/src/main/scala/org/apache/spark/examples/ml/GBTExample.scala ---

    @@ -0,0 +1,238 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements. See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License. You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.examples.ml
    +
    +import scala.collection.mutable
    +import scala.language.reflectiveCalls
    +
    +import scopt.OptionParser
    +
    +import org.apache.spark.{SparkConf, SparkContext}
    +import org.apache.spark.examples.mllib.AbstractParams
    +import org.apache.spark.ml.{Pipeline, PipelineStage}
    +import org.apache.spark.ml.classification.{GBTClassificationModel, GBTClassifier}
    +import org.apache.spark.ml.feature.{StringIndexer, VectorIndexer}
    +import org.apache.spark.ml.regression.{GBTRegressionModel, GBTRegressor}
    +import org.apache.spark.sql.DataFrame
    +
    +
    +/**
    + * An example runner for decision trees. Run with
    + * {{{
    + * ./bin/run-example ml.GBTExample [options]
    + * }}}
    + * Decision Trees and ensembles can take a large amount of memory. If the run-example command
    + * above fails, try running via spark-submit and specifying the amount of memory as at least 1g.
    + * For local mode, run
    + * {{{
    + * ./bin/spark-submit --class org.apache.spark.examples.ml.GBTExample --driver-memory 1g
    + *   [examples JAR path] [options]
    + * }}}
    + * If you use it as a template to create your own app, please use `spark-submit` to submit your app.
    + */
    +object GBTExample {
    +
    +  case class Params(
    +      input: String = null,
    +      testInput: String = "",
    +      dataFormat: String = "libsvm",
    +      algo: String = "classification",
    +      maxDepth: Int = 5,
    +      maxBins: Int = 32,
    +      minInstancesPerNode: Int = 1,
    +      minInfoGain: Double = 0.0,
    +      maxIter: Int = 10,
    +      fracTest: Double = 0.2,
    +      cacheNodeIds: Boolean = false,
    +      checkpointDir: Option[String] = None,
    +      checkpointInterval: Int = 10) extends AbstractParams[Params]
    +
    +  def main(args: Array[String]) {
    +    val defaultParams = Params()
    +
    +    val parser = new OptionParser[Params]("GBTExample") {
    +      head("GBTExample: an example Gradient-Boosted Trees app.")
    +      opt[String]("algo")
    +        .text(s"algorithm (classification, regression), default: ${defaultParams.algo}")
    +        .action((x, c) => c.copy(algo = x))
    +      opt[Int]("maxDepth")
    +        .text(s"max depth of the tree, default: ${defaultParams.maxDepth}")
    +        .action((x, c) => c.copy(maxDepth = x))
    +      opt[Int]("maxBins")
    +        .text(s"max number of bins, default: ${defaultParams.maxBins}")
    +        .action((x, c) => c.copy(maxBins = x))
    +      opt[Int]("minInstancesPerNode")
    +        .text(s"min number of instances required at child nodes to create the parent split," +
    +          s" default: ${defaultParams.minInstancesPerNode}")
    +        .action((x, c) => c.copy(minInstancesPerNode = x))
    +      opt[Double]("minInfoGain")
    +        .text(s"min info gain required to create a split, default: ${defaultParams.minInfoGain}")
    +        .action((x, c) => c.copy(minInfoGain = x))
    +      opt[Int]("maxIter")
    +        .text(s"number of trees in ensemble, default: ${defaultParams.maxIter}")
    +        .action((x, c) => c.copy(maxIter = x))
    +      opt[Double]("fracTest")
    +        .text(s"fraction of data to hold out for testing. If given option testInput, " +
    +          s"this option is ignored. default: ${defaultParams.fracTest}")
    +        .action((x, c) => c.copy(fracTest = x))
    +      opt[Boolean]("cacheNodeIds")
    +        .text(s"whether to use node Id cache during training, " +
    +          s"default: ${defaultParams.cacheNodeIds}")
    +        .action((x, c) => c.copy(cacheNodeIds = x))
    +      opt[String]("checkpointDir")
    +        .text(s"checkpoint directory where intermediate node Id caches will be stored, " +
    +          s"default: ${
[GitHub] spark pull request: [SPARK-6113] [ml] Tree ensembles for Pipelines...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/5626#discussion_r29024771

--- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/RandomForestRegressor.scala ---

    @@ -0,0 +1,167 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements. See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License. You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.ml.regression
    +
    +import org.apache.spark.annotation.AlphaComponent
    +import org.apache.spark.ml.impl.estimator.{PredictionModel, Predictor}
    +import org.apache.spark.ml.impl.tree.{RandomForestParams, TreeRegressorParams}
    +import org.apache.spark.ml.param.{Params, ParamMap}
    +import org.apache.spark.ml.tree.{DecisionTreeModel, TreeEnsembleModel}
    +import org.apache.spark.ml.util.MetadataUtils
    +import org.apache.spark.mllib.linalg.Vector
    +import org.apache.spark.mllib.regression.LabeledPoint
    +import org.apache.spark.mllib.tree.{RandomForest => OldRandomForest}
    +import org.apache.spark.mllib.tree.configuration.{Algo => OldAlgo, Strategy => OldStrategy}
    +import org.apache.spark.mllib.tree.model.{RandomForestModel => OldRandomForestModel}
    +import org.apache.spark.rdd.RDD
    +import org.apache.spark.sql.DataFrame
    +
    +
    +/**
    + * :: AlphaComponent ::
    + *
    + * [[http://en.wikipedia.org/wiki/Random_forest Random Forest]] learning algorithm for regression.
    + * It supports both continuous and categorical features.
    + */
    +@AlphaComponent
    +final class RandomForestRegressor
    +  extends Predictor[Vector, RandomForestRegressor, RandomForestRegressionModel]
    +  with RandomForestParams with TreeRegressorParams {
    +
    +  // Override parameter setters from parent trait for Java API compatibility.
    +
    +  // Parameters from TreeRegressorParams:
    +
    +  override def setMaxDepth(value: Int): this.type = super.setMaxDepth(value)
    +
    +  override def setMaxBins(value: Int): this.type = super.setMaxBins(value)
    +
    +  override def setMinInstancesPerNode(value: Int): this.type =
    +    super.setMinInstancesPerNode(value)
    +
    +  override def setMinInfoGain(value: Double): this.type = super.setMinInfoGain(value)
    +
    +  override def setMaxMemoryInMB(value: Int): this.type = super.setMaxMemoryInMB(value)
    +
    +  override def setCacheNodeIds(value: Boolean): this.type = super.setCacheNodeIds(value)
    +
    +  override def setCheckpointInterval(value: Int): this.type = super.setCheckpointInterval(value)
    +
    +  override def setImpurity(value: String): this.type = super.setImpurity(value)
    +
    +  // Parameters from TreeEnsembleParams:
    +
    +  override def setSubsamplingRate(value: Double): this.type = super.setSubsamplingRate(value)
    +
    +  override def setSeed(value: Long): this.type = super.setSeed(value)
    +
    +  // Parameters from RandomForestParams:
    +
    +  override def setNumTrees(value: Int): this.type = super.setNumTrees(value)
    +
    +  override def setFeatureSubsetStrategy(value: String): this.type =
    +    super.setFeatureSubsetStrategy(value)
    +
    +  override protected def train(
    +      dataset: DataFrame,
    +      paramMap: ParamMap): RandomForestRegressionModel = {
    +    val categoricalFeatures: Map[Int, Int] =
    +      MetadataUtils.getCategoricalFeatures(dataset.schema(paramMap(featuresCol)))
    +    val oldDataset: RDD[LabeledPoint] = extractLabeledPoints(dataset, paramMap)
    +    val strategy =
    +      super.getOldStrategy(categoricalFeatures, numClasses = 0, OldAlgo.Regression, getOldImpurity)
    +    val oldModel = OldRandomForest.trainRegressor(
    +      oldDataset, strategy, getNumTrees, getFeatureSubsetStrategy, getSeed.toInt)
    +    RandomForestRegressionModel.fromOld(oldModel, this, paramMap, categoricalFeatures)
    +  }
    +}
    +
    +object RandomForestRegressor {
    +  /** Accessor for supported impurity settings: variance */
    +  final val supportedImpurities: Array[String] = TreeRegressorParams.supportedImpurities
    +
    +  /** Accessor for supported featureSubsetStrategy settings: auto, all, onethird, sqrt, log2 */
    +  final val suppo
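The block of setter overrides above exists so that chained calls return the concrete estimator type (Scala's `this.type`) rather than the parent trait, which is what makes the builder pattern usable from Java. The same idea can be expressed in plain Java with a self-referential type parameter; this is a hypothetical sketch, not Spark's code:

```java
// Hypothetical sketch of why the Scala setters are overridden: each setter
// must return the concrete subclass so calls can be chained without casts.
class BaseParams<T extends BaseParams<T>> {
    int maxDepth;
    long seed;

    @SuppressWarnings("unchecked")
    T setMaxDepth(int value) { this.maxDepth = value; return (T) this; }

    @SuppressWarnings("unchecked")
    T setSeed(long value) { this.seed = value; return (T) this; }
}

class ForestRegressor extends BaseParams<ForestRegressor> {
    int numTrees;
    ForestRegressor setNumTrees(int value) { this.numTrees = value; return this; }
}
```

Because every setter returns `ForestRegressor`, a chain like `new ForestRegressor().setMaxDepth(5).setSeed(42L).setNumTrees(100)` compiles; without the covariant return, the chain would break at the first inherited setter.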
[GitHub] spark pull request: [SPARK-6113] [ml] Tree ensembles for Pipelines...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/5626#discussion_r29024769

--- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/DecisionTreeClassifier.scala ---

    @@ -85,18 +82,16 @@ final class DecisionTreeClassifier
       }

       /** (private[ml]) Create a Strategy instance to use with the old API. */

--- End diff --

Is it useful to mention `(private[ml])` in the JavaDoc? This seems to be duplicated info.
[GitHub] spark pull request: [SPARK-6113] [ml] Tree ensembles for Pipelines...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/5626#discussion_r29024757 --- Diff: mllib/src/test/java/org/apache/spark/ml/classification/JavaGBTClassifierSuite.java --- @@ -0,0 +1,100 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.ml.classification; + +import java.io.Serializable; +import java.util.HashMap; +import java.util.Map; + +import org.junit.After; +import org.junit.Before; +import org.junit.Test; + +import org.apache.spark.api.java.JavaRDD; +import org.apache.spark.api.java.JavaSparkContext; +import org.apache.spark.ml.impl.TreeTests; +import org.apache.spark.mllib.classification.LogisticRegressionSuite; +import org.apache.spark.mllib.regression.LabeledPoint; +import org.apache.spark.sql.DataFrame; + + +public class JavaGBTClassifierSuite implements Serializable { + + private transient JavaSparkContext sc; + + @Before + public void setUp() { +sc = new JavaSparkContext("local", "JavaGBTClassifierSuite"); + } + + @After + public void tearDown() { +sc.stop(); +sc = null; + } + + @Test + public void runDT() { +int nPoints = 20; +double A = 2.0; +double B = -1.5; + +JavaRDD data = sc.parallelize( +LogisticRegressionSuite.generateLogisticInputAsList(A, B, nPoints, 42), 2).cache(); +Map categoricalFeatures = new HashMap(); +DataFrame dataFrame = TreeTests.setMetadata(data, categoricalFeatures, 2); + +// This tests setters. Training with various options is tested in Scala. +GBTClassifier rf = new GBTClassifier() +.setMaxDepth(2) +.setMaxBins(10) +.setMinInstancesPerNode(5) +.setMinInfoGain(0.0) +.setMaxMemoryInMB(256) +.setCacheNodeIds(false) +.setCheckpointInterval(10) +.setSubsamplingRate(1.0) +.setSeed(1234) +.setMaxIter(3) +.setStepSize(0.1) +.setMaxDepth(2); // duplicate setMaxDepth to check builder pattern +for (int i = 0; i < GBTClassifier.supportedLossTypes().length; ++i) { --- End diff -- ~~~java for (String lossType: GBTClassifier.supportedLossTypes()) { ... } ~~~ --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. 
If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
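The review above suggests replacing an index-based loop with an enhanced `for` loop. A self-contained sketch of the difference (the `supportedLossTypes()` stand-in below mimics `GBTClassifier.supportedLossTypes()`; the real method lives in `org.apache.spark.ml.classification`, and "logistic" is the one loss type the surrounding diff names as supported):

```java
import java.util.ArrayList;
import java.util.List;

class LossTypeLoops {
    // Stand-in for GBTClassifier.supportedLossTypes(); not the real Spark API.
    static String[] supportedLossTypes() {
        return new String[] {"logistic"};
    }

    // Index-based iteration, as in the original patch.
    static List<String> byIndex() {
        List<String> seen = new ArrayList<>();
        for (int i = 0; i < supportedLossTypes().length; ++i) {
            seen.add(supportedLossTypes()[i]);
        }
        return seen;
    }

    // Enhanced for loop, as the review suggests: same behavior, no index bookkeeping.
    static List<String> byValue() {
        List<String> seen = new ArrayList<>();
        for (String lossType : supportedLossTypes()) {
            seen.add(lossType);
        }
        return seen;
    }
}
```

Both loops visit the same elements; the enhanced form simply removes the chance of an off-by-one on the index.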
[GitHub] spark pull request: [SPARK-6113] [ml] Tree ensembles for Pipelines...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/5626#discussion_r29024748 --- Diff: mllib/src/main/scala/org/apache/spark/ml/impl/tree/treeParams.scala --- @@ -296,5 +299,194 @@ private[ml] trait TreeRegressorParams extends Params { private[ml] object TreeRegressorParams { // These options should be lowercase. - val supportedImpurities: Array[String] = Array("variance").map(_.toLowerCase) + final val supportedImpurities: Array[String] = Array("variance").map(_.toLowerCase) +} + +/** + * :: DeveloperApi :: + * Parameters for Decision Tree-based ensemble algorithms. + * + * Note: Marked as private and DeveloperApi since this may be made public in the future. + */ +@DeveloperApi +private[ml] trait TreeEnsembleParams extends DecisionTreeParams with HasSeed { + + /** + * Fraction of the training data used for learning each decision tree. + * (default = 1.0) + * @group param + */ + final val subsamplingRate: DoubleParam = new DoubleParam(this, "subsamplingRate", +"Fraction of the training data used for learning each decision tree.") + + setDefault(subsamplingRate -> 1.0) + + /** @group setParam */ + def setSubsamplingRate(value: Double): this.type = { +require(value > 0.0 && value <= 1.0, + s"Subsampling rate must be in range (0,1]. Bad rate: $value") +set(subsamplingRate, value) +this + } + + /** @group getParam */ + final def getSubsamplingRate: Double = getOrDefault(subsamplingRate) + + /** @group setParam */ + def setSeed(value: Long): this.type = { --- End diff -- `= set(seed, value)` should be sufficient.
[GitHub] spark pull request: [SPARK-6113] [ml] Tree ensembles for Pipelines...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/5626#discussion_r29024762 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/RandomForestClassifierSuite.scala --- @@ -0,0 +1,165 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.ml.classification + +import org.scalatest.FunSuite + +import org.apache.spark.ml.impl.TreeTests +import org.apache.spark.mllib.linalg.Vectors +import org.apache.spark.mllib.regression.LabeledPoint +import org.apache.spark.mllib.tree.{EnsembleTestHelper, RandomForest => OldRandomForest} +import org.apache.spark.mllib.tree.configuration.{Algo => OldAlgo} +import org.apache.spark.mllib.util.MLlibTestSparkContext +import org.apache.spark.rdd.RDD +import org.apache.spark.sql.DataFrame + + +/** + * Test suite for [[RandomForestClassifier]]. 
+ */ +class RandomForestClassifierSuite extends FunSuite with MLlibTestSparkContext { + + import RandomForestClassifierSuite.compareAPIs + + private var orderedLabeledPoints50_1000: RDD[LabeledPoint] = _ + private var orderedLabeledPoints5_20: RDD[LabeledPoint] = _ + + override def beforeAll() { +super.beforeAll() +orderedLabeledPoints50_1000 = + sc.parallelize(EnsembleTestHelper.generateOrderedLabeledPoints(numFeatures = 50, 1000)) +orderedLabeledPoints5_20 = + sc.parallelize(EnsembleTestHelper.generateOrderedLabeledPoints(numFeatures = 5, 20)) + } + + / + // Tests calling train() + / + + def binaryClassificationTestWithContinuousFeatures(rf: RandomForestClassifier) { +val categoricalFeatures = Map.empty[Int, Int] +val numClasses = 2 +val newRF = rf + .setImpurity("Gini") + .setMaxDepth(2) + .setNumTrees(1) + .setFeatureSubsetStrategy("auto") + .setSeed(123) +compareAPIs(orderedLabeledPoints50_1000, newRF, categoricalFeatures, numClasses) + } + + test("Binary classification with continuous features:" + +" comparing DecisionTree vs. RandomForest(numTrees = 1)") { +val rf = new RandomForestClassifier() +binaryClassificationTestWithContinuousFeatures(rf) + } + + test("Binary classification with continuous features and node Id cache:" + +" comparing DecisionTree vs. RandomForest(numTrees = 1)") { +val rf = new RandomForestClassifier() + .setCacheNodeIds(true) +binaryClassificationTestWithContinuousFeatures(rf) + } + + test("alternating categorical and continuous features with multiclass labels to test indexing") { +val arr = new Array[LabeledPoint](4) --- End diff -- ~~~scala val arr = Array( LabeledPoint(0.0, Vectors.dense(1.0, 0.0, 0.0, 3.0, 1.0)), ...) ~~~
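The review above suggests building the array in one expression instead of allocating `new Array[LabeledPoint](4)` and assigning each slot. The same idea in Java, with a cut-down stand-in for `LabeledPoint` (only the first row's values come from the review's snippet; the remaining rows are illustrative made-up data):

```java
// Minimal stand-in for MLlib's LabeledPoint; the real class lives in
// org.apache.spark.mllib.regression.
class Point {
    final double label;
    final double[] features;
    Point(double label, double... features) {
        this.label = label;
        this.features = features;
    }
}

class ArrayLiteralDemo {
    // Instead of `new Point[4]` followed by arr[0] = ..., arr[1] = ...,
    // the whole array is built in a single initializer expression.
    static final Point[] ARR = {
        new Point(0.0, 1.0, 0.0, 0.0, 3.0, 1.0),
        new Point(1.0, 0.0, 1.0, 1.0, 1.0, 2.0),
        new Point(0.0, 1.0, 0.0, 0.0, 6.0, 3.0),
        new Point(2.0, 0.0, 1.0, 1.0, 3.0, 2.0)
    };
}
```

Besides being shorter, the literal form cannot leave a slot `null` by accident, which is the usual failure mode of allocate-then-assign test fixtures.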
[GitHub] spark pull request: [SPARK-6113] [ml] Tree ensembles for Pipelines...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/5626#discussion_r29024737 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/GBTClassifier.scala --- @@ -0,0 +1,226 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.ml.classification + +import com.github.fommil.netlib.BLAS.{getInstance => blas} + +import org.apache.spark.Logging +import org.apache.spark.annotation.AlphaComponent +import org.apache.spark.ml.impl.estimator.{PredictionModel, Predictor} +import org.apache.spark.ml.impl.tree._ +import org.apache.spark.ml.param.{Param, Params, ParamMap} +import org.apache.spark.ml.regression.DecisionTreeRegressionModel +import org.apache.spark.ml.tree.{DecisionTreeModel, TreeEnsembleModel} +import org.apache.spark.ml.util.MetadataUtils +import org.apache.spark.mllib.linalg.Vector +import org.apache.spark.mllib.regression.LabeledPoint +import org.apache.spark.mllib.tree.{GradientBoostedTrees => OldGBT} +import org.apache.spark.mllib.tree.configuration.{Algo => OldAlgo} +import org.apache.spark.mllib.tree.loss.{Loss => OldLoss, LogLoss => OldLogLoss} +import org.apache.spark.mllib.tree.model.{GradientBoostedTreesModel => OldGBTModel} +import org.apache.spark.rdd.RDD +import org.apache.spark.sql.DataFrame + + +/** + * :: AlphaComponent :: + * + * [[http://en.wikipedia.org/wiki/Gradient_boosting Gradient-Boosted Trees (GBTs)]] + * learning algorithm for classification. + * It supports binary labels, as well as both continuous and categorical features. + * Note: Multiclass labels are not currently supported. + */ +@AlphaComponent +final class GBTClassifier + extends Predictor[Vector, GBTClassifier, GBTClassificationModel] + with GBTParams with TreeClassifierParams with Logging { + + // Override parameter setters from parent trait for Java API compatibility. 
+ + // Parameters from TreeClassifierParams: + + override def setMaxDepth(value: Int): this.type = super.setMaxDepth(value) + + override def setMaxBins(value: Int): this.type = super.setMaxBins(value) + + override def setMinInstancesPerNode(value: Int): this.type = +super.setMinInstancesPerNode(value) + + override def setMinInfoGain(value: Double): this.type = super.setMinInfoGain(value) + + override def setMaxMemoryInMB(value: Int): this.type = super.setMaxMemoryInMB(value) + + override def setCacheNodeIds(value: Boolean): this.type = super.setCacheNodeIds(value) + + override def setCheckpointInterval(value: Int): this.type = super.setCheckpointInterval(value) + + /** + * The impurity setting is ignored for GBT models. + * Individual trees are built using impurity "Variance." + */ + override def setImpurity(value: String): this.type = { +logWarning("GBTClassifier.setImpurity should NOT be used") +this + } + + // Parameters from TreeEnsembleParams: + + override def setSubsamplingRate(value: Double): this.type = super.setSubsamplingRate(value) + + override def setSeed(value: Long): this.type = { +logWarning("The 'seed' parameter is currently ignored by Gradient Boosting.") +super.setSeed(value) + } + + // Parameters from GBTParams: + + override def setMaxIter(value: Int): this.type = super.setMaxIter(value) + + override def setStepSize(value: Double): this.type = super.setStepSize(value) + + // Parameters for GBTClassifier: + + /** + * Loss function which GBT tries to minimize. (case-insensitive) + * Supported: "logistic" + * (default = logistic) + * @group param + */ + val lossType: Param[String] = new Param[String](this, "lossType", "Loss function which GBT" + +" tries to minimize (case-insensitive). Supported options:" + +s" ${GBTClassifier.supportedLossTypes.mkString(", ")}") + + setDefault(lossT
[GitHub] spark pull request: [SPARK-6113] [ml] Tree ensembles for Pipelines...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/5626#discussion_r29024743 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/RandomForestClassifier.scala --- @@ -0,0 +1,184 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.ml.classification + +import scala.collection.mutable + +import org.apache.spark.annotation.AlphaComponent +import org.apache.spark.ml.impl.estimator.{PredictionModel, Predictor} +import org.apache.spark.ml.impl.tree._ +import org.apache.spark.ml.param.{Params, ParamMap} +import org.apache.spark.ml.tree.{DecisionTreeModel, TreeEnsembleModel} +import org.apache.spark.ml.util.MetadataUtils +import org.apache.spark.mllib.linalg.Vector +import org.apache.spark.mllib.regression.LabeledPoint +import org.apache.spark.mllib.tree.{RandomForest => OldRandomForest} +import org.apache.spark.mllib.tree.configuration.{Algo => OldAlgo, Strategy => OldStrategy} +import org.apache.spark.mllib.tree.model.{RandomForestModel => OldRandomForestModel} +import org.apache.spark.rdd.RDD +import org.apache.spark.sql.DataFrame + + +/** + * :: AlphaComponent :: + * + * [[http://en.wikipedia.org/wiki/Random_forest Random Forest]] learning algorithm for + * classification. + * It supports both binary and multiclass labels, as well as both continuous and categorical + * features. + */ +@AlphaComponent +final class RandomForestClassifier + extends Predictor[Vector, RandomForestClassifier, RandomForestClassificationModel] + with RandomForestParams with TreeClassifierParams { + + // Override parameter setters from parent trait for Java API compatibility. 
+ + // Parameters from TreeClassifierParams: + + override def setMaxDepth(value: Int): this.type = super.setMaxDepth(value) + + override def setMaxBins(value: Int): this.type = super.setMaxBins(value) + + override def setMinInstancesPerNode(value: Int): this.type = +super.setMinInstancesPerNode(value) + + override def setMinInfoGain(value: Double): this.type = super.setMinInfoGain(value) + + override def setMaxMemoryInMB(value: Int): this.type = super.setMaxMemoryInMB(value) + + override def setCacheNodeIds(value: Boolean): this.type = super.setCacheNodeIds(value) + + override def setCheckpointInterval(value: Int): this.type = super.setCheckpointInterval(value) + + override def setImpurity(value: String): this.type = super.setImpurity(value) + + // Parameters from TreeEnsembleParams: + + override def setSubsamplingRate(value: Double): this.type = super.setSubsamplingRate(value) + + override def setSeed(value: Long): this.type = super.setSeed(value) + + // Parameters from RandomForestParams: + + override def setNumTrees(value: Int): this.type = super.setNumTrees(value) + + override def setFeatureSubsetStrategy(value: String): this.type = +super.setFeatureSubsetStrategy(value) + + override protected def train( + dataset: DataFrame, + paramMap: ParamMap): RandomForestClassificationModel = { +val categoricalFeatures: Map[Int, Int] = + MetadataUtils.getCategoricalFeatures(dataset.schema(paramMap(featuresCol))) +val numClasses: Int = MetadataUtils.getNumClasses(dataset.schema(paramMap(labelCol))) match { + case Some(n: Int) => n + case None => throw new IllegalArgumentException("RandomForestClassifier was given input" + +s" with invalid label column, without the number of classes specified.") + // TODO: Automatically index labels. 
+} +val oldDataset: RDD[LabeledPoint] = extractLabeledPoints(dataset, paramMap) +val strategy = + super.getOldStrategy(categoricalFeatures, numClasses, OldAlgo.Classification, getOldImpurity) +val oldModel = OldRandomForest.trainClassifier( + oldDataset, strategy, getNumTrees, getFe
[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5645#issuecomment-95806380 [Test build #30913 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30913/consoleFull) for PR 5645 at commit [`1a32a4b`](https://github.com/apache/spark/commit/1a32a4b5ce740721343915452974c9fb3f9a3910). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. * This patch does not change any dependencies.
[GitHub] spark pull request: [SPARK-6113] [ml] Tree ensembles for Pipelines...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/5626#discussion_r29024756 --- Diff: mllib/src/test/java/org/apache/spark/ml/classification/JavaGBTClassifierSuite.java --- @@ -0,0 +1,100 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.ml.classification; + +import java.io.Serializable; +import java.util.HashMap; +import java.util.Map; + +import org.junit.After; +import org.junit.Before; +import org.junit.Test; + +import org.apache.spark.api.java.JavaRDD; +import org.apache.spark.api.java.JavaSparkContext; +import org.apache.spark.ml.impl.TreeTests; +import org.apache.spark.mllib.classification.LogisticRegressionSuite; +import org.apache.spark.mllib.regression.LabeledPoint; +import org.apache.spark.sql.DataFrame; + + +public class JavaGBTClassifierSuite implements Serializable { + + private transient JavaSparkContext sc; + + @Before + public void setUp() { +sc = new JavaSparkContext("local", "JavaGBTClassifierSuite"); + } + + @After + public void tearDown() { +sc.stop(); +sc = null; + } + + @Test + public void runDT() { +int nPoints = 20; +double A = 2.0; +double B = -1.5; + +JavaRDD data = sc.parallelize( +LogisticRegressionSuite.generateLogisticInputAsList(A, B, nPoints, 42), 2).cache(); --- End diff -- 2-space indentation
[GitHub] spark pull request: [SPARK-6113] [ml] Tree ensembles for Pipelines...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/5626#discussion_r29024746 --- Diff: mllib/src/main/scala/org/apache/spark/ml/impl/tree/treeParams.scala --- @@ -296,5 +299,194 @@ private[ml] trait TreeRegressorParams extends Params { private[ml] object TreeRegressorParams { // These options should be lowercase. - val supportedImpurities: Array[String] = Array("variance").map(_.toLowerCase) + final val supportedImpurities: Array[String] = Array("variance").map(_.toLowerCase) +} + +/** + * :: DeveloperApi :: + * Parameters for Decision Tree-based ensemble algorithms. + * + * Note: Marked as private and DeveloperApi since this may be made public in the future. + */ +@DeveloperApi +private[ml] trait TreeEnsembleParams extends DecisionTreeParams with HasSeed { + + /** + * Fraction of the training data used for learning each decision tree. + * (default = 1.0) + * @group param + */ + final val subsamplingRate: DoubleParam = new DoubleParam(this, "subsamplingRate", +"Fraction of the training data used for learning each decision tree.") + + setDefault(subsamplingRate -> 1.0) + + /** @group setParam */ + def setSubsamplingRate(value: Double): this.type = { +require(value > 0.0 && value <= 1.0, + s"Subsampling rate must be in range (0,1]. Bad rate: $value") +set(subsamplingRate, value) +this --- End diff -- `this` is not required.
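The pattern these diffs keep returning to is the fluent setter: each setter validates its argument, stores it, and returns the instance so calls can be chained (the same concern behind the "Java API compatibility" overrides elsewhere in the PR). A hedged Java sketch with a hypothetical `EnsembleParams` class, not Spark's real `Params` machinery:

```java
class EnsembleParams {
    private double subsamplingRate = 1.0;  // default, as in the diff
    private long seed = 0L;

    // Validate, store, and return `this` so calls chain builder-style.
    EnsembleParams setSubsamplingRate(double value) {
        if (value <= 0.0 || value > 1.0) {
            throw new IllegalArgumentException(
                "Subsampling rate must be in range (0,1]. Bad rate: " + value);
        }
        this.subsamplingRate = value;
        return this;
    }

    EnsembleParams setSeed(long value) {
        this.seed = value;
        return this;
    }

    double getSubsamplingRate() { return subsamplingRate; }
    long getSeed() { return seed; }
}
```

In Scala the trailing `this` the reviewer flags is redundant because `set(...)` already returns `this.type`; in Java the explicit `return this;` is what makes `new EnsembleParams().setSubsamplingRate(0.5).setSeed(42L)` type-check.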
[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5645#issuecomment-95806388 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30913/ Test FAILed.
[GitHub] spark pull request: [SPARK-6113] [ml] Tree ensembles for Pipelines...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/5626#discussion_r29024741 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/RandomForestClassifier.scala --- @@ -0,0 +1,184 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.ml.classification + +import scala.collection.mutable + +import org.apache.spark.annotation.AlphaComponent +import org.apache.spark.ml.impl.estimator.{PredictionModel, Predictor} +import org.apache.spark.ml.impl.tree._ +import org.apache.spark.ml.param.{Params, ParamMap} +import org.apache.spark.ml.tree.{DecisionTreeModel, TreeEnsembleModel} +import org.apache.spark.ml.util.MetadataUtils +import org.apache.spark.mllib.linalg.Vector +import org.apache.spark.mllib.regression.LabeledPoint +import org.apache.spark.mllib.tree.{RandomForest => OldRandomForest} +import org.apache.spark.mllib.tree.configuration.{Algo => OldAlgo, Strategy => OldStrategy} +import org.apache.spark.mllib.tree.model.{RandomForestModel => OldRandomForestModel} +import org.apache.spark.rdd.RDD +import org.apache.spark.sql.DataFrame + + +/** + * :: AlphaComponent :: + * + * [[http://en.wikipedia.org/wiki/Random_forest Random Forest]] learning algorithm for + * classification. + * It supports both binary and multiclass labels, as well as both continuous and categorical + * features. + */ +@AlphaComponent +final class RandomForestClassifier + extends Predictor[Vector, RandomForestClassifier, RandomForestClassificationModel] + with RandomForestParams with TreeClassifierParams { + + // Override parameter setters from parent trait for Java API compatibility. 
+ + // Parameters from TreeClassifierParams: + + override def setMaxDepth(value: Int): this.type = super.setMaxDepth(value) + + override def setMaxBins(value: Int): this.type = super.setMaxBins(value) + + override def setMinInstancesPerNode(value: Int): this.type = +super.setMinInstancesPerNode(value) + + override def setMinInfoGain(value: Double): this.type = super.setMinInfoGain(value) + + override def setMaxMemoryInMB(value: Int): this.type = super.setMaxMemoryInMB(value) + + override def setCacheNodeIds(value: Boolean): this.type = super.setCacheNodeIds(value) + + override def setCheckpointInterval(value: Int): this.type = super.setCheckpointInterval(value) + + override def setImpurity(value: String): this.type = super.setImpurity(value) + + // Parameters from TreeEnsembleParams: + + override def setSubsamplingRate(value: Double): this.type = super.setSubsamplingRate(value) + + override def setSeed(value: Long): this.type = super.setSeed(value) + + // Parameters from RandomForestParams: + + override def setNumTrees(value: Int): this.type = super.setNumTrees(value) + + override def setFeatureSubsetStrategy(value: String): this.type = +super.setFeatureSubsetStrategy(value) + + override protected def train( + dataset: DataFrame, + paramMap: ParamMap): RandomForestClassificationModel = { +val categoricalFeatures: Map[Int, Int] = + MetadataUtils.getCategoricalFeatures(dataset.schema(paramMap(featuresCol))) +val numClasses: Int = MetadataUtils.getNumClasses(dataset.schema(paramMap(labelCol))) match { + case Some(n: Int) => n + case None => throw new IllegalArgumentException("RandomForestClassifier was given input" + +s" with invalid label column, without the number of classes specified.") --- End diff -- Mention the label column name in the error message.
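The reviewer's ask above, including the offending column name in the error, can be sketched like this (`LabelMetadata` and the `"indexedLabel"` column are hypothetical stand-ins, not Spark's `MetadataUtils` API):

```java
import java.util.Optional;

class LabelMetadata {
    // Hypothetical stand-in for MetadataUtils.getNumClasses: pretends only
    // the "indexedLabel" column carries class-count metadata.
    static Optional<Integer> getNumClasses(String labelCol) {
        return labelCol.equals("indexedLabel") ? Optional.of(2) : Optional.empty();
    }

    // Fail with an error message that names the bad column, so the user
    // knows which input to fix.
    static int requireNumClasses(String labelCol) {
        return getNumClasses(labelCol).orElseThrow(() ->
            new IllegalArgumentException(
                "RandomForestClassifier was given input with invalid label column '"
                + labelCol + "', without the number of classes specified."));
    }
}
```

The difference is purely diagnostic: the exception text now pinpoints which column lacked metadata rather than reporting a generic failure.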
[GitHub] spark pull request: [SQL] Fixed expression data type matching.
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/5675#discussion_r29024715 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/DataType.scala --- @@ -40,32 +40,46 @@ import org.apache.spark.util.Utils */ @DeveloperApi abstract class DataType { - /** Matches any expression that evaluates to this DataType */ - def unapply(a: Expression): Boolean = a match { + /** + * Enables matching against NumericType for expressions: --- End diff -- ah yes - I will fix that.
[GitHub] spark pull request: [SPARK-3468][WebUI] Timeline-View feature
Github user sarutak commented on a diff in the pull request: https://github.com/apache/spark/pull/2342#discussion_r29024688 --- Diff: core/src/main/scala/org/apache/spark/ui/jobs/AllJobsPage.scala --- @@ -17,17 +17,172 @@ package org.apache.spark.ui.jobs -import scala.xml.{Node, NodeSeq} +import scala.collection.mutable.{HashMap, ListBuffer} +import scala.xml.{Node, NodeSeq, Unparsed} +import java.util.Date import javax.servlet.http.HttpServletRequest -import org.apache.spark.ui.{WebUIPage, UIUtils} -import org.apache.spark.ui.jobs.UIData.JobUIData +import org.apache.spark.ui.{UIUtils, WebUIPage} +import org.apache.spark.ui.jobs.UIData.{ExecutorUIData, JobUIData} +import org.apache.spark.JobExecutionStatus /** Page showing list of all ongoing and recently finished jobs */ private[ui] class AllJobsPage(parent: JobsTab) extends WebUIPage("") { - private val startTime: Option[Long] = parent.sc.map(_.startTime) - private val listener = parent.listener + private val JOBS_LEGEND = + + + Succeeded Job + + Failed Job + + Running Job +.toString.filter(_ != '\n') + + private val EXECUTORS_LEGEND = + + --- End diff -- I think stroke and fill can work. I'll address it.
[GitHub] spark pull request: [SPARK-7109] [SQL] Push down left side filter ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5677#issuecomment-95805563 [Test build #30911 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30911/consoleFull) for PR 5677 at commit [`ebadaa9`](https://github.com/apache/spark/commit/ebadaa9798752498004fc3bc53de07ed53b49f7b). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes. * This patch does not change any dependencies.
[GitHub] spark pull request: [SPARK-7109] [SQL] Push down left side filter ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5677#issuecomment-95805571 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30911/ Test PASSed.
[GitHub] spark pull request: [SQL] Fixed expression data type matching.
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/5675#discussion_r29024422 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/DataType.scala --- @@ -40,32 +40,46 @@ import org.apache.spark.util.Utils */ @DeveloperApi abstract class DataType { - /** Matches any expression that evaluates to this DataType */ - def unapply(a: Expression): Boolean = a match { + /** + * Enables matching against NumericType for expressions: --- End diff -- typo? Seems it should be DataType.
[GitHub] spark pull request: [SPARK-7103][Spark Core] Verify partitioners are...
GitHub user vinodkc opened a pull request: https://github.com/apache/spark/pull/5678 [SPARK-7103][Spark Core]Verify patitionors are available in all RDDs used in PartitionerAwareUnionRDD You can merge this pull request into a Git repository by running: $ git pull https://github.com/vinodkc/spark fix_unionRDD_without_partition Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/5678.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #5678 commit 8f414d145569041b30766be6a9c6880297303b3c Author: Vinod K C Date: 2015-04-24T07:52:16Z Verify patitionors of RDDs in union
[GitHub] spark pull request: [SPARK-7103][Spark Core]Verify patitionors are...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5678#issuecomment-95805275 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-6612] [MLLib] [PySpark] Python KMeans p...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5647#issuecomment-95805052 [Test build #30912 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30912/consoleFull) for PR 5647 at commit [`9903837`](https://github.com/apache/spark/commit/990383761841b444506e91f3052c2de3736d6052). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes. * This patch does not change any dependencies.
[GitHub] spark pull request: [SPARK-5687][Core]TaskResultGetter needs to ca...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/4474#issuecomment-95805067 I'd be happy to have a patch that kills the JVM when this occurs, with a warning message logged. I didn't realize in your original submission that this was actually just killing the thread but allowing the JVM to survive. Really, once we are out of memory, we should coerce the JVM to terminate. I agree that is better than having a silent thread death.
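The approach discussed here — terminate the JVM on `OutOfMemoryError` rather than letting one thread die silently — can be sketched with a default uncaught-exception handler. The object name and exit code below are illustrative, not Spark's actual `SparkUncaughtExceptionHandler`:

```scala
// Sketch: classify uncaught throwables and halt the JVM on fatal memory errors.
object FatalErrorHandler extends Thread.UncaughtExceptionHandler {
  override def uncaughtException(t: Thread, e: Throwable): Unit = {
    // Log first, since the process is about to go away on the fatal path.
    System.err.println(s"Uncaught exception in thread ${t.getName}: $e")
    e match {
      case _: OutOfMemoryError =>
        // Runtime.halt skips shutdown hooks, which may themselves need memory
        // that is no longer available; System.exit could deadlock here.
        Runtime.getRuntime.halt(1)
      case _ =>
        // Non-fatal: the thread still dies, but the JVM survives.
    }
  }
}

// Installed once, e.g. early in process startup:
// Thread.setDefaultUncaughtExceptionHandler(FatalErrorHandler)
```

Using `Runtime.halt` rather than `System.exit` is the usual choice on this path, since an OOM during shutdown-hook execution could otherwise hang the process.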
[GitHub] spark pull request: [SPARK-6612] [MLLib] [PySpark] Python KMeans p...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5647#issuecomment-95805062 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30912/
[GitHub] spark pull request: [SPARK-7026] [SQL] fix left semi join with equ...
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/5643#discussion_r29024298 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastLeftSemiJoinHash.scala --- @@ -32,36 +32,69 @@ case class BroadcastLeftSemiJoinHash( leftKeys: Seq[Expression], rightKeys: Seq[Expression], left: SparkPlan, -right: SparkPlan) extends BinaryNode with HashJoin { +right: SparkPlan, +condition: Option[Expression]) extends BinaryNode with HashJoin { override val buildSide: BuildSide = BuildRight override def output: Seq[Attribute] = left.output + @transient private lazy val boundCondition = +condition.map(newPredicate(_, left.output ++ right.output)).getOrElse((row: Row) => true) --- End diff -- `newPredicate(condition.getOrElse(Literal(true)), left.output ++ right.output)`?
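The two styles being compared — mapping over the `Option` versus folding the `Option` into the expression before compiling it — are behaviorally equivalent. A minimal sketch with stand-in types (the `Row`, `Expression`, and `newPredicate` below are simplified placeholders, not Catalyst's real signatures):

```scala
object PredicateStyles {
  type Row = Seq[Any]
  // Stand-in for a Catalyst expression: modeled here as a row predicate.
  type Expression = Row => Boolean
  // Stand-in for Literal(true): an expression that is always true.
  val TrueLiteral: Expression = _ => true

  // Stand-in for SparkPlan.newPredicate, which binds and compiles an expression.
  def newPredicate(e: Expression): Row => Boolean = e

  def main(args: Array[String]): Unit = {
    val condition: Option[Expression] = None

    // Style in the PR: map over the Option, fall back to a constant-true closure.
    val p1: Row => Boolean =
      condition.map(newPredicate).getOrElse((_: Row) => true)

    // Style suggested in the review: fold the Option into the expression first,
    // so newPredicate is always the one producing the compiled predicate.
    val p2: Row => Boolean = newPredicate(condition.getOrElse(TrueLiteral))

    val row: Row = Seq(1, "a")
    assert(p1(row) && p2(row))
    println("both styles agree")
  }
}
```

The reviewer's form trades a tiny cost (compiling a trivial `Literal(true)` predicate when no condition exists) for a single code path and slightly shorter code.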
[GitHub] spark pull request: [SPARK-7093][SQL] Using newPredicate in Nested...
Github user chenghao-intel commented on the pull request: https://github.com/apache/spark/pull/5665#issuecomment-95804784 `newPredicate(condition.getOrElse(Literal(true)), left.output ++ right.output)`?
[GitHub] spark pull request: [SPARK-3468][WebUI] Timeline-View feature
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/2342#discussion_r29024275 --- Diff: core/src/main/scala/org/apache/spark/ui/jobs/AllJobsPage.scala --- @@ -17,17 +17,172 @@ package org.apache.spark.ui.jobs -import scala.xml.{Node, NodeSeq} +import scala.collection.mutable.{HashMap, ListBuffer} +import scala.xml.{Node, NodeSeq, Unparsed} +import java.util.Date import javax.servlet.http.HttpServletRequest -import org.apache.spark.ui.{WebUIPage, UIUtils} -import org.apache.spark.ui.jobs.UIData.JobUIData +import org.apache.spark.ui.{UIUtils, WebUIPage} +import org.apache.spark.ui.jobs.UIData.{ExecutorUIData, JobUIData} +import org.apache.spark.JobExecutionStatus /** Page showing list of all ongoing and recently finished jobs */ private[ui] class AllJobsPage(parent: JobsTab) extends WebUIPage("") { - private val startTime: Option[Long] = parent.sc.map(_.startTime) - private val listener = parent.listener + private val JOBS_LEGEND = + + + Succeeded Job + + Failed Job + + Running Job +.toString.filter(_ != '\n') + + private val EXECUTORS_LEGEND = + + --- End diff -- I only really care about the colors (stroke and fill); are you sure it does not work for those?
[GitHub] spark pull request: [SPARK-7009] repackaging spark assembly jar wi...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5637#issuecomment-95804369 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30906/
[GitHub] spark pull request: [SPARK-7009] repackaging spark assembly jar wi...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5637#issuecomment-95804360 [Test build #30906 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30906/consoleFull) for PR 5637 at commit [`ab38c71`](https://github.com/apache/spark/commit/ab38c71356c23d63ca9f3990c8c0f0b8e8fc7976). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes. * This patch does not change any dependencies.
[GitHub] spark pull request: [SPARK-3468][WebUI] Timeline-View feature
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2342#issuecomment-95803737 [Test build #30916 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30916/consoleFull) for PR 2342 at commit [`d3c63c8`](https://github.com/apache/spark/commit/d3c63c84a56041756841dd0706d87c8c808e84d3).
[GitHub] spark pull request: [SPARK-6113] [ml] Tree ensembles for Pipelines...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/5626#discussion_r29024038 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/GBTClassifier.scala --- @@ -0,0 +1,225 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.ml.classification + +import com.github.fommil.netlib.BLAS.{getInstance => blas} + +import org.apache.spark.Logging +import org.apache.spark.annotation.AlphaComponent +import org.apache.spark.ml.impl.estimator.{PredictionModel, Predictor} +import org.apache.spark.ml.impl.tree._ +import org.apache.spark.ml.param.{Param, Params, ParamMap} +import org.apache.spark.ml.regression.DecisionTreeRegressionModel +import org.apache.spark.ml.tree.{DecisionTreeModel, TreeEnsembleModel} +import org.apache.spark.ml.util.MetadataUtils +import org.apache.spark.mllib.linalg.Vector +import org.apache.spark.mllib.regression.LabeledPoint +import org.apache.spark.mllib.tree.{GradientBoostedTrees => OldGBT} +import org.apache.spark.mllib.tree.configuration.{Algo => OldAlgo} +import org.apache.spark.mllib.tree.loss.{Loss => OldLoss, LogLoss => OldLogLoss} +import org.apache.spark.mllib.tree.model.{GradientBoostedTreesModel => OldGBTModel} +import org.apache.spark.rdd.RDD +import org.apache.spark.sql.DataFrame + + +/** + * :: AlphaComponent :: + * + * [[http://en.wikipedia.org/wiki/Gradient_boosting Gradient-Boosted Trees (GBTs)]] + * learning algorithm for classification. + * It supports binary labels, as well as both continuous and categorical features. + * Note: Multiclass labels are not currently supported. + */ +@AlphaComponent +final class GBTClassifier + extends Predictor[Vector, GBTClassifier, GBTClassificationModel] + with GBTParams with TreeClassifierParams with Logging { + + // Override parameter setters from parent trait for Java API compatibility. 
+ + // Parameters from TreeClassifierParams: + + override def setMaxDepth(value: Int): this.type = super.setMaxDepth(value) + + override def setMaxBins(value: Int): this.type = super.setMaxBins(value) + + override def setMinInstancesPerNode(value: Int): this.type = +super.setMinInstancesPerNode(value) + + override def setMinInfoGain(value: Double): this.type = super.setMinInfoGain(value) + + override def setMaxMemoryInMB(value: Int): this.type = super.setMaxMemoryInMB(value) + + override def setCacheNodeIds(value: Boolean): this.type = super.setCacheNodeIds(value) + + override def setCheckpointInterval(value: Int): this.type = super.setCheckpointInterval(value) + + /** + * The impurity setting is ignored for GBT models. + * Individual trees are built using impurity "Variance." + */ + override def setImpurity(value: String): this.type = { +logWarning("GBTClassifier.setImpurity should NOT be used") +this + } + + // Parameters from TreeEnsembleParams: + + override def setSubsamplingRate(value: Double): this.type = super.setSubsamplingRate(value) + + override def setSeed(value: Long): this.type = { +logWarning("The 'seed' parameter is currently ignored by Gradient Boosting.") +super.setSeed(value) + } + + // Parameters from GBTParams: + + override def setMaxIter(value: Int): this.type = super.setMaxIter(value) + + override def setLearningRate(value: Double): this.type = super.setLearningRate(value) + + // Parameters for GBTClassifier: + + /** + * Loss function which GBT tries to minimize. (case-insensitive) + * Supported: "LogLoss" + * (default = LogLoss) + * @group param + */ + val loss: Param[String] = new Param[String](this, "loss", "Loss function which GBT tries to" + +" minimize (case-insensitive). Supported options: LogLoss") + + setDefault(loss -> "logloss") + + /** @group setParam */ + def
[GitHub] spark pull request: [SPARK-7026] [SQL] fix left semi join with equ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5643#issuecomment-95800072 [Test build #30905 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30905/consoleFull) for PR 5643 at commit [`d29f9a6`](https://github.com/apache/spark/commit/d29f9a640a9882fd469d995a7ecd92b230cd8a65). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes. * This patch does not change any dependencies.
[GitHub] spark pull request: [SPARK-7026] [SQL] fix left semi join with equ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5643#issuecomment-95800074 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30905/
[GitHub] spark pull request: [SPARK-6852][SPARKR] Accept numeric as numPart...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5613#issuecomment-95799955 [Test build #30907 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30907/consoleFull) for PR 5613 at commit [`abaf02e`](https://github.com/apache/spark/commit/abaf02e611359102f3117e3fa484923155f3f314). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes. * This patch does not change any dependencies.
[GitHub] spark pull request: [SPARK-6852][SPARKR] Accept numeric as numPart...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5613#issuecomment-95799978 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30907/
[GitHub] spark pull request: [SPARK-5213] [SQL] Pluggable SQL Parser Suppor...
Github user chenghao-intel commented on the pull request: https://github.com/apache/spark/pull/4015#issuecomment-95799659 @marmbrus Any more comments on this before merging? It would be greatly appreciated if you could merge this soon, as I have spent a lot of time rebasing again and again. :)
[GitHub] spark pull request: [SPARK-6122][Core] Upgrade tachyon-client vers...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5354#issuecomment-95799433 [Test build #30915 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30915/consoleFull) for PR 5354 at commit [`0eefe4d`](https://github.com/apache/spark/commit/0eefe4d46c0a42859b8c9c0bc0ff98a0beeb440a).
[GitHub] spark pull request: [SPARK-6122][Core] Upgrade tachyon-client vers...
Github user calvinjia commented on the pull request: https://github.com/apache/spark/pull/5354#issuecomment-95799161 @srowen I appreciate the feedback, and I've cleaned up the httpclient versions as you suggested. Do you have any other comments? Thanks.
[GitHub] spark pull request: [SPARK-7044][SQL] Fix the deadlock in ScriptTr...
Github user chenghao-intel closed the pull request at: https://github.com/apache/spark/pull/5671
[GitHub] spark pull request: [SPARK-7044][SQL] Fix the deadlock in ScriptTr...
Github user chenghao-intel commented on the pull request: https://github.com/apache/spark/pull/5671#issuecomment-95798871 Thank you @rxin, closing since it's merged.
[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5645#issuecomment-95798479 [Test build #30909 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30909/consoleFull) for PR 5645 at commit [`e0d19fb`](https://github.com/apache/spark/commit/e0d19fb1f0e6472d3e0ca55223c36ed506f32709). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. * This patch does not change any dependencies.
[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5645#issuecomment-95798495 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30909/
[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5645#issuecomment-95798520 [Test build #30914 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30914/consoleFull) for PR 5645 at commit [`d7cd15b`](https://github.com/apache/spark/commit/d7cd15b5cef64766a432918e54cca4750d13745b).
[GitHub] spark pull request: [SPARK-7084] improve saveAsTable documentation
Github user phatak-dev commented on the pull request: https://github.com/apache/spark/pull/5654#issuecomment-95798250 Added it for the other methods as well.
[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5645#issuecomment-95798234 [Test build #30913 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30913/consoleFull) for PR 5645 at commit [`1a32a4b`](https://github.com/apache/spark/commit/1a32a4b5ce740721343915452974c9fb3f9a3910).
[GitHub] spark pull request: [SPARK-6435] spark-shell --jars option does no...
Github user tsudukim commented on the pull request: https://github.com/apache/spark/pull/5227#issuecomment-95798208 I was checking the `SparkLauncherSuite` on Windows following vanzin's comment, and ran into some trouble. It seems unrelated to this PR, but I'm not sure yet. Please give me a little more time. Once I resolve the problem, I'll rebase this PR.
[GitHub] spark pull request: [SPARK-6891] Fix the bug that ExecutorAllocati...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5676#issuecomment-95797914 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30908/
[GitHub] spark pull request: [SPARK-6418] Add simple per-stage visualizatio...
Github user shroffpradyumn commented on the pull request: https://github.com/apache/spark/pull/5547#issuecomment-95797940 Thank you all for your feedback, and I apologize for my late reply (it's been a rough week of midterms). @pwendell - I've addressed all your inline comments (memoization, Javascript indentation, JSON lists, etc.) in my latest commit. As per the load time of the graph, it's improved a bit after moving from string representations to JSON arrays, but only by a small factor. When you say you're skeptical about the graph scalability, what is the maximum number of tasks you want displayed on the graph? I'm thinking of keeping it to 1000 (at the most), and having the users select a task range if they want to view a different region of tasks (say tasks 1200-2000 for example). My reason for the above is that the task stages become too cluttered above a certain number, so it's better to keep a limit, or alternatively, increase the max height of the graph (which would involve a lot more scrolling though). @andrewor14 - The visualization doesn't currently support zooming, and it will definitely be pretty challenging to implement it on top of D3.js. However, the task-range functionality I mentioned above can serve as a pseudo-zoom feature since a user can select a task range and hence zoom into the graph. Also, breaking down the task times along the vertical axis shouldn't be that difficult so we can definitely add that later on if required (provided this patch gets accepted haha). @punya - I haven't looked into using Amber yet, and I'll definitely check out plottable.js.
[GitHub] spark pull request: [SPARK-6891] Fix the bug that ExecutorAllocati...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5676#issuecomment-95797907 [Test build #30908 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30908/consoleFull) for PR 5676 at commit [`1693b54`](https://github.com/apache/spark/commit/1693b54f209a17ebb6bed449f81840737f97366a). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. * This patch does not change any dependencies.
[GitHub] spark pull request: [SQL] Fixed expression data type matching.
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/5675
[GitHub] spark pull request: [SPARK-6856] [R] Make RDD information more use...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5667#issuecomment-95797532 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30903/ Test PASSed.
[GitHub] spark pull request: [SPARK-6856] [R] Make RDD information more use...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5667#issuecomment-95797525 [Test build #30903 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30903/consoleFull) for PR 5667 at commit [`9d2295e`](https://github.com/apache/spark/commit/9d2295e73046fca9e0134876a19f9638336d7023). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes. * This patch does not change any dependencies.
[GitHub] spark pull request: [SQL] Fixed expression data type matching.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5675#issuecomment-95797394 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30902/ Test PASSed.
[GitHub] spark pull request: [SQL] Fixed expression data type matching.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5675#issuecomment-95797389 [Test build #30902 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30902/consoleFull) for PR 5675 at commit [`0f31856`](https://github.com/apache/spark/commit/0f31856d170102ec4a7d19e9da488726c2a37bb5). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes. * This patch does not change any dependencies.
[GitHub] spark pull request: [SPARK-7092] Update spark scala version to 2.1...
Github user ScrapCodes commented on a diff in the pull request: https://github.com/apache/spark/pull/5662#discussion_r29022546 --- Diff: repl/scala-2.11/src/main/scala/org/apache/spark/repl/SparkIMain.scala --- @@ -1129,7 +1129,7 @@ class SparkIMain(@BeanProperty val factory: ScriptEngineFactory, initialSettings def apply(line: String): Result = debugging(s"""parse("$line")""") { var isIncomplete = false - currentRun.reporting.withIncompleteHandler((_, _) => isIncomplete = true) { + currentRun.parsing.withIncompleteHandler((_, _) => isIncomplete = true) { --- End diff -- This was the change, it corresponds to https://github.com/scala/scala/commit/64ebac245d58221814f9c9375927e3f2e7a2d4f0
[GitHub] spark pull request: [SPARK-7093][SQL] Using newPredicate in Nested...
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/5665#issuecomment-95792030 /cc @rxin @liancheng
[GitHub] spark pull request: [SPARK-6612] [MLLib] [PySpark] Python KMeans p...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5647#issuecomment-95791867 [Test build #30912 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30912/consoleFull) for PR 5647 at commit [`9903837`](https://github.com/apache/spark/commit/990383761841b444506e91f3052c2de3736d6052).
[GitHub] spark pull request: [SPARK-6612] [MLLib] [PySpark] Python KMeans p...
Github user FlytxtRnD commented on the pull request: https://github.com/apache/spark/pull/5647#issuecomment-95791587 Jenkins, retest this please
[GitHub] spark pull request: [SPARK-7044][SQL] Fix the deadlock in ScriptTr...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/5671#issuecomment-95791377 Can you close the PR? Since it was not merged into master, github won't close this automatically.
[GitHub] spark pull request: Update sql-programming-guide.md
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/5674
[GitHub] spark pull request: Update sql-programming-guide.md
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/5674#issuecomment-95791355 Thanks. I've merged this in master & branch-1.3.
[GitHub] spark pull request: [SPARK-7109] [SQL] Push down left side filter ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5677#issuecomment-95790916 [Test build #30911 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30911/consoleFull) for PR 5677 at commit [`ebadaa9`](https://github.com/apache/spark/commit/ebadaa9798752498004fc3bc53de07ed53b49f7b).
[GitHub] spark pull request: [SPARK-7033][SPARKR] Clean usage of split. Use...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5628#issuecomment-95789973 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30899/ Test PASSed.
[GitHub] spark pull request: [SPARK-7109] [SQL] Push down left side filter ...
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/5677#issuecomment-95789809 I will try to add a test case for this.
[GitHub] spark pull request: [SPARK-7033][SPARKR] Clean usage of split. Use...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5628#issuecomment-95789945 [Test build #30899 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30899/consoleFull) for PR 5628 at commit [`046bc9e`](https://github.com/apache/spark/commit/046bc9e4e664ad36903d7e6bcf832912c53c53f8). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes. * This patch does not change any dependencies.
[GitHub] spark pull request: [SPARK-7109] [SQL] Push down left side filter ...
GitHub user scwf opened a pull request: https://github.com/apache/spark/pull/5677 [SPARK-7109] [SQL] Push down left side filter for left semi join Currently the Spark SQL optimizer only pushes down the right-side filter for a left semi join; in fact we can push down the left-side filter as well. You can merge this pull request into a Git repository by running: $ git pull https://github.com/scwf/spark leftsemi Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/5677.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #5677 commit ebadaa9798752498004fc3bc53de07ed53b49f7b Author: wangfei Date: 2015-04-24T03:33:01Z left filter push down for left semi join
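[Editor's note] A hedged illustration, in plain Python rather than Spark code, of why a left-side filter commutes with a left semi join: the join only keeps left rows that have a key match on the right, so filtering the left input first yields the same rows as filtering the joined output.

```python
# Toy left semi join: keep left rows whose key appears on the right.
def left_semi_join(left, right, key):
    right_keys = {r[key] for r in right}
    return [row for row in left if row[key] in right_keys]

left = [{"k": 1, "v": 10}, {"k": 2, "v": 20}, {"k": 3, "v": 30}]
right = [{"k": 1}, {"k": 2}]
pred = lambda row: row["v"] > 15  # a filter referencing only left-side columns

# Filter applied after the join (the pre-optimization plan)...
after = [row for row in left_semi_join(left, right, "k") if pred(row)]
# ...equals the join with the filter pushed down to the left input.
pushed = left_semi_join([row for row in left if pred(row)], right, "k")
assert after == pushed == [{"k": 2, "v": 20}]
```

Since the predicate inspects only left-side columns, evaluating it before the join prunes rows earlier without changing the result, which is what makes the pushdown safe.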
[GitHub] spark pull request: [SPARK-6418] Add simple per-stage visualizatio...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5547#issuecomment-95789435 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30910/ Test FAILed.
[GitHub] spark pull request: [SPARK-6418] Add simple per-stage visualizatio...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5547#issuecomment-95789429 [Test build #30910 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30910/consoleFull) for PR 5547 at commit [`5c3a2a6`](https://github.com/apache/spark/commit/5c3a2a697fca83d6de843850e786cf3406c4bd5a). * This patch **fails RAT tests**. * This patch merges cleanly. * This patch adds no public classes. * This patch does not change any dependencies.
[GitHub] spark pull request: [SPARK-6418] Add simple per-stage visualizatio...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5547#issuecomment-95789315 [Test build #30910 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30910/consoleFull) for PR 5547 at commit [`5c3a2a6`](https://github.com/apache/spark/commit/5c3a2a697fca83d6de843850e786cf3406c4bd5a).
[GitHub] spark pull request: [SPARK-6924][YARN] Fix driver hangs in yarn-cl...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5663#issuecomment-95789182 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30898/ Test PASSed.
[GitHub] spark pull request: [SPARK-6924][YARN] Fix driver hangs in yarn-cl...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5663#issuecomment-95789176 [Test build #30898 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30898/consoleFull) for PR 5663 at commit [`cf80049`](https://github.com/apache/spark/commit/cf8004938e6078bf370fcbe22ad39ea05913ec66). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes. * This patch does not change any dependencies.
[GitHub] spark pull request: [SPARK-4233] [SQL] [WIP] UDAF Interface Refact...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5542#issuecomment-95788877 [Test build #30901 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30901/consoleFull) for PR 5542 at commit [`71f1bd5`](https://github.com/apache/spark/commit/71f1bd538b3e0befead2d1d592ce12990cb9b417). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. * This patch does not change any dependencies.
[GitHub] spark pull request: [SPARK-4233] [SQL] [WIP] UDAF Interface Refact...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5542#issuecomment-9573 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30901/ Test FAILed.
[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5645#issuecomment-95788717 [Test build #30909 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30909/consoleFull) for PR 5645 at commit [`e0d19fb`](https://github.com/apache/spark/commit/e0d19fb1f0e6472d3e0ca55223c36ed506f32709).
[GitHub] spark pull request: [SPARK-6891] Fix the bug that ExecutorAllocati...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5676#issuecomment-95787694 [Test build #30908 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30908/consoleFull) for PR 5676 at commit [`1693b54`](https://github.com/apache/spark/commit/1693b54f209a17ebb6bed449f81840737f97366a).
[GitHub] spark pull request: [SPARK-7031][ThriftServer]let thrift server ta...
Github user WangTaoTheTonic commented on the pull request: https://github.com/apache/spark/pull/5609#issuecomment-95787561 BTW I have tested on my cluster with setting >export SPARK_DAEMON_MEMORY=m export SPARK_DAEMON_JAVA_OPTS=" -Dx=y " in spark-env.sh. Before this patch the jinfo shows: > VM Flags: -Xms512m -Xmx512m -XX:MaxPermSize=128m After: >VM Flags: -Dx=y -Xmsm -Xmxm -XX:MaxPermSize=128m
[GitHub] spark pull request: [SPARK-6891] Fix the bug that ExecutorAllocati...
GitHub user ArcherShao opened a pull request: https://github.com/apache/spark/pull/5676 [SPARK-6891] Fix the bug that ExecutorAllocationManager will request a negative number of executors In ExecutorAllocationManager, the executor allocation schedule runs at a fixed rate (100ms); it first calls the method 'addOrCancelExecutorRequests' and then removes expired executors. Suppose that at time T no task is running or pending, and 5 executors are running, but all have expired. 1. The method 'addOrCancelExecutorRequests' will be called, and the value of 'ExecutorAllocationManager.numExecutorsPending' will be updated to -5. 2. The 5 expired executors are removed. Suppose still no task is running or pending at T+1: the method 'targetNumExecutors' will return -5, and the method 'addExecutors' will be called:

    private def addExecutors(maxNumExecutorsNeeded: Int): Int = {
      val currentTarget = targetNumExecutors
      val actualMaxNumExecutors = math.min(maxNumExecutors, maxNumExecutorsNeeded)
      val newTotalExecutors = math.min(currentTarget + numExecutorsToAdd, actualMaxNumExecutors)
      val addRequestAcknowledged = testing || client.requestTotalExecutors(newTotalExecutors)
    }

newTotalExecutors will be a negative number, and when client.requestTotalExecutors(newTotalExecutors) is called, it will throw an exception. Making the method 'targetNumExecutors' return a value not less than minNumExecutors ensures newTotalExecutors is never negative; keeping targetNumExecutors at or above minNumExecutors also makes sense on its own.
You can merge this pull request into a Git repository by running: $ git pull https://github.com/ArcherShao/spark SPARK-6891 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/5676.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #5676 commit 1693b54f209a17ebb6bed449f81840737f97366a Author: ArcherShao Date: 2015-04-24T00:59:59Z [SPARK-6891] Fix the bug that ExecutorAllocationManager will request negative number executors
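[Editor's note] A hedged sketch, in Python rather than Spark's Scala, of the failure mode described above and of the proposed clamp. All names here are illustrative simplifications, not Spark's actual API.

```python
# Mirrors: newTotalExecutors = min(currentTarget + numExecutorsToAdd, actualMax)
def new_total_executors(current_target, num_executors_to_add, actual_max):
    return min(current_target + num_executors_to_add, actual_max)

# The proposed fix: targetNumExecutors never returns less than minNumExecutors.
def clamped_target(raw_target, min_num_executors):
    return max(raw_target, min_num_executors)

# Before the fix: a raw target of -5 flows straight into the request,
# and requestTotalExecutors would be handed a negative number.
assert new_total_executors(-5, 1, 10) == -4

# After the fix: the target is clamped first, so the request stays non-negative.
assert new_total_executors(clamped_target(-5, 0), 1, 10) == 1
```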
[GitHub] spark pull request: [SPARK-6852][SPARKR] Accept numeric as numPart...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5613#issuecomment-95786756 [Test build #30907 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30907/consoleFull) for PR 5613 at commit [`abaf02e`](https://github.com/apache/spark/commit/abaf02e611359102f3117e3fa484923155f3f314).
[GitHub] spark pull request: [SPARK-7009] repackaging spark assembly jar wi...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5637#issuecomment-95786710 [Test build #30906 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30906/consoleFull) for PR 5637 at commit [`ab38c71`](https://github.com/apache/spark/commit/ab38c71356c23d63ca9f3990c8c0f0b8e8fc7976).
[GitHub] spark pull request: [SPARK-5213] [SQL] Pluggable SQL Parser Suppor...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4015#issuecomment-95786382 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30897/ Test PASSed.
[GitHub] spark pull request: [SPARK-5213] [SQL] Pluggable SQL Parser Suppor...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4015#issuecomment-95786341 [Test build #30897 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30897/consoleFull) for PR 4015 at commit [`81a731f`](https://github.com/apache/spark/commit/81a731f9f9a4eb828deb8d5bcc344bd28221a763). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes. * This patch does not change any dependencies.
[GitHub] spark pull request: [SPARK-7044][SQL] Fix the deadlock in ScriptTr...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/5671#issuecomment-95785614 Thanks. I've merged this.
[GitHub] spark pull request: [SPARK-7009] repackaging spark assembly jar wi...
Github user zhzhan commented on the pull request: https://github.com/apache/spark/pull/5637#issuecomment-95784598 Jenkins, retest this please.