[GitHub] spark pull request: [SPARK-3615][Streaming]Fix Kafka unit test har...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/2483#discussion_r17955115

--- Diff: external/kafka/src/test/scala/org/apache/spark/streaming/kafka/KafkaStreamSuite.scala ---
@@ -59,16 +58,35 @@ class KafkaStreamSuite extends TestSuiteBase {

   override def beforeFunction() {
     // Zookeeper server startup
-    zookeeper = new EmbeddedZookeeper(zkConnect)
+    zookeeper = new EmbeddedZookeeper(s"$zkHost:$zkPort")
+    // Get the actual zookeeper binding port
+    zkPort = zookeeper.actualPort
     logInfo( 0 )
-    zkClient = new ZkClient(zkConnect, zkSessionTimeout, zkConnectionTimeout, ZKStringSerializer)
+
+    zkClient = new ZkClient(s"$zkHost:$zkPort", zkSessionTimeout, zkConnectionTimeout,
+      ZKStringSerializer)
     logInfo( 1 )

     // Kafka broker startup
-    server = new KafkaServer(brokerConf)
-    logInfo( 2 )
-    server.startup()
-    logInfo( 3 )
+    var bindSuccess: Boolean = false
+    while(!bindSuccess) {
+      try {
+        val brokerProps = getBrokerConfig(brokerPort, s"$zkHost:$zkPort")
+        brokerConf = new KafkaConfig(brokerProps)
+        server = new KafkaServer(brokerConf)
--- End diff --

alright. just one more round of testing, and will merge it if it passes.

--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
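The retry loop quoted above guards the embedded Kafka broker against port collisions by retrying startup until a bind succeeds. A minimal self-contained sketch of that pattern, where a plain `ServerSocket` stands in for `new KafkaServer(...)` (the names `startServer` and `startWithRetry` are illustrative, not the test harness's API):

```scala
import java.net.{BindException, ServerSocket}

object BindRetryDemo {
  // Stand-in for KafkaServer startup: binds a socket on the given port,
  // throwing BindException if the port is already taken.
  def startServer(port: Int): ServerSocket = new ServerSocket(port)

  // Mirror of the harness's while(!bindSuccess) loop: on a bind failure,
  // move to the next port and try again. Returns the socket and the port
  // that actually bound.
  def startWithRetry(initialPort: Int, maxAttempts: Int = 10): (ServerSocket, Int) = {
    var port = initialPort
    var attempts = 0
    while (attempts < maxAttempts) {
      try {
        return (startServer(port), port)
      } catch {
        case _: BindException =>
          port += 1       // port taken; retry on the next one
          attempts += 1
      }
    }
    sys.error(s"could not bind after $maxAttempts attempts")
  }

  def main(args: Array[String]): Unit = {
    val blocker = new ServerSocket(0)        // occupy some free port
    val taken = blocker.getLocalPort
    val (sock, port) = startWithRetry(taken) // first attempt collides, loop retries
    assert(sock.isBound && port != taken)
    sock.close(); blocker.close()
  }
}
```

The same idea motivates `zookeeper.actualPort` in the diff: bind to an ephemeral port and ask the server what it actually got, rather than assuming a fixed port is free.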
[GitHub] spark pull request: [SPARK-1853] Show Streaming application code c...
Github user tdas commented on the pull request: https://github.com/apache/spark/pull/2464#issuecomment-56628837 @mubarak Thank you very much for this fix! It's finally merged!
[GitHub] spark pull request: [SPARK-3675][SQL] Allow starting a JDBC server...
GitHub user marmbrus opened a pull request: https://github.com/apache/spark/pull/2515 [SPARK-3675][SQL] Allow starting a JDBC server on an existing context You can merge this pull request into a Git repository by running: $ git pull https://github.com/marmbrus/spark jdbcExistingContext Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2515.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2515 commit 7866fad85ce89d38547ffed904ac9a3dbce1aed3 Author: Michael Armbrust mich...@databricks.com Date: 2014-09-24T06:13:20Z Allows starting a JDBC server on an existing context.
[GitHub] spark pull request: [SPARK-3675][SQL] Allow starting a JDBC server...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2515#issuecomment-56629604 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20744/consoleFull) for PR 2515 at commit [`7866fad`](https://github.com/apache/spark/commit/7866fad85ce89d38547ffed904ac9a3dbce1aed3). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3032][Shuffle] Fix key comparison integ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2514#issuecomment-56629824 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20742/
[GitHub] spark pull request: [SPARK-3032][Shuffle] Fix key comparison integ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2514#issuecomment-56629822 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20742/consoleFull) for PR 2514 at commit [`83acb38`](https://github.com/apache/spark/commit/83acb38649ef41917130d7837ab9f4177fc3262d). * This patch **passes** unit tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3615][Streaming]Fix Kafka unit test har...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2483#issuecomment-56632476 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20743/consoleFull) for PR 2483 at commit [`863`](https://github.com/apache/spark/commit/863830eb240f2b5b44a8991d0e45c49bfdaa). * This patch **passes** unit tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3615][Streaming]Fix Kafka unit test har...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2483#issuecomment-56632481 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20743/
[GitHub] spark pull request: [SPARK-3675][SQL] Allow starting a JDBC server...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2515#issuecomment-56633081 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20744/
[GitHub] spark pull request: [SPARK-3675][SQL] Allow starting a JDBC server...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2515#issuecomment-56633079 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20744/consoleFull) for PR 2515 at commit [`7866fad`](https://github.com/apache/spark/commit/7866fad85ce89d38547ffed904ac9a3dbce1aed3). * This patch **passes** unit tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: Spark Core - [SPARK-3620] - Refactor of SparkS...
GitHub user tigerquoll opened a pull request: https://github.com/apache/spark/pull/2516

Spark Core - [SPARK-3620] - Refactor of SparkSubmit Argument parsing code

Argument processing seems to have gotten a lot of attention lately, so I thought I might throw my contribution into the ring. Attached for consideration, and to prompt discussion, is a revamp of argument handling in SparkSubmit aimed at making things a lot more consistent. The only things that have been modified are the way configuration properties are read, processed, and prioritised.

Things to note include:
* All configuration parameters can now be consistently set via a config file.
* Configuration parameter defaults have been removed from the code and placed into a property file which is read from the class path on startup. There should be no need to trace through five files to see what a config parameter defaults to if it is not specified, or to have different default values applied in multiple places throughout the code.
* Configuration parameter validation is now done once all configuration parameters have been read in and resolved from the various locations, not just when reading the command line.
* All property files (including spark_default_conf) are parsed by Java property-handling code. All custom parsing code has been removed. Escaping of characters should now be consistent everywhere.
* All configuration parameters are overridden in the same consistent way. Configuration parameters for SparkSubmit are pulled from the following sources, in order of priority:
  1. Entries specified on the command line (except --conf entries)
  2. Entries specified on the command line with --conf
  3. Environment variables (including legacy variable mappings)
  4. System config variables (eg by using -Dspark.var.name)
  5. $(SPARK_DEFAULT_CONF)/spark-defaults.conf or $(SPARK_HOME)/conf/spark-defaults.conf if either exists
  6. Hard-coded defaults in the class path at spark-submit-defaults.prop
* A property file specified by one of the sources listed above gets read in, and its properties are considered to be at the priority of the configuration source that specified the file. A property specified in a property file will not override an existing config value already specified by that configuration source.

The existing argument handling is pretty finicky - chances are high that I've missed some behaviour. If this PR is going to be accepted/approved, let me know any bugs and I'll fix them up and document the behaviour for future reference.

You can merge this pull request into a Git repository by running:

  $ git pull https://github.com/tigerquoll/spark-3620 master

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2516.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2516

commit b1a9682dd2bbff824c4e8481fa0ce5118c47de68 Author: Dale tigerqu...@outlook.com Date: 2014-09-21T02:42:24Z Initial pass at using typesafe's conf object for handling configuration options
commit 7bb5ee95b3f06147dba994e3d557221554415bfd Author: Dale tigerqu...@outlook.com Date: 2014-09-21T02:44:09Z Added defaults file
commit e995a6d1e8ab898c85aa5fe259b81c630595075f Author: Dale tigerqu...@outlook.com Date: 2014-09-21T12:56:17Z Existing tests now work
commit 00ee008c5652336d533d9619bc7e6306ed59138b Author: Dale tigerqu...@outlook.com Date: 2014-09-21T13:05:14Z Existing tests now work
commit 295c62b067fb5204efb58892133c77fe49b877e0 Author: Dale tigerqu...@outlook.com Date: 2014-09-22T22:04:45Z Created mergedPropertyMap
commit f399170e1c05d75257ff6c508a96e64cadf0d87b Author: Dale tigerqu...@outlook.com Date: 2014-09-23T00:10:40Z Moved sparkSubmitArguments module to use custom property map merging code
commit b0abe3196f9e5d3f577e158704740f1eee8fbb59 Author: Dale tigerqu...@outlook.com Date: 2014-09-23T23:58:55Z Merge branch 'master' of https://github.com/apache/spark
commit 562ec7c064e5ad632cf7aaa1720be29fe36b5c9a Author: Dale tigerqu...@outlook.com Date: 2014-09-23T23:59:52Z note for additional tests
commit 86f71f8bb8291fe20a2f0ca0100727d583e97dfd Author: Dale tigerqu...@outlook.com Date: 2014-09-24T00:39:47Z Changes needed to pass scalastyle check
commit 2019554ec307c8d3eabee7e4299cd8bac8faba0f Author: Dale tigerqu...@outlook.com Date: 2014-09-24T04:43:58Z Changes needed to pass scalastyle check, merged from current SparkSubmit.scala
commit 8c416a04d064c1475a184785a9135d849c239bff Author: Dale tigerqu...@outlook.com Date: 2014-09-24T05:19:24Z Fixed some typos
commit b69f58e65d919a689942866f59b11a7dcf2fbf91 Author: Dale tigerqu...@outlook.com Date: 2014-09-24T07:08:01Z Added spark.app.name to defaults list
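The priority scheme this PR describes amounts to a first-match-wins lookup over an ordered list of property maps. A hedged sketch of that resolution order (`resolve` and the example source maps are illustrative, not the PR's actual code):

```scala
object ConfigResolution {
  // Sources are ordered highest priority first, mirroring the PR's list:
  // CLI entries, --conf entries, env vars, system properties, defaults
  // file, hard-coded defaults. The first source that defines a key wins.
  def resolve(key: String, sources: Seq[Map[String, String]]): Option[String] =
    sources.collectFirst { case m if m.contains(key) => m(key) }

  def main(args: Array[String]): Unit = {
    val cliArgs   = Map("spark.master" -> "yarn")
    val confFlags = Map("spark.master" -> "local[4]", "spark.app.name" -> "demo")
    val defaults  = Map("spark.app.name" -> "Spark shell", "spark.ui.port" -> "4040")
    val sources   = Seq(cliArgs, confFlags, defaults)

    assert(resolve("spark.master", sources).contains("yarn"))   // CLI beats --conf
    assert(resolve("spark.app.name", sources).contains("demo")) // --conf beats defaults
    assert(resolve("spark.ui.port", sources).contains("4040"))  // falls through to defaults
    assert(resolve("spark.missing", sources).isEmpty)
  }
}
```

The "property file read at the priority of the source that specified it" rule then corresponds to splicing the file's map in at that source's position rather than at the front or back of the list.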
[GitHub] spark pull request: Spark Core - [SPARK-3620] - Refactor of SparkS...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2516#issuecomment-56634577 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-1545] [mllib] Add Random Forests
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2435#discussion_r17957565

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/RandomForest.scala ---
@@ -0,0 +1,430 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.tree
+
+import scala.collection.JavaConverters._
+import scala.collection.mutable
+
+import org.apache.spark.Logging
+import org.apache.spark.annotation.Experimental
+import org.apache.spark.api.java.JavaRDD
+import org.apache.spark.mllib.regression.LabeledPoint
+import org.apache.spark.mllib.tree.configuration.Algo._
+import org.apache.spark.mllib.tree.configuration.QuantileStrategy._
+import org.apache.spark.mllib.tree.configuration.Strategy
+import org.apache.spark.mllib.tree.impl.{BaggedPoint, TreePoint, DecisionTreeMetadata, TimeTracker}
+import org.apache.spark.mllib.tree.impurity.Impurities
+import org.apache.spark.mllib.tree.model._
+import org.apache.spark.rdd.RDD
+import org.apache.spark.storage.StorageLevel
+import org.apache.spark.util.Utils
+
+/**
+ * :: Experimental ::
+ * A class which implements a random forest learning algorithm for classification and regression.
+ * It supports both continuous and categorical features.
+ *
+ * @param strategy The configuration parameters for the random forest algorithm which specify
+ *                 the type of algorithm (classification, regression, etc.), feature type
+ *                 (continuous, categorical), depth of the tree, quantile calculation strategy,
+ *                 etc.
+ * @param numTrees If 1, then no bootstrapping is used.  If > 1, then bootstrapping is done.
+ * @param featureSubsetStrategy Number of features to consider for splits at each node.
+ *                              Supported: "auto" (default), "all", "sqrt", "log2", "onethird".
+ *                              If "auto" is set, this parameter is set based on numTrees:
+ *                              if numTrees == 1, then featureSubsetStrategy = "all";
+ *                              if numTrees > 1, then featureSubsetStrategy = "sqrt".
+ * @param seed Random seed for bootstrapping and choosing feature subsets.
+ */
+@Experimental
+private class RandomForest (
+    private val strategy: Strategy,
+    private val numTrees: Int,
+    featureSubsetStrategy: String,
+    private val seed: Int)
+  extends Serializable with Logging {
+
+  strategy.assertValid()
+  require(numTrees > 0, s"RandomForest requires numTrees > 0, but was given numTrees = $numTrees.")
+  require(RandomForest.supportedFeatureSubsetStrategies.contains(featureSubsetStrategy),
+    s"RandomForest given invalid featureSubsetStrategy: $featureSubsetStrategy." +
+    s" Supported values: ${RandomForest.supportedFeatureSubsetStrategies.mkString(", ")}.")
+
+  /**
+   * Method to train a decision tree model over an RDD
+   * @param input Training data: RDD of [[org.apache.spark.mllib.regression.LabeledPoint]]
+   * @return RandomForestModel that can be used for prediction
+   */
+  def train(input: RDD[LabeledPoint]): RandomForestModel = {
+
+    val timer = new TimeTracker()
+
+    timer.start("total")
+
+    timer.start("init")
+
+    val retaggedInput = input.retag(classOf[LabeledPoint])
+    val metadata =
+      DecisionTreeMetadata.buildMetadata(retaggedInput, strategy, numTrees, featureSubsetStrategy)
+    logDebug("algo = " + strategy.algo)
+    logDebug("numTrees = " + numTrees)
+    logDebug("seed = " + seed)
+    logDebug("maxBins = " + metadata.maxBins)
+    logDebug("featureSubsetStrategy = " + featureSubsetStrategy)
+    logDebug("numFeaturesPerNode = " + metadata.numFeaturesPerNode)
+
+    // Find the splits and the corresponding bins (interval between the splits) using a sample
+    // of the input data.
+    timer.start("findSplitsBins")
+    val (splits, bins) =
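The `featureSubsetStrategy` doc in the quoted diff maps each strategy name to a per-node feature count. A small standalone sketch of that mapping (a simplified reading of the documented semantics, not the PR's actual `DecisionTreeMetadata` code):

```scala
object FeatureSubset {
  // Resolve the number of features considered for splits at each node,
  // following the @param doc: "auto" means "all" for a single tree and
  // "sqrt" for a forest of more than one tree.
  def numFeaturesPerNode(strategy: String, numFeatures: Int, numTrees: Int): Int = {
    val resolved = strategy match {
      case "auto" => if (numTrees == 1) "all" else "sqrt"
      case s      => s
    }
    resolved match {
      case "all"      => numFeatures
      case "sqrt"     => math.sqrt(numFeatures).ceil.toInt
      case "log2"     => (math.log(numFeatures) / math.log(2)).ceil.toInt
      case "onethird" => (numFeatures / 3.0).ceil.toInt
      case other      => sys.error(s"invalid featureSubsetStrategy: $other")
    }
  }

  def main(args: Array[String]): Unit = {
    assert(numFeaturesPerNode("auto", 100, 1) == 100)  // single tree: all features
    assert(numFeaturesPerNode("auto", 100, 10) == 10)  // forest: sqrt(100) = 10
    assert(numFeaturesPerNode("onethird", 9, 10) == 3)
  }
}
```

Subsampling features per node is what decorrelates the trees; with `numTrees == 1` there is no ensemble to decorrelate, which is why "auto" falls back to "all".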
[GitHub] spark pull request: [SPARK-3032][Shuffle] Fix key comparison integ...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/2514#discussion_r17957564

--- Diff: core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala ---
@@ -152,7 +152,7 @@ private[spark] class ExternalSorter[K, V, C](
     override def compare(a: K, b: K): Int = {
       val h1 = if (a == null) 0 else a.hashCode()
       val h2 = if (b == null) 0 else b.hashCode()
-      h1 - h2
+      if (h1 < h2) -1 else if (h1 == h2) 0 else 1
--- End diff --

@mateiz per my comment, that would no longer run in Java 6 as `Integer.compare` doesn't exist before Java 7.
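The `h1 - h2` comparator being replaced here is the classic subtraction-overflow bug: when the true difference of the two hash codes does not fit in an `Int`, the subtraction wraps around and the comparator reports the wrong ordering. A standalone illustration:

```scala
object HashCompareOverflow {
  // Buggy comparator from the old code: correct only while h1 - h2
  // stays within Int range.
  def buggyCompare(h1: Int, h2: Int): Int = h1 - h2

  // Fixed comparator from the patch (explicit branches rather than
  // Integer.compare, which only exists from Java 7 on).
  def fixedCompare(h1: Int, h2: Int): Int =
    if (h1 < h2) -1 else if (h1 == h2) 0 else 1

  def main(args: Array[String]): Unit = {
    val h1 = -2000000000 // a large negative hash code
    val h2 = 2000000000  // a large positive hash code
    // True difference is -4000000000, which wraps to a positive Int,
    // so the buggy comparator claims h1 > h2.
    assert(buggyCompare(h1, h2) > 0)
    assert(fixedCompare(h1, h2) < 0)
  }
}
```

An inconsistent comparator like this can silently corrupt sort-based shuffle output, since sorted runs are merged on the assumption that `compare` is a total order.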
[GitHub] spark pull request: [SPARK-1545] [mllib] Add Random Forests
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2435#discussion_r17957577

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/impl/DTStatsAggregator.scala ---
@@ -189,6 +160,230 @@ private[tree] class DTStatsAggregator(
     }
     this
   }
+}
+
+/**
+ * DecisionTree statistics aggregator.
+ * This holds a flat array of statistics for a set of (nodes, features, bins)
+ * and helps with indexing.
+ *
+ * This instance of [[DTStatsAggregator]] is used when not subsampling features.
+ *
+ * @param numNodes Number of nodes to collect statistics for.
+ */
+private[tree] class DTStatsAggregatorFixedFeatures(
+    metadata: DecisionTreeMetadata,
+    numNodes: Int) extends DTStatsAggregator(metadata) {
+
+  /**
+   * Offset for each feature for calculating indices into the [[_allStats]] array.
+   * Mapping: featureIndex --> offset
+   */
+  private val featureOffsets: Array[Int] = {
+    metadata.numBins.scanLeft(0)((total, nBins) => total + statsSize * nBins)
+  }
+
+  /**
+   * Number of elements for each node, corresponding to stride between nodes in [[_allStats]].
+   */
+  private val nodeStride: Int = featureOffsets.last
+
+  /**
+   * Total number of elements stored in this aggregator.
+   */
+  def allStatsSize: Int = numNodes * nodeStride
+
+  /**
+   * Flat array of elements.
+   * Index for start of stats for a (node, feature, bin) is:
+   *   index = nodeIndex * nodeStride + featureOffsets(featureIndex) + binIndex * statsSize
+   * Note: For unordered features, the left child stats precede the right child stats
+   *       in the binIndex order.
+   */
+  protected val _allStats: Array[Double] = new Array[Double](allStatsSize)
+
+  /**
+   * Get flat array of elements stored in this aggregator.
+   */
+  protected def allStats: Array[Double] = _allStats
+
+  /**
+   * Update the stats for a given (node, feature, bin) for ordered features, using the given label.
+   */
+  def update(
+      nodeIndex: Int,
+      featureIndex: Int,
+      binIndex: Int,
+      label: Double,
+      instanceWeight: Double): Unit = {
+    val i = nodeIndex * nodeStride + featureOffsets(featureIndex) + binIndex * statsSize
+    impurityAggregator.update(_allStats, i, label, instanceWeight)
+  }
+
+  /**
+   * Pre-compute node offset for use with [[nodeUpdate]].
+   */
+  def getNodeOffset(nodeIndex: Int): Int = nodeIndex * nodeStride
+
+  /**
+   * Faster version of [[update]].
+   * Update the stats for a given (node, feature, bin) for ordered features, using the given label.
+   * @param nodeOffset Pre-computed node offset from [[getNodeOffset]].
+   */
+  def nodeUpdate(
+      nodeOffset: Int,
+      nodeIndex: Int,
+      featureIndex: Int,
+      binIndex: Int,
+      label: Double,
+      instanceWeight: Double): Unit = {
+    val i = nodeOffset + featureOffsets(featureIndex) + binIndex * statsSize
+    impurityAggregator.update(_allStats, i, label, instanceWeight)
+  }
+
+  /**
+   * Pre-compute (node, feature) offset for use with [[nodeFeatureUpdate]].
+   * For ordered features only.
+   */
+  def getNodeFeatureOffset(nodeIndex: Int, featureIndex: Int): Int = {
+    require(!isUnordered(featureIndex),
+      s"DTStatsAggregator.getNodeFeatureOffset is for ordered features only, but was called" +
+      s" for unordered feature $featureIndex.")
+    nodeIndex * nodeStride + featureOffsets(featureIndex)
+  }
+
+  /**
+   * Pre-compute (node, feature) offset for use with [[nodeFeatureUpdate]].
+   * For unordered features only.
+   */
+  def getLeftRightNodeFeatureOffsets(nodeIndex: Int, featureIndex: Int): (Int, Int) = {
+    require(isUnordered(featureIndex),
--- End diff --

Will do.
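The flat-array indexing scheme quoted in this diff (`index = nodeIndex * nodeStride + featureOffsets(featureIndex) + binIndex * statsSize`) can be checked with a tiny standalone model. The bin counts and `statsSize` below are hypothetical; the offset computation mirrors the `scanLeft` in the diff but this is not the actual `DTStatsAggregator`:

```scala
object FlatStatsIndex {
  val statsSize = 2              // e.g. (count, sum) stored per bin
  val numBins   = Array(3, 2, 4) // hypothetical bins per feature

  // featureOffsets(f) = start of feature f's stats within a node's block,
  // built exactly like the scanLeft in the quoted diff; the extra last
  // element is the total size of one node's block.
  val featureOffsets: Array[Int] =
    numBins.scanLeft(0)((total, nBins) => total + statsSize * nBins)
  val nodeStride: Int = featureOffsets.last

  def index(nodeIndex: Int, featureIndex: Int, binIndex: Int): Int =
    nodeIndex * nodeStride + featureOffsets(featureIndex) + binIndex * statsSize

  def main(args: Array[String]): Unit = {
    // feature 0 starts at 0, feature 1 at 3*2 = 6, feature 2 at 6+2*2 = 10;
    // one node's block occupies 10+4*2 = 18 doubles.
    assert(featureOffsets.toSeq == Seq(0, 6, 10, 18))
    assert(index(0, 0, 0) == 0)
    assert(index(1, 2, 3) == 1 * 18 + 10 + 3 * 2) // = 34
  }
}
```

Flattening (node, feature, bin) into one array this way keeps all statistics contiguous, so the whole aggregator can be merged across partitions with a single element-wise array sum.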
[GitHub] spark pull request: [SPARK-1545] [mllib] Add Random Forests
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2435#discussion_r17957595

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/impl/DTStatsAggregator.scala ---
(quoting the same DTStatsAggregatorFixedFeatures hunk as above, down to:)
+  def update(
+      nodeIndex: Int,
+      featureIndex: Int,
+      binIndex: Int,
+      label: Double,
+      instanceWeight: Double): Unit = {
+    val i = nodeIndex * nodeStride + featureOffsets(featureIndex) + binIndex * statsSize
+    impurityAggregator.update(_allStats, i, label, instanceWeight)
--- End diff --

I'll see about improving this, though I may go with just using allStats everywhere.
[GitHub] spark pull request: [SPARK-1545] [mllib] Add Random Forests
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2435#discussion_r17957666 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/impl/DTStatsAggregator.scala --- @@ -189,6 +160,230 @@ private[tree] class DTStatsAggregator( } this } +} + +/** + * DecisionTree statistics aggregator. + * This holds a flat array of statistics for a set of (nodes, features, bins) + * and helps with indexing. + * + * This instance of [[DTStatsAggregator]] is used when not subsampling features. + * + * @param numNodes Number of nodes to collect statistics for. + */ +private[tree] class DTStatsAggregatorFixedFeatures( +metadata: DecisionTreeMetadata, +numNodes: Int) extends DTStatsAggregator(metadata) { + + /** + * Offset for each feature for calculating indices into the [[_allStats]] array. + * Mapping: featureIndex --> offset + */ + private val featureOffsets: Array[Int] = { +metadata.numBins.scanLeft(0)((total, nBins) => total + statsSize * nBins) + } + + /** + * Number of elements for each node, corresponding to stride between nodes in [[_allStats]]. + */ + private val nodeStride: Int = featureOffsets.last + + /** + * Total number of elements stored in this aggregator. + */ + def allStatsSize: Int = numNodes * nodeStride + + /** + * Flat array of elements. + * Index for start of stats for a (node, feature, bin) is: + * index = nodeIndex * nodeStride + featureOffsets(featureIndex) + binIndex * statsSize + * Note: For unordered features, the left child stats precede the right child stats + * in the binIndex order. + */ + protected val _allStats: Array[Double] = new Array[Double](allStatsSize) + + /** + * Get flat array of elements stored in this aggregator. + */ + protected def allStats: Array[Double] = _allStats + + /** + * Update the stats for a given (node, feature, bin) for ordered features, using the given label. 
+ */ + def update( + nodeIndex: Int, + featureIndex: Int, + binIndex: Int, + label: Double, + instanceWeight: Double): Unit = { +val i = nodeIndex * nodeStride + featureOffsets(featureIndex) + binIndex * statsSize +impurityAggregator.update(_allStats, i, label, instanceWeight) + } + + /** + * Pre-compute node offset for use with [[nodeUpdate]]. + */ + def getNodeOffset(nodeIndex: Int): Int = nodeIndex * nodeStride + + /** + * Faster version of [[update]]. + * Update the stats for a given (node, feature, bin) for ordered features, using the given label. --- End diff -- I was curious too. I just ran some experiments on EC2. With 1 worker, there is basically no difference. With 16 workers, there is a difference when there are lots of ordered features (where this function nodeUpdate is used): eliminating nodeUpdate and using update makes things run about 5% slower. I will keep it for now.
[GitHub] spark pull request: [SPARK-3676][Sql]spark sql hive test suite fai...
GitHub user scwf opened a pull request: https://github.com/apache/spark/pull/2517 [SPARK-3676][Sql]spark sql hive test suite failed in JDK 1.6 https://issues.apache.org/jira/browse/SPARK-3676 The Spark SQL Hive tests fail on JDK 1.6; you can reproduce this by setting the JDK version to 1.6.0_31. [info] - division *** FAILED *** [info] Results do not match for division: [info] SELECT 2 / 1, 1 / 2, 1 / 3, 1 / COUNT FROM src LIMIT 1 [info] == Parsed Logical Plan == [info] Limit 1 [info] Project (2 / 1) AS c_0#692,(1 / 2) AS c_1#693,(1 / 3) AS c_2#694,(1 / COUNT(1)) AS c_3#695 [info] UnresolvedRelation None, src, None [info] [info] == Analyzed Logical Plan == [info] Limit 1 [info] Aggregate [], [(CAST(2, DoubleType) / CAST(1, DoubleType)) AS c_0#692,(CAST(1, DoubleType) / CAST(2, DoubleType)) AS c_1#693,(CAST(1, DoubleType) / CAST(3, DoubleType)) AS c_2#694,(CAST(CAST(1, LongType), DoubleType) / CAST(COUNT(1), DoubleType)) AS c_3#695] [info] MetastoreRelation default, src, None [info] [info] == Optimized Logical Plan == [info] Limit 1 [info] Aggregate [], 2.0 AS c_0#692,0.5 AS c_1#693,0. AS c_2#694,(1.0 / CAST(COUNT(1), DoubleType)) AS c_3#695 [info] Project [] [info] MetastoreRelation default, src, None [info] [info] == Physical Plan == [info] Limit 1 [info] Aggregate false, [], 2.0 AS c_0#692,0.5 AS c_1#693,0. AS c_2#694,(1.0 / CAST(SUM(PartialCount#699L), DoubleType)) AS c_3#695 [info] Exchange SinglePartition [info] Aggregate true, [], COUNT(1) AS PartialCount#699L [info] HiveTableScan [], (MetastoreRelation default, src, None), None [info] [info] Code Generation: false [info] == RDD == [info] c_0 c_1 c_2 c_3 [info] !== HIVE - 1 row(s) == == CATALYST - 1 row(s) == [info] !2.0 0.5 0. 0.002 2.0 0.5 0. 
0.0020 (HiveComparisonTest.scala:370) [info] - timestamp cast #1 *** FAILED *** [info] Results do not match for timestamp cast #1: [info] SELECT CAST(CAST(1 AS TIMESTAMP) AS DOUBLE) FROM src LIMIT 1 [info] == Parsed Logical Plan == [info] Limit 1 [info] Project CAST(CAST(1, TimestampType), DoubleType) AS c_0#995 [info] UnresolvedRelation None, src, None [info] [info] == Analyzed Logical Plan == [info] Limit 1 [info] Project CAST(CAST(1, TimestampType), DoubleType) AS c_0#995 [info] MetastoreRelation default, src, None [info] [info] == Optimized Logical Plan == [info] Limit 1 [info] Project 0.0010 AS c_0#995 [info] MetastoreRelation default, src, None [info] [info] == Physical Plan == [info] Limit 1 [info] Project 0.0010 AS c_0#995 [info] HiveTableScan [], (MetastoreRelation default, src, None), None [info] [info] Code Generation: false [info] == RDD == [info] c_0 [info] !== HIVE - 1 row(s) == == CATALYST - 1 row(s) == [info] !0.001 0.0010 (HiveComparisonTest.scala:370) This happens because different JDK versions render ```double``` values differently: ```System.out.println(1/500d)``` prints 0.0020 on JDK 1.6.0_31 but 0.002 on JDK 1.7.0_05. As a result, HiveQuerySuite fails when the golden answers are generated on JDK 1.7 and the tests are run on JDK 1.6, because the results do not match. You can merge this pull request into a Git repository by running: $ git pull https://github.com/scwf/spark HiveQuerySuite Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2517.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2517 commit 1df3964f1ff99aa93ed5f556675fe0d6d0285401 Author: w00228970 wangf...@huawei.com Date: 2014-09-24T06:44:54Z Jdk version leads to different query output for Double, this make HiveQuerySuite failed commit 0cb5e8d6c45f6587497ec854353b96b2d6f536e8 Author: w00228970 wangf...@huawei.com Date: 2014-09-24T06:53:05Z delete golden 
answer of division-0 and timestamp cast #1
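The JDK behavior this PR works around is easy to reproduce standalone. A minimal Java sketch (class name is illustrative); on JDK 7 and later both lines print 0.002, while JDK 1.6 (the bug JDK-4428022 mentioned below) printed 0.0020:

```java
// Minimal reproduction of the Double-rendering difference behind SPARK-3676.
// On JDK 1.6, Double.toString could emit a trailing zero ("0.0020"); JDK 1.7+
// emits the shortest decimal that round-trips ("0.002"). String-compared golden
// answers generated on one JDK therefore fail on the other.
public class DoubleToStringRepro {
    public static void main(String[] args) {
        System.out.println(1 / 500d);              // "0.002" on JDK 7 and later
        System.out.println(Double.toString(0.002)); // same value, same rendering
    }
}
```

Because the flaky part is only the decimal rendering, deleting the affected golden answers (as this PR does) sidesteps the JDK dependence without changing query semantics.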
[GitHub] spark pull request: [SPARK-3676][Sql]spark sql hive test suite fai...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2517#issuecomment-56636736 Can one of the admins verify this patch?
[GitHub] spark pull request: SPARK-3663 Document SPARK_LOG_DIR and SPARK_PI...
GitHub user ash211 opened a pull request: https://github.com/apache/spark/pull/2518 SPARK-3663 Document SPARK_LOG_DIR and SPARK_PID_DIR These descriptions are from the header of spark-daemon.sh You can merge this pull request into a Git repository by running: $ git pull https://github.com/ash211/spark SPARK-3663 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2518.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2518 commit af89096fd93a6c85ce5828268ba546fc691f3e3b Author: Andrew Ash and...@andrewash.com Date: 2014-09-24T08:07:21Z SPARK-3663 Document SPARK_LOG_DIR and SPARK_PID_DIR These descriptions are from the header of spark-daemon.sh
[GitHub] spark pull request: SPARK-3663 Document SPARK_LOG_DIR and SPARK_PI...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2518#issuecomment-56638326 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20745/consoleFull) for PR 2518 at commit [`af89096`](https://github.com/apache/spark/commit/af89096fd93a6c85ce5828268ba546fc691f3e3b). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3676][Sql]spark sql hive test suite fai...
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/2517#issuecomment-56639208 Actually, this is a bug in JDK 6: http://bugs.java.com/bugdatabase/view_bug.do?bug_id=4428022
[GitHub] spark pull request: SPARK-3642. Document the nuances of shared var...
Github user Ishiihara commented on a diff in the pull request: https://github.com/apache/spark/pull/2490#discussion_r17959346 --- Diff: docs/programming-guide.md --- @@ -1121,6 +1121,11 @@ than shipping a copy of it with tasks. They can be used, for example, to give every node a copy of a large input dataset in an efficient manner. Spark also attempts to distribute broadcast variables using efficient broadcast algorithms to reduce communication cost. +Spark automatically broadcasts the common data needed by tasks within each stage. The data +broadcasted this way is cached in serialized form and deserialized before running each task. This +means that explicitly creating broadcast variables is only useful when tasks across multiple stages --- End diff -- The concept of stage is mentioned only in the two added paragraphs. Users new to Spark may not know the internals and the execution mechanism. It would be nice if some background were introduced here.
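The distinction the added paragraph draws can be illustrated without Spark. A plain-Java analogy (not the Spark API; all names and counts are illustrative): data captured in a task closure travels once per task, while a broadcast-style handle travels once per executor and is reused by all of its tasks.

```java
// Illustrative counting analogy for closure capture vs. explicit broadcast.
// Not the Spark API; numbers are made up to show the shipping-cost difference.
public class BroadcastAnalogy {
    public static void main(String[] args) {
        int numExecutors = 4;
        int tasksPerExecutor = 25;

        // Closure capture: the lookup table is serialized with every task.
        int closureShipments = numExecutors * tasksPerExecutor;

        // Broadcast: one serialized copy per executor, cached and shared.
        int broadcastShipments = numExecutors;

        System.out.println(closureShipments);   // 100 copies shipped
        System.out.println(broadcastShipments); // 4 copies shipped
    }
}
```

Within a single stage Spark already does the once-per-stage broadcast automatically, which is why the doc change says explicit broadcast variables pay off mainly when the same data is reused across multiple stages.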
[GitHub] spark pull request: SPARK-3642. Document the nuances of shared var...
Github user Ishiihara commented on a diff in the pull request: https://github.com/apache/spark/pull/2490#discussion_r17959656 --- Diff: docs/programming-guide.md --- @@ -1183,6 +1188,10 @@ running on the cluster can then add to it using the `add` method or the `+=` operator. However, they cannot read its value. Only the driver program can read the accumulator's value, using its `value` method. +The same task may run multiple times, either when its output data becomes lost or when multiple --- End diff -- The same task can mean the same task ID or the same computation in a stage. Two tasks that perform the same computation may have different task IDs. It would be nice if some background were introduced here, e.g. the relationship between stages and task sets.
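The re-execution hazard the doc change describes can be sketched without Spark. A plain-Java illustration (not the Spark API; names are mine): if the same logical task is executed twice, say by speculative execution or recomputation after data loss, a side-effecting accumulator update is applied once per execution, not once per logical task.

```java
import java.util.concurrent.atomic.AtomicLong;

// Illustrative sketch (not Spark) of why re-executed tasks can double-count
// accumulator updates performed inside transformations.
public class AccumulatorRetrySketch {
    public static void main(String[] args) {
        AtomicLong accumulator = new AtomicLong();
        Runnable task = () -> accumulator.addAndGet(10); // "count 10 records"

        task.run(); // original attempt
        task.run(); // speculative / recomputed attempt of the SAME logical task

        System.out.println(accumulator.get()); // 20, not the 10 a user might expect
    }
}
```

This is why the guide steers users toward treating accumulator values in transformations as advisory, relying on them only in actions where Spark applies each task's update once.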
[GitHub] spark pull request: SPARK-3526 Add section about data locality to ...
GitHub user ash211 opened a pull request: https://github.com/apache/spark/pull/2519 SPARK-3526 Add section about data locality to the tuning guide cc @kayousterhout I have a few outstanding questions from compiling this documentation: - What's the difference between NO_PREF and ANY? I understand the implications of the ordering but don't know what an example of each would be - Why is NO_PREF ahead of RACK_LOCAL? I would think it'd be better to schedule rack-local tasks ahead of no preference if you could only do one or the other. Is the idea to wait longer and hope for the rack-local tasks to turn into node-local or better? - Will there be a datacenter-local locality level in the future? Apache Cassandra for example has this level You can merge this pull request into a Git repository by running: $ git pull https://github.com/ash211/spark SPARK-3526 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2519.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2519 commit 20e0e31158fe0350b8f59617f2228a48c34274ef Author: Andrew Ash and...@andrewash.com Date: 2014-09-24T08:50:07Z SPARK-3526 Add section about data locality to the tuning guide
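For reference alongside the questions above, a hedged configuration sketch: the locality levels in scheduling order, and the Spark settings that control how long the scheduler waits at one level before falling back to the next. The values shown are illustrative, not recommendations.

```properties
# Locality levels, best to worst: PROCESS_LOCAL > NODE_LOCAL > NO_PREF > RACK_LOCAL > ANY
# Wait before falling back to a less-local level (illustrative values):
spark.locality.wait          3s   # default wait applied at each level
spark.locality.wait.process  3s   # override for process-local fallback
spark.locality.wait.node     3s   # override for node-local fallback
spark.locality.wait.rack     3s   # override for rack-local fallback
```

Raising these waits trades scheduling latency for better locality; setting them to 0 schedules tasks immediately wherever resources are free.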
[GitHub] spark pull request: SPARK-3526 Add section about data locality to ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2519#issuecomment-56642802 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20746/consoleFull) for PR 2519 at commit [`20e0e31`](https://github.com/apache/spark/commit/20e0e31158fe0350b8f59617f2228a48c34274ef). * This patch merges cleanly.
[GitHub] spark pull request: SPARK-3663 Document SPARK_LOG_DIR and SPARK_PI...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2518#issuecomment-56645105 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20745/consoleFull) for PR 2518 at commit [`af89096`](https://github.com/apache/spark/commit/af89096fd93a6c85ce5828268ba546fc691f3e3b). * This patch **passes** unit tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: SPARK-3663 Document SPARK_LOG_DIR and SPARK_PI...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2518#issuecomment-56645112 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20745/
[GitHub] spark pull request: [WIP][SPARK-1405][MLLIB] topic modeling on Gra...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2388#issuecomment-56647000 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20747/consoleFull) for PR 2388 at commit [`7bc691a`](https://github.com/apache/spark/commit/7bc691ab142edba8a127937dfbd836d5738f6527). * This patch merges cleanly.
[GitHub] spark pull request: SPARK-3526 Add section about data locality to ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2519#issuecomment-56649713 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20746/
[GitHub] spark pull request: SPARK-3526 Add section about data locality to ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2519#issuecomment-56649704 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20746/consoleFull) for PR 2519 at commit [`20e0e31`](https://github.com/apache/spark/commit/20e0e31158fe0350b8f59617f2228a48c34274ef). * This patch **passes** unit tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3677] [BUILD] [YARN] Scalastyle is neve...
GitHub user sarutak opened a pull request: https://github.com/apache/spark/pull/2520 [SPARK-3677] [BUILD] [YARN] Scalastyle is never applied to the sources under yarn/common You can merge this pull request into a Git repository by running: $ git pull https://github.com/sarutak/spark yarn-scalastyle-modification Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2520.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2520 commit f7f4755252077dd3b79c928d95ac67ee51bbe9e8 Author: Kousuke Saruta saru...@oss.nttdata.co.jp Date: 2014-09-24T10:15:18Z Modified SparkBuild.scala so that scalastyle is applied to the sources under yarn/common Modified style for some sources under yarn/common
[GitHub] spark pull request: [SPARK-3304] [YARN] ApplicationMaster's Finish...
Github user sarutak commented on the pull request: https://github.com/apache/spark/pull/2198#issuecomment-56650781 @tgravescs Thanks for the notification. I found the issue that causes Scalastyle not to be applied to yarn/common. I resolved it in #2520 .
[GitHub] spark pull request: [SPARK-3677] [BUILD] [YARN] Scalastyle is neve...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2520#issuecomment-56650830 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20748/consoleFull) for PR 2520 at commit [`f7f4755`](https://github.com/apache/spark/commit/f7f4755252077dd3b79c928d95ac67ee51bbe9e8). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3356] [DOCS] Document when RDD elements...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/2508#issuecomment-56651085 @mateiz Got it. On the zip methods, I want to capture the key point from https://issues.apache.org/jira/browse/SPARK-3098 , that the ordering is not only not guaranteed but also may change on reevaluation. I hope that wording is OK to retain and merge into yours. I'll find some place in the programming guide to note this, and remove wording about persist and/or replace with suggestion to sort the RDD.
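The SPARK-3098 point above, that zip results are only meaningful under a deterministic ordering, can be illustrated outside Spark. A plain-Java sketch (names are illustrative, and plain lists stand in for RDD partitions): pairing two sequences element by element gives mismatched pairs when their orders differ, and sorting both first restores a stable correspondence, which is the "sort the RDD" suggestion.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Illustrative sketch of why zip needs a deterministic ordering.
public class ZipOrderingSketch {
    // Pair up elements positionally, like RDD.zip (assumes equal sizes).
    static List<String> zip(List<Integer> a, List<Integer> b) {
        List<String> pairs = new ArrayList<>();
        for (int i = 0; i < a.size(); i++) pairs.add(a.get(i) + ":" + b.get(i));
        return pairs;
    }

    public static void main(String[] args) {
        // Same elements, different orders: stands in for an RDD whose
        // ordering changed between evaluations.
        List<Integer> first = new ArrayList<>(Arrays.asList(3, 1, 2));
        List<Integer> second = new ArrayList<>(Arrays.asList(1, 2, 3));

        System.out.println(zip(first, second)); // mismatched: [3:1, 1:2, 2:3]

        Collections.sort(first);
        Collections.sort(second);
        System.out.println(zip(first, second)); // stable: [1:1, 2:2, 3:3]
    }
}
```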
[GitHub] spark pull request: [SPARK-3677] [BUILD] [YARN] Scalastyle is neve...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2520#issuecomment-56651140 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20748/
[GitHub] spark pull request: [SPARK-3677] [BUILD] [YARN] Scalastyle is neve...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2520#issuecomment-56651137 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20748/consoleFull) for PR 2520 at commit [`f7f4755`](https://github.com/apache/spark/commit/f7f4755252077dd3b79c928d95ac67ee51bbe9e8). * This patch **fails** unit tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3677] [BUILD] [YARN] Scalastyle is neve...
Github user sarutak commented on the pull request: https://github.com/apache/spark/pull/2520#issuecomment-56651663 retest this please.
[GitHub] spark pull request: [SPARK-3677] [BUILD] [YARN] Scalastyle is neve...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2520#issuecomment-56652132 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20749/consoleFull) for PR 2520 at commit [`f7f4755`](https://github.com/apache/spark/commit/f7f4755252077dd3b79c928d95ac67ee51bbe9e8). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3677] [BUILD] [YARN] Scalastyle is neve...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2520#issuecomment-56652447 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20749/
[GitHub] spark pull request: [SPARK-3677] [BUILD] [YARN] Scalastyle is neve...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2520#issuecomment-56652443 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20749/consoleFull) for PR 2520 at commit [`f7f4755`](https://github.com/apache/spark/commit/f7f4755252077dd3b79c928d95ac67ee51bbe9e8). * This patch **fails** unit tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3304] [YARN] ApplicationMaster's Finish...
Github user sarutak commented on the pull request: https://github.com/apache/spark/pull/2198#issuecomment-56652661 @tgravescs Sorry, something is wrong on my side. Please wait a little.
[GitHub] spark pull request: [WIP][SPARK-1405][MLLIB] topic modeling on Gra...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2388#issuecomment-56653261 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20747/consoleFull) for PR 2388 at commit [`7bc691a`](https://github.com/apache/spark/commit/7bc691ab142edba8a127937dfbd836d5738f6527). * This patch **passes** unit tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class TopicModeling(@transient val docs: RDD[(TopicModeling.DocId, SSV)],` * `class TopicModelingKryoRegistrator extends KryoRegistrator `
[GitHub] spark pull request: [WIP][SPARK-1405][MLLIB] topic modeling on Gra...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2388#issuecomment-56653270 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20747/
[GitHub] spark pull request: [SPARK-3356] [DOCS] Document when RDD elements...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2508#issuecomment-56661626 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20750/consoleFull) for PR 2508 at commit [`ad4aeec`](https://github.com/apache/spark/commit/ad4aeec504ad07269511a2aad843a5b815dfcf5d). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3677] [BUILD] [YARN] Scalastyle is neve...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2520#issuecomment-5764 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20751/consoleFull) for PR 2520 at commit [`c3e5e6d`](https://github.com/apache/spark/commit/c3e5e6d47f37fc5b40db1050ff100e11cf48bd52). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3356] [DOCS] Document when RDD elements...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2508#issuecomment-56669661 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20750/
[GitHub] spark pull request: [SPARK-3356] [DOCS] Document when RDD elements...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2508#issuecomment-56669653 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20750/consoleFull) for PR 2508 at commit [`ad4aeec`](https://github.com/apache/spark/commit/ad4aeec504ad07269511a2aad843a5b815dfcf5d). * This patch **passes** unit tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-2788] [STREAMING] Add location filterin...
Github user sjbrunst commented on a diff in the pull request: https://github.com/apache/spark/pull/1717#discussion_r17972016 --- Diff: external/twitter/src/main/scala/org/apache/spark/streaming/twitter/TwitterUtils.scala --- @@ -33,15 +33,38 @@ object TwitterUtils { *twitter4j.oauth.consumerSecret, twitter4j.oauth.accessToken and *twitter4j.oauth.accessTokenSecret * @param filters Set of filter strings to get only those tweets that match them + * @param locations Bounding boxes to get only geotagged tweets within them. Example: +Seq(BoundingBox(-180.0,-90.0,180.0,90.0)) gives any geotagged tweet. If locations and +filters are both nonempty, then any tweet matching either condition may be returned. * @param storageLevel Storage level to use for storing the received objects */ def createStream( ssc: StreamingContext, twitterAuth: Option[Authorization], filters: Seq[String] = Nil, + locations: Seq[BoundingBox] = Nil, --- End diff -- It looks like I'm changing the method here, but this whole method is new. The original one is below.
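The `locations` filter described in the diff above treats each `BoundingBox` as a south-west/north-east corner pair, with `Seq(BoundingBox(-180.0,-90.0,180.0,90.0))` matching every geotagged tweet. A minimal sketch of that containment semantics in plain Python (the field names and `matches` helper here are illustrative, not the PR's actual API):

```python
from dataclasses import dataclass

@dataclass
class BoundingBox:
    # South-west corner first, then north-east corner (longitude, latitude),
    # mirroring the order Twitter's streaming "locations" parameter uses.
    sw_lon: float
    sw_lat: float
    ne_lon: float
    ne_lat: float

    def contains(self, lon: float, lat: float) -> bool:
        return (self.sw_lon <= lon <= self.ne_lon
                and self.sw_lat <= lat <= self.ne_lat)

def matches(tweet_coords, boxes):
    """A geotagged tweet matches if any configured box contains it;
    tweets without coordinates never match a location filter."""
    if tweet_coords is None:
        return False
    lon, lat = tweet_coords
    return any(b.contains(lon, lat) for b in boxes)

# The whole-world box from the doc comment: matches any geotagged tweet.
whole_world = BoundingBox(-180.0, -90.0, 180.0, 90.0)
print(matches((-79.4, 43.7), [whole_world]))
```

Note this models only the location predicate; in the real stream, `filters` and `locations` are OR-ed by Twitter, as the doc comment warns.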
[GitHub] spark pull request: [SPARK-2788] [STREAMING] Add location filterin...
Github user sjbrunst commented on the pull request: https://github.com/apache/spark/pull/1717#issuecomment-56673018 @tdas The current version of TwitterUtils.scala only has new methods. The diff makes it look like I changed the original methods, but they are all there. The original unit tests from the StreamSuites pass, so I don't know why we're still getting the binary compatibility error.
[GitHub] spark pull request: [SPARK-3677] [BUILD] [YARN] Scalastyle is neve...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2520#issuecomment-56675917 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20751/
[GitHub] spark pull request: [SPARK-3677] [BUILD] [YARN] Scalastyle is neve...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2520#issuecomment-56675904 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20751/consoleFull) for PR 2520 at commit [`c3e5e6d`](https://github.com/apache/spark/commit/c3e5e6d47f37fc5b40db1050ff100e11cf48bd52). * This patch **passes** unit tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3454] Expose JSON representation of dat...
Github user sarutak commented on the pull request: https://github.com/apache/spark/pull/2333#issuecomment-56677764 Thank you for your work @JoshRosen ! I'll check it out.
[GitHub] spark pull request: [SPARK-3377] [Metrics] Metrics can be accident...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2432#issuecomment-56678722 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20752/consoleFull) for PR 2432 at commit [`086ee25`](https://github.com/apache/spark/commit/086ee252424f1862998957327ef3c70ff1a5650b). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3389] Add Converter for ease of Parquet...
Github user MLnick commented on the pull request: https://github.com/apache/spark/pull/2256#issuecomment-56679585 Hey - I'm traveling at the moment without laptop access, so will be able to check it out tomorrow evening - hope that's ok :) Sent from Mailbox On Wed, Sep 24, 2014 at 4:50 AM, Matei Zaharia notificati...@github.com wrote: It looks like we can merge it without a rebase. I'll wait to see whether Nick has any comments because he built this feature. --- Reply to this email directly or view it on GitHub: https://github.com/apache/spark/pull/2256#issuecomment-56618758
[GitHub] spark pull request: [SPARK-3377] [Metrics] Metrics can be accident...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2432#issuecomment-56690206 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20752/consoleFull) for PR 2432 at commit [`086ee25`](https://github.com/apache/spark/commit/086ee252424f1862998957327ef3c70ff1a5650b). * This patch **passes** unit tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3377] [Metrics] Metrics can be accident...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2432#issuecomment-56690217 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20752/
[GitHub] spark pull request: [SPARK-3645][SQL] Makes table caching eager by...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2513#issuecomment-56690446 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20753/consoleFull) for PR 2513 at commit [`8d2192d`](https://github.com/apache/spark/commit/8d2192daa3bd2df2c686aa94e46a95dfb0540f08). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-2778] [yarn] Add yarn integration tests...
Github user vanzin commented on the pull request: https://github.com/apache/spark/pull/2257#issuecomment-56694970 I'll merge with master and see if I can reproduce the failure...
[GitHub] spark pull request: [SPARK-2778] [yarn] Add yarn integration tests...
Github user vanzin commented on the pull request: https://github.com/apache/spark/pull/2257#issuecomment-56696661 Yep, fails locally too after the merge. Let me look.
[GitHub] spark pull request: Python SQL Example Code
GitHub user jyotiska opened a pull request: https://github.com/apache/spark/pull/2521 Python SQL Example Code SQL example code for Python, as shown on [SQL Programming Guide](https://spark.apache.org/docs/1.0.2/sql-programming-guide.html) You can merge this pull request into a Git repository by running: $ git pull https://github.com/jyotiska/spark sql_example Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2521.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2521 commit 8f67b5b9152bbc4ab22de48198fdc2aa6f2fb6ab Author: jyotiska jyotiska...@gmail.com Date: 2014-09-24T16:25:54Z added python sql example commit 0b4614800a852bba709815a393dda0370049901e Author: jyotiska jyotiska...@gmail.com Date: 2014-09-24T16:27:56Z fixed appname for python sql example
[GitHub] spark pull request: Python SQL Example Code
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2521#issuecomment-56698292 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20754/consoleFull) for PR 2521 at commit [`0b46148`](https://github.com/apache/spark/commit/0b4614800a852bba709815a393dda0370049901e). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3677] [BUILD] [YARN] Scalastyle is neve...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2520#issuecomment-56698293 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20755/consoleFull) for PR 2520 at commit [`3858089`](https://github.com/apache/spark/commit/3858089fd149ed92a9c27a2308c77f96f1c9a964). * This patch merges cleanly.
[GitHub] spark pull request: Python SQL Example Code
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2521#issuecomment-56698439 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20754/
[GitHub] spark pull request: Python SQL Example Code
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2521#issuecomment-56698438 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20754/consoleFull) for PR 2521 at commit [`0b46148`](https://github.com/apache/spark/commit/0b4614800a852bba709815a393dda0370049901e). * This patch **fails** unit tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3614][MLLIB] Add minimumOccurence filte...
Github user rnowling commented on the pull request: https://github.com/apache/spark/pull/2494#issuecomment-56698570 @mengxr doesn't look like the tests started -- maybe Jenkins ignores comments that address users? Thanks!
[GitHub] spark pull request: [SPARK-3645][SQL] Makes table caching eager by...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2513#issuecomment-56698729 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20753/consoleFull) for PR 2513 at commit [`8d2192d`](https://github.com/apache/spark/commit/8d2192daa3bd2df2c686aa94e46a95dfb0540f08). * This patch **passes** unit tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class CacheTableCommand(tableName: String, plan: Option[LogicalPlan], isLazy: Boolean)` * `case class UncacheTableCommand(tableName: String) extends Command` * `case class CacheTableCommand(tableName: String, logicalPlan: Option[LogicalPlan], isLazy: Boolean)` * `case class UncacheCommand(tableName: String) extends LeafNode with Command ` * `case class DescribeCommand(child: SparkPlan, output: Seq[Attribute])(`
[GitHub] spark pull request: [SPARK-3645][SQL] Makes table caching eager by...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2513#issuecomment-56698738 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20753/
[GitHub] spark pull request: Python SQL Example Code
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2521#issuecomment-56700932 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20756/consoleFull) for PR 2521 at commit [`c90502a`](https://github.com/apache/spark/commit/c90502a62c1114cee15194c1190733e75889d0d1). * This patch **fails** unit tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: Python SQL Example Code
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2521#issuecomment-56700939 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20756/
[GitHub] spark pull request: Python SQL Example Code
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2521#issuecomment-56700691 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20756/consoleFull) for PR 2521 at commit [`c90502a`](https://github.com/apache/spark/commit/c90502a62c1114cee15194c1190733e75889d0d1). * This patch merges cleanly.
[GitHub] spark pull request: Python SQL Example Code
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2521#issuecomment-56703017 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20757/consoleFull) for PR 2521 at commit [`306667e`](https://github.com/apache/spark/commit/306667e1fb905c38c8753520467b95dd27406f70). * This patch merges cleanly.
[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...
Github user tgravescs commented on the pull request: https://github.com/apache/spark/pull/2485#issuecomment-56706929 @pwendell @mateiz @andrewor14 can any of you kick jenkins?
[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/2485#issuecomment-56707385 I just kicked it from the `spark-prs` parameterized build trigger; let's wait and see if it starts...
[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2485#issuecomment-56707584 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/146/consoleFull) for PR 2485 at commit [`f00fa31`](https://github.com/apache/spark/commit/f00fa311945c1eafa8957eae5c84719521761dcd). * This patch **does not** merge cleanly!
[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...
Github user tgravescs commented on the pull request: https://github.com/apache/spark/pull/2485#issuecomment-56707989 ah sorry, looks like something conflicts now and it needs to be upmerged. @nishkamravi2 can you please upmerge?
[GitHub] spark pull request: Python SQL Example Code
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/2521#discussion_r17987196 --- Diff: examples/src/main/python/sql.py --- @@ -0,0 +1,52 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +import sys + +from pyspark import SparkContext +from pyspark.sql import SQLContext + + +if __name__ == "__main__": +if len(sys.argv) != 2: +print >> sys.stderr, "Usage: sql <file>" +exit(-1) +sc = SparkContext(appName="PythonSQL") +sqlContext = SQLContext(sc) + +# A JSON dataset is pointed to by path. +# The path can be either a single text file or a directory storing text files. +path = "examples/src/main/resources/people.json" --- End diff -- This assumes the script will be run from SPARK_HOME; it will break if the user runs it from SPARK_HOME/examples/src/main/python
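One way to address the working-directory problem raised in this review comment is to resolve the sample file against a known base directory instead of the current working directory. A hedged sketch (the `resolve_example_path` helper and the fallback choice are illustrative, not what the PR ended up doing):

```python
import os

def resolve_example_path(relative):
    # Prefer SPARK_HOME when set, so the script works no matter where it is
    # launched from; otherwise fall back to the current working directory,
    # which reproduces the original behavior.
    base = os.environ.get("SPARK_HOME", os.getcwd())
    return os.path.join(base, relative)

path = resolve_example_path("examples/src/main/resources/people.json")
print(path)
```

With `SPARK_HOME=/opt/spark`, the helper returns `/opt/spark/examples/src/main/resources/people.json` regardless of the caller's working directory.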
[GitHub] spark pull request: [SPARK-3677] [BUILD] [YARN] Scalastyle is neve...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2520#issuecomment-56709666 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20755/
[GitHub] spark pull request: [SPARK-3677] [BUILD] [YARN] Scalastyle is neve...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2520#issuecomment-56709659 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20755/consoleFull) for PR 2520 at commit [`3858089`](https://github.com/apache/spark/commit/3858089fd149ed92a9c27a2308c77f96f1c9a964). * This patch **passes** unit tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: Python SQL Example Code
Github user jyotiska commented on a diff in the pull request: https://github.com/apache/spark/pull/2521#discussion_r17987266 --- Diff: examples/src/main/python/sql.py ---
@@ -0,0 +1,52 @@ [...]
+    # A JSON dataset is pointed to by path.
+    # The path can be either a single text file or a directory storing text files.
+    path = "examples/src/main/resources/people.json"
--- End diff --
In that case, should the JSON file be supplied as `sys.argv[1]`?
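Both suggestions above can be combined: accept an optional path on the command line and otherwise resolve the bundled sample file relative to the script's own directory rather than the current working directory. A minimal sketch (the helper name and fallback layout are assumptions for illustration, not code from the PR):

```python
import os


def resolve_people_json(argv, script_dir):
    # Hypothetical helper: prefer a path passed on the command line,
    # otherwise locate the sample data relative to this script, so the
    # example works no matter where it is launched from.
    if len(argv) >= 2:
        return argv[1]
    # sql.py lives in examples/src/main/python, the data in
    # examples/src/main/resources: step up one level and across.
    return os.path.normpath(
        os.path.join(script_dir, "..", "resources", "people.json"))
```

In the example script it would be called as `resolve_people_json(sys.argv, os.path.dirname(os.path.abspath(__file__)))`.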
[GitHub] spark pull request: [SPARK-2778] [yarn] Add yarn integration tests...
Github user vanzin commented on the pull request: https://github.com/apache/spark/pull/2257#issuecomment-56709844 I found the problem - it was caused by a recent PR that basically broke yarn-cluster mode...
[GitHub] spark pull request: Python SQL Example Code
Github user davies commented on the pull request: https://github.com/apache/spark/pull/2521#issuecomment-56710085 This example only demonstrates jsonFile(); it would be more powerful if it also showed some usage of `inferSchema()` and `applySchema()`.
[GitHub] spark pull request: [SPARK-2778] [yarn] Add yarn integration tests...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2257#issuecomment-56710658 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20758/consoleFull) for PR 2257 at commit [`6d5b84e`](https://github.com/apache/spark/commit/6d5b84e8b5987683591d8c07b3ff8557d9581871). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-2778] [yarn] Add yarn integration tests...
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/2257#discussion_r17987981 --- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala ---
@@ -401,17 +401,17 @@ private[spark] class ApplicationMaster(args: ApplicationMasterArguments,
       // it has an uncaught exception thrown out. It needs a shutdown hook to set SUCCEEDED.
       status = FinalApplicationStatus.SUCCEEDED
     } catch {
-      case e: InvocationTargetException => {
+      case e: InvocationTargetException => e.getCause match {
-        case _: InterruptedException => {
+        case _: InterruptedException =>
           // Reporter thread can interrupt to stop user class
-        }
+
+        case e => throw e
--- End diff --
I don't think you need this, right?
[GitHub] spark pull request: [SPARK-3677] [BUILD] [YARN] Scalastyle is neve...
Github user vanzin commented on the pull request: https://github.com/apache/spark/pull/2520#issuecomment-56711425 LGTM. I don't really understand why you need to tell sbt again where the sources are (after all, sbt does build the yarn code properly), but then I'm not an sbt expert.
[GitHub] spark pull request: [SPARK-2778] [yarn] Add yarn integration tests...
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/2257#issuecomment-56711888 Ah good catch. The latest changes LGTM if you get the tests to pass.
[GitHub] spark pull request: Python SQL Example Code
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2521#issuecomment-56713322 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20757/
[GitHub] spark pull request: Python SQL Example Code
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2521#issuecomment-56713314 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20757/consoleFull) for PR 2521 at commit [`306667e`](https://github.com/apache/spark/commit/306667e1fb905c38c8753520467b95dd27406f70). * This patch **passes** unit tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3679] [PySpark] pickle the exact global...
GitHub user davies opened a pull request: https://github.com/apache/spark/pull/2522 [SPARK-3679] [PySpark] pickle the exact globals of functions function.func_code.co_names contains all the names used in the function, including the names of attributes. Some unnecessary globals will be pickled whenever a global shares its name with an attribute (in co_names). This is a regression introduced by #2114. cc @JoshRosen You can merge this pull request into a Git repository by running: $ git pull https://github.com/davies/spark globals Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2522.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2522 commit dfbccf5c92333da8ab835fc4730aadc844e9f895 Author: Davies Liu davies@gmail.com Date: 2014-09-24T18:23:10Z fix bug while pickle globals of function
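The root cause described in this PR is easy to reproduce with plain CPython: `co_names` lists attribute names alongside genuine global references, so a serializer that collects a function's globals by scanning `co_names` can drag in unrelated objects. A small self-contained illustration (the names here are invented for the demo and are not from the PR):

```python
value = "unrelated global"          # happens to share its name with an attribute


class Box(object):
    def __init__(self):
        self.value = 42


box = Box()


def f():
    return box.value                # only the global 'box' is actually needed


# co_names includes 'value' even though f only uses it as an attribute name;
# collecting globals by co_names would therefore also pickle the unrelated
# module-level 'value' string above.
names = f.__code__.co_names
```

This is why the fix inspects the bytecode more precisely rather than treating every entry of `co_names` as a global to serialize.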
[GitHub] spark pull request: [Build] Diff from branch point
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/2512#issuecomment-56717149 Looks good - thanks Nick.
[GitHub] spark pull request: [SPARK-3580] add 'partitions' property to PySp...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/2478#issuecomment-56717218 I think `len(rdd)` has the potential to be confused with `rdd.count()`, since calling `len()` on a Python collection usually returns the size of that collection. I also agree that we shouldn't expose Java `Partition` objects to users. Is there any reason to expose `Partition` objects besides allowing `len(rdd.partitions())` to work? If not, I'm not sure that we should add this feature.
[GitHub] spark pull request: [SPARK-3659] Set EC2 version to 1.1.0 and upda...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/2510#issuecomment-56717292 Looks good, thanks Shivaram. I'll merge this.
[GitHub] spark pull request: [SPARK-3679] [PySpark] pickle the exact global...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2522#issuecomment-56717278 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20759/consoleFull) for PR 2522 at commit [`dfbccf5`](https://github.com/apache/spark/commit/dfbccf5c92333da8ab835fc4730aadc844e9f895). * This patch merges cleanly.
[GitHub] spark pull request: [Build] Diff from branch point
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/2512
[GitHub] spark pull request: logNormalGraph missing partition parameter
GitHub user elmalto opened a pull request: https://github.com/apache/spark/pull/2523 logNormalGraph missing partition parameter You can merge this pull request into a Git repository by running: $ git pull https://github.com/elmalto/spark patch-1 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2523.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2523 commit 5478e716b9f080b7419285752708f0f4050f23da Author: elmalto elma...@users.noreply.github.com Date: 2014-09-24T18:34:45Z logNormalGraph missing partition parameter
[GitHub] spark pull request: Potential error of message construction of SCC
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/2507#issuecomment-56717540 Hey, can you create a JIRA issue for this? Also, can you add [GraphX] to the title? Thanks. /cc @ankurdave
[GitHub] spark pull request: [SPARK-3659] Set EC2 version to 1.1.0 and upda...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/2510
[GitHub] spark pull request: logNormalGraph missing partition parameter
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2523#issuecomment-56717691 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-3634] [PySpark] User's module should ta...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/2492#discussion_r17991203 --- Diff: python/pyspark/context.py ---
@@ -183,10 +183,9 @@ def _do_init(self, master, appName, sparkHome, pyFiles, environment, batchSize,
         for path in self._conf.get("spark.submit.pyFiles", "").split(","):
             if path != "":
                 (dirname, filename) = os.path.split(path)
-                self._python_includes.append(filename)
-                sys.path.append(path)
-                if dirname not in sys.path:
-                    sys.path.append(dirname)
+                if filename.lower().endswith("zip") or filename.lower().endswith("egg"):
--- End diff --
I think that `spark.submit.pyFiles` is allowed to contain `.py` files, too:
```
--py-files PY_FILES    Comma-separated list of .zip, .egg, or .py files to place on the PYTHONPATH for Python apps.
```
Will this new filtering by `.zip` and `.egg` prevent this from working?
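The distinction JoshRosen raises is that archives and plain `.py` files need different treatment: a `.zip`/`.egg` is importable when the archive path itself is on the search path, while a bare `.py` file needs its containing directory there instead. A sketch covering all three cases (hedged: this illustrates the distinction and is not the code that was eventually merged):

```python
import os


def add_py_files_to_path(py_files, python_includes, search_path):
    """Distribute a comma-separated --py-files value onto an import search path.

    Archives (.zip/.egg) go on the path directly; for a .py file the
    containing directory is added so the module can be imported by name.
    """
    for path in py_files.split(","):
        if path == "":
            continue
        dirname, filename = os.path.split(path)
        lower = filename.lower()
        python_includes.append(filename)
        if lower.endswith(".zip") or lower.endswith(".egg"):
            search_path.append(path)
        elif lower.endswith(".py") and dirname and dirname not in search_path:
            search_path.append(dirname)
```

The `dirname not in search_path` guard mirrors the original loop's de-duplication, so repeated `.py` files from the same directory add it only once.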