[GitHub] spark pull request: [SPARK-2856] Decrease initial buffer size for ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1780#issuecomment-51157549 QA results for PR 1780:
- This patch PASSES unit tests.
- This patch merges cleanly.
- This patch adds no public classes.
For more information see test output: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17927/consoleFull --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-1779] Throw an exception if memory frac...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/714#issuecomment-51156997 QA tests have started for PR 714. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17932/consoleFull
[GitHub] spark pull request: [SPARK-2550][MLLIB][APACHE SPARK] Support regu...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1775#issuecomment-51156861 QA results for PR 1775:
- This patch PASSES unit tests.
- This patch merges cleanly.
- This patch adds no public classes.
For more information see test output: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17924/consoleFull
[GitHub] spark pull request: [SPARK-1779] Throw an exception if memory frac...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/714#issuecomment-51156760 Jenkins, test this please.
[GitHub] spark pull request: [SPARK-2857] Correct properties to set Master ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1779#issuecomment-51156680 QA results for PR 1779:
- This patch FAILED unit tests.
- This patch merges cleanly.
- This patch adds no public classes.
For more information see test output: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17925/consoleFull
[GitHub] spark pull request: [SPARK-2856] Decrease initial buffer size for ...
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/1780#discussion_r15797096
--- Diff: core/src/main/scala/org/apache/spark/serializer/KryoSerializer.scala ---
@@ -47,7 +47,9 @@ class KryoSerializer(conf: SparkConf) with Logging with Serializable {
-  private val bufferSize = conf.getInt("spark.kryoserializer.buffer.mb", 2) * 1024 * 1024
+  private val bufferSize =
+    (conf.getDouble("spark.kryoserializer.buffer.mb", 0.064) * 1024 * 1024).toInt
--- End diff --
maybe add a comment `// 64KB`?
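The point of the diff above is that `getInt` can only express whole-megabyte buffer sizes, so the smallest non-zero buffer was 1 MB; switching to `getDouble` allows a fractional default like 0.064 MB (roughly 64KB). A minimal sketch of the arithmetic, using a hypothetical stand-in function rather than the actual Spark API:

```python
# Sketch of the buffer-size computation in the patch above. kryo_buffer_bytes
# is an illustrative stand-in, not real Spark code.
def kryo_buffer_bytes(configured_mb=None, default_mb=0.064):
    """Convert a (possibly fractional) megabyte setting to a byte count,
    as the getDouble-based version of the patch does."""
    mb = configured_mb if configured_mb is not None else default_mb
    return int(mb * 1024 * 1024)

# New default: about 64KB. Old getInt-based default: 2MB.
print(kryo_buffer_bytes())    # 67108 (approximately 64KB)
print(kryo_buffer_bytes(2))   # 2097152 (the old 2MB default)
```

This also shows why the review suggests a `// 64KB` comment: `0.064` in megabyte units is not obviously a 64KB buffer at a glance.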
[GitHub] spark pull request: SPARK-2711. Create a ShuffleMemoryManager to t...
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/1707#issuecomment-51156443 Thanks for the review.
[GitHub] spark pull request: SPARK-2711. Create a ShuffleMemoryManager to t...
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/1707#issuecomment-51156435 Jenkins actually passed this (see https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17919/consoleFull), but a glitch in the reporting script kept it from posting here, so I'm going to merge it.
[GitHub] spark pull request: [SPARK-2157] Enable tight firewall rules for S...
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/1777#discussion_r15796941
--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -1331,4 +1331,49 @@ private[spark] object Utils extends Logging {
     .map { case (k, v) => s"-D$k=$v" }
   }
+  /**
+   * Attempt to start a service on the given port, or fail after a number of attempts.
+   * Each subsequent attempt uses 1 + the port used in the previous attempt.
+   *
+   * @param startPort The initial port to start the service on.
+   * @param maxRetries Maximum number of retries to attempt.
+   *                   A value of 3 means attempting ports n, n+1, n+2, and n+3, for example.
+   * @param startService Function to start service on a given port.
+   *                     This is expected to throw java.net.BindException on port collision.
+   * @throws SparkException When unable to start the service after a given number of attempts
+   */
+  def startServiceOnPort[T](
+      startPort: Int,
+      startService: Int => (T, Int),
+      serviceName: String = "",
+      maxRetries: Int = 3): (T, Int) = {
--- End diff --
sounds good
[GitHub] spark pull request: [SPARK-2157] Enable tight firewall rules for S...
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/1777#discussion_r15796780
--- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala ---
@@ -84,7 +84,8 @@ private[spark] class Executor(
   // Initialize Spark environment (using system properties read above)
   private val env = {
     if (!isLocal) {
-      val _env = SparkEnv.create(conf, executorId, slaveHostname, 0,
+      val port = conf.getInt("spark.executor.env.port", 0) // TODO: document this
--- End diff --
There's already a `spark.executor.port`, and these two overlap
[GitHub] spark pull request: [SPARK-2157] Enable tight firewall rules for S...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/1777#issuecomment-51155814 Hey Andrew - overall this looks good. I think ultimately we'll need to just lock down a cluster and test this by opening up ports "one by one", but I think this is worth merging with the current coverage. Some comments about docs mostly
[GitHub] spark pull request: SPARK-2685. Update ExternalAppendOnlyMap to av...
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/1773#issuecomment-51155708 Alright, merged it. Thanks!
[GitHub] spark pull request: [SPARK-2157] Enable tight firewall rules for S...
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/1777#discussion_r15796709
--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -1331,4 +1331,49 @@ private[spark] object Utils extends Logging {
     .map { case (k, v) => s"-D$k=$v" }
   }
+  /**
+   * Attempt to start a service on the given port, or fail after a number of attempts.
+   * Each subsequent attempt uses 1 + the port used in the previous attempt.
+   *
+   * @param startPort The initial port to start the service on.
+   * @param maxRetries Maximum number of retries to attempt.
+   *                   A value of 3 means attempting ports n, n+1, n+2, and n+3, for example.
+   * @param startService Function to start service on a given port.
+   *                     This is expected to throw java.net.BindException on port collision.
+   * @throws SparkException When unable to start the service after a given number of attempts
+   */
+  def startServiceOnPort[T](
+      startPort: Int,
+      startService: Int => (T, Int),
+      serviceName: String = "",
+      maxRetries: Int = 3): (T, Int) = {
--- End diff --
That seems reasonable to me.
[GitHub] spark pull request: SPARK-2380: Support displaying accumulator val...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1309#issuecomment-51155585 QA results for PR 1309:
- This patch PASSES unit tests.
For more information see test output: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17922/consoleFull
[GitHub] spark pull request: [SPARK-2157] Enable tight firewall rules for S...
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/1777#discussion_r15796608 --- Diff: docs/spark-standalone.md --- @@ -311,76 +311,103 @@ configure those ports. Browser -Standalone Cluster Master +Master 8080 Web UI -master.ui.port +master.ui.portSPARK_MASTER_WEBUI_PORT Jetty-based Browser -Driver -4040 +Worker +8081 Web UI -spark.ui.port +worker.ui.portSPARK_WORKER_WEBUI_PORT Jetty-based Browser -History Server -18080 +Application +4040 Web UI -spark.history.ui.port +spark.ui.port Jetty-based Browser -Worker -8081 +History Server +18080 Web UI -worker.ui.port +spark.history.ui.port Jetty-based -Application -Standalone Cluster Master +DriverWorker +Master 7077 -Submit job to cluster -spark.driver.port -Akka-based. Set to "0" to choose a port randomly +Submit job to clusterJoin cluster +SPARK_MASTER_PORT +Akka-based. Set to "0" to choose a port randomly. +Master Worker -Standalone Cluster Master --- End diff -- This overall could use some reorganization. I'd actually move the ones that are not specific to standalone mode to the "Security" page. Also, the new options should be listed in the "Networking" section of `docs/configuration.md`.
[GitHub] spark pull request: [SPARK-2157] Enable tight firewall rules for S...
Github user ash211 commented on a diff in the pull request: https://github.com/apache/spark/pull/1777#discussion_r15796591
--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -1331,4 +1331,49 @@ private[spark] object Utils extends Logging {
     .map { case (k, v) => s"-D$k=$v" }
   }
+  /**
+   * Attempt to start a service on the given port, or fail after a number of attempts.
+   * Each subsequent attempt uses 1 + the port used in the previous attempt.
+   *
+   * @param startPort The initial port to start the service on.
+   * @param maxRetries Maximum number of retries to attempt.
+   *                   A value of 3 means attempting ports n, n+1, n+2, and n+3, for example.
+   * @param startService Function to start service on a given port.
+   *                     This is expected to throw java.net.BindException on port collision.
+   * @throws SparkException When unable to start the service after a given number of attempts
+   */
+  def startServiceOnPort[T](
+      startPort: Int,
+      startService: Int => (T, Int),
+      serviceName: String = "",
+      maxRetries: Int = 3): (T, Int) = {
--- End diff --
Part of me is worried that the maxRetries parameter here is an effective cap on the number of concurrent shells/executors/drivers that can run on one machine at once in restricted-firewall mode. Because in standalone mode each app gets its own set of executors across the cluster, this caps the number of concurrent applications on a cluster. What do you think of creating a config option, say spark.ports.maxRetries, that can be set to change this? I might also raise the default a bit, to 16 or so. The way I'd expect network teams to run this, then, is to set spark.ports.maxRetries to say 20, and then open up a range of size 20 starting from each of the relevant ports in the config list you pasted in the summary.
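The retry behavior under discussion can be sketched in a few lines. This is an illustrative Python stand-in for the Scala `startServiceOnPort` shown in the diff, not the actual Spark implementation; `BindError` plays the role of `java.net.BindException`:

```python
# Minimal sketch of try-port-then-increment retry logic, as described in the
# scaladoc above. All names here are hypothetical stand-ins.
class BindError(Exception):
    """Stand-in for java.net.BindException (port already in use)."""

def start_service_on_port(start_port, start_service, max_retries=3):
    """Try start_port, then start_port+1, ..., allowing max_retries extra
    attempts, mirroring 'a value of 3 means attempting ports n..n+3'."""
    for offset in range(max_retries + 1):
        port = start_port + offset
        try:
            return start_service(port)  # expected to raise BindError on collision
        except BindError:
            continue
    raise RuntimeError("Failed to start service on any port in [%d, %d]"
                       % (start_port, start_port + max_retries))

# Simulate ports 7077 and 7078 already taken; the service lands on 7079.
taken = {7077, 7078}
def fake_start(port):
    if port in taken:
        raise BindError(port)
    return ("service", port)

print(start_service_on_port(7077, fake_start))  # ('service', 7079)
```

The sketch also makes ash211's concern concrete: with `max_retries=3`, a fourth colliding service on the same machine has no port left to try, which is why a configurable `spark.ports.maxRetries` is proposed.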
[GitHub] spark pull request: [SPARK-2157] Enable tight firewall rules for S...
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/1777#discussion_r15796416
--- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala ---
@@ -84,7 +84,8 @@ private[spark] class Executor(
   // Initialize Spark environment (using system properties read above)
   private val env = {
     if (!isLocal) {
-      val _env = SparkEnv.create(conf, executorId, slaveHostname, 0,
+      val port = conf.getInt("spark.executor.env.port", 0) // TODO: document this
--- End diff --
what about just `spark.executor.port` (ala `spark.driver.port`)
[GitHub] spark pull request: [SPARK-2157] Enable tight firewall rules for S...
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/1777#discussion_r15796377
--- Diff: core/src/main/scala/org/apache/spark/deploy/worker/DriverWrapper.scala ---
@@ -30,8 +30,9 @@ object DriverWrapper {
   args.toList match {
     case workerUrl :: mainClass :: extraArgs =>
       val conf = new SparkConf()
+      val watcherPort = conf.getInt("spark.worker.watcher.port", 0) // TODO: document this
--- End diff --
this is only ever used within one machine; I'm pretty sure these types of ports don't need to be configurable.
[GitHub] spark pull request: [SPARK-2503] Lower shuffle output buffer (spar...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1781#issuecomment-51154536 QA tests have started for PR 1781. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17930/consoleFull
[GitHub] spark pull request: [SPARK-2157] Enable tight firewall rules for S...
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/1777#discussion_r15796365
--- Diff: core/src/main/scala/org/apache/spark/deploy/Client.scala ---
@@ -146,6 +146,7 @@ object Client {
   }
   val conf = new SparkConf()
+  val port = conf.getInt("spark.standalone.client.port", 0) // TODO: document this
--- End diff --
btw - there is a comment below to this effect, but I think there may be an akka option where it doesn't need us to open up a server on the client at all. Not worth spending time on though for this patch...
[GitHub] spark pull request: [SPARK-2157] Enable tight firewall rules for S...
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/1777#discussion_r15796327
--- Diff: core/src/main/scala/org/apache/spark/deploy/Client.scala ---
@@ -146,6 +146,7 @@ object Client {
   }
   val conf = new SparkConf()
+  val port = conf.getInt("spark.standalone.client.port", 0) // TODO: document this
--- End diff --
Are you planning to do this?
[GitHub] spark pull request: [SPARK-2857] Correct properties to set Master ...
Github user ash211 commented on the pull request: https://github.com/apache/spark/pull/1779#issuecomment-51153773 +1 I've always used SPARK_MASTER_WEBUI_PORT and SPARK_WORKER_WEBUI_PORT in spark-env.sh, and I'd imagine everyone else has too
[GitHub] spark pull request: [SPARK-2503] Lower shuffle output buffer (spar...
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/1781 [SPARK-2503] Lower shuffle output buffer (spark.shuffle.file.buffer.kb) to 32KB. This can substantially reduce memory usage during shuffle. You can merge this pull request into a Git repository by running: $ git pull https://github.com/rxin/spark SPARK-2503-spark.shuffle.file.buffer.kb Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/1781.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1781 commit 1b8f72b387c10d53b54ae3a2e7eb6b9b72f14d36 Author: Reynold Xin Date: 2014-08-05T06:06:02Z [SPARK-2503] Lower shuffle output buffer (spark.shuffle.file.buffer.kb) to 32KB. This can substantially reduce memory usage during shuffle.
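To see why shrinking this buffer "can substantially reduce memory usage", note that during a shuffle write each concurrently running task can hold one buffered open file per reduce partition, so buffer memory scales multiplicatively. The figures below are illustrative back-of-envelope arithmetic under assumed cluster numbers (16 cores, 1000 reduce partitions), not measurements from the PR:

```python
# Back-of-envelope cost of shuffle write buffers: one buffer per
# (concurrent task, reduce partition) pair. Illustrative only; the exact
# accounting depends on the shuffle implementation.
def shuffle_buffer_memory_mb(concurrent_tasks, reduce_partitions, buffer_kb):
    return concurrent_tasks * reduce_partitions * buffer_kb / 1024

# 16 concurrent tasks x 1000 reducers: 100KB buffers vs the new 32KB buffers.
print(shuffle_buffer_memory_mb(16, 1000, 100))  # 1562.5 (MB)
print(shuffle_buffer_memory_mb(16, 1000, 32))   # 500.0 (MB)
```

Under these assumed numbers, dropping the per-file buffer from 100KB to 32KB cuts roughly a gigabyte of buffer memory on a single machine.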
[GitHub] spark pull request: [SPARK-2550][MLLIB][APACHE SPARK] Support regu...
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/1775#discussion_r15796202
--- Diff: python/pyspark/mllib/classification.py ---
@@ -73,11 +73,36 @@ def predict(self, x):
 class LogisticRegressionWithSGD(object):
     @classmethod
-    def train(cls, data, iterations=100, step=1.0, miniBatchFraction=1.0, initialWeights=None):
-        """Train a logistic regression model on the given data."""
+    def train(cls, data, iterations=100, step=1.0, miniBatchFraction=1.0,
+              initialWeights=None, regParam=1.0, regType=None, intercept=False):
+        """
+        Train a logistic regression model on the given data.
+
+        @param data: The training data.
+        @param iterations: The number of iterations (default: 100).
+        @param step: The step parameter used in SGD (default: 1.0).
+        @param miniBatchFraction: Fraction of data to be used for each SGD iteration.
+        @param initialWeights: The initial weights (default: None).
+        @param regParam: The regularizer parameter (default: 1.0).
+        @param regType: The type of regularizer used for training our model.
+                        Allowed values: "l1" for using L1Updater,
+                        "l2" for using SquaredL2Updater,
+                        "none" for no regularizer (default: "none").
+        @param intercept: Boolean parameter which indicates the use or not of the
+                          augmented representation for training data (i.e. whether
+                          bias features are activated or not).
+        """
         sc = data.context
+        if regType is None:
--- End diff --
Ok, fair enough (@mengxr I wasn't suggesting enumerations, just a pattern match on the `Option[String]` value as per the comment. I don't believe this adds more code or complexity, but no strong feelings either way)
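The design question in this thread is where to normalize a missing `regType`: on the Python side (mapping `None` to the string `"none"` before calling into the JVM) or on the Scala side (pattern matching on an `Option[String]`). A hedged sketch of the Python-side approach; `normalize_reg_type` is a hypothetical helper, not the actual PySpark code:

```python
# Sketch of Python-side regType normalization: None becomes "none" and
# anything outside the documented set is rejected early with a clear error.
# Illustrative stand-in, not the real PySpark implementation.
def normalize_reg_type(reg_type):
    if reg_type is None:
        reg_type = "none"
    allowed = {"l1", "l2", "none"}  # values documented in the diff above
    if reg_type not in allowed:
        raise ValueError("Invalid regType: %r (expected one of %s)"
                         % (reg_type, sorted(allowed)))
    return reg_type

print(normalize_reg_type(None))  # 'none'
print(normalize_reg_type("l1"))  # 'l1'
```

Normalizing on the Python side keeps the JVM entry point dealing with a plain non-null string, which is what makes the Scala-side `Option[String]` match unnecessary.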
[GitHub] spark pull request: [SPARK-2856] Decrease initial buffer size for ...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/1780#issuecomment-51153633 LGTM
[GitHub] spark pull request: [SPARK-2585] Remove special handling of Hadoop...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1648#discussion_r15796222
--- Diff: sql/hive/compatibility/src/test/scala/org/apache/spark/sql/hive/execution/HiveCompatibilitySuite.scala ---
@@ -38,6 +39,7 @@ class HiveCompatibilitySuite extends HiveQueryFileTest with BeforeAndAfter {
   override def beforeAll() {
     TestHive.cacheTables = true
+    TestHive.set(SQLConf.SHUFFLE_PARTITIONS, "2")
--- End diff --
We should keep it at 2 to speed up tests ...
[GitHub] spark pull request: [SPARK-2857] Correct properties to set Master ...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/1779#issuecomment-51153508 LGTM pending tests - thanks Andrew. I'm guessing these were simply un-used.
[GitHub] spark pull request: SPARK-2565. Update ShuffleReadMetrics as block...
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/1507#discussion_r15796149
--- Diff: core/src/main/scala/org/apache/spark/storage/BlockFetcherIterator.scala ---
@@ -131,7 +122,9 @@ object BlockFetcherIterator {
   val networkSize = blockMessage.getData.limit()
   results.put(new FetchResult(blockId, sizeMap(blockId),
     () => dataDeserialize(blockId, blockMessage.getData, serializer)))
-  _remoteBytesRead += networkSize
+  // TODO: race conditions can occur here with NettyBlockFetcherIterator
--- End diff --
Also, this comment is pretty vague. It would be good if you could elaborate on it (what you described in the JIRA itself is good enough).
[GitHub] spark pull request: [SPARK-2379] Fix the bug that streaming's rece...
Github user tdas commented on the pull request: https://github.com/apache/spark/pull/1694#issuecomment-51153366 Well, I have merged this patch already, in an attempt to squeeze it into the 1.1 release. If you open another patch to make the change, I can try squeezing that in too. Thanks for detecting and fixing this bug!
[GitHub] spark pull request: SPARK-2565. Update ShuffleReadMetrics as block...
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/1507#discussion_r15796125
--- Diff: core/src/main/scala/org/apache/spark/executor/TaskMetrics.scala ---
@@ -98,19 +105,22 @@ class TaskMetrics extends Serializable {
   */
  var updatedBlocks: Option[Seq[(BlockId, BlockStatus)]] = None

- /** Adds the given ShuffleReadMetrics to any existing shuffle metrics for this task. */
- def updateShuffleReadMetrics(newMetrics: ShuffleReadMetrics) = synchronized {
-   _shuffleReadMetrics match {
-     case Some(existingMetrics) =>
-       existingMetrics.shuffleFinishTime = math.max(
-         existingMetrics.shuffleFinishTime, newMetrics.shuffleFinishTime)
-       existingMetrics.fetchWaitTime += newMetrics.fetchWaitTime
-       existingMetrics.localBlocksFetched += newMetrics.localBlocksFetched
-       existingMetrics.remoteBlocksFetched += newMetrics.remoteBlocksFetched
-       existingMetrics.remoteBytesRead += newMetrics.remoteBytesRead
-     case None =>
-       _shuffleReadMetrics = Some(newMetrics)
+ def createShuffleReadMetricsForDependency(): ShuffleReadMetrics = synchronized {
+   val readMetrics = new ShuffleReadMetrics()
+   depsShuffleReadMetrics += readMetrics
+   readMetrics
+ }
+
+ def mergeShuffleReadMetrics() = synchronized {
--- End diff --
Could you add a brief comment on what this does? (and when this happens)
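The merge the reviewer is asking about folds one `ShuffleReadMetrics` per shuffle dependency into a single aggregate: counters are summed, while the finish time takes the max, as in the removed `updateShuffleReadMetrics` shown in the diff. A sketch of that fold in Python; the dataclass is an illustrative stand-in, with field names following the diff:

```python
# Illustrative stand-in for merging per-dependency shuffle read metrics into
# one aggregate, mirroring the sum/max logic visible in the diff above.
from dataclasses import dataclass

@dataclass
class ShuffleReadMetrics:
    shuffle_finish_time: int = 0
    fetch_wait_time: int = 0
    local_blocks_fetched: int = 0
    remote_blocks_fetched: int = 0
    remote_bytes_read: int = 0

def merge_shuffle_read_metrics(per_dep_metrics):
    merged = ShuffleReadMetrics()
    for m in per_dep_metrics:
        # Finish time is a timestamp, so take the latest rather than summing.
        merged.shuffle_finish_time = max(merged.shuffle_finish_time,
                                         m.shuffle_finish_time)
        merged.fetch_wait_time += m.fetch_wait_time
        merged.local_blocks_fetched += m.local_blocks_fetched
        merged.remote_blocks_fetched += m.remote_blocks_fetched
        merged.remote_bytes_read += m.remote_bytes_read
    return merged

a = ShuffleReadMetrics(shuffle_finish_time=10, remote_bytes_read=100)
b = ShuffleReadMetrics(shuffle_finish_time=20, remote_bytes_read=50)
m = merge_shuffle_read_metrics([a, b])
print(m.shuffle_finish_time, m.remote_bytes_read)  # 20 150
```

Keeping one metrics object per dependency and merging at the end is what avoids the lost-update races that motivated this change in the first place.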
[GitHub] spark pull request: DIMSUM: Dimension Independent Matrix Square us...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1778#issuecomment-51153091 QA tests have started for PR 1778. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17929/consoleFull
[GitHub] spark pull request: SPARK-2565. Update ShuffleReadMetrics as block...
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/1507#discussion_r15796079

--- Diff: core/src/main/scala/org/apache/spark/storage/BlockFetcherIterator.scala ---
@@ -191,7 +184,7 @@ object BlockFetcherIterator {
       }
     }
     logInfo("Getting " + _numBlocksToFetch + " non-empty blocks out of " +
-      (numLocal + numRemote) + " blocks")
+      totalBlocks + " blocks")
--- End diff --

Is this ever used other than for logging?
[GitHub] spark pull request: [SPARK-2856] Decrease initial buffer size for ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1780#issuecomment-51152729 QA tests have started for PR 1780. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17927/consoleFull
[GitHub] spark pull request: SPARK-2565. Update ShuffleReadMetrics as block...
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/1507#discussion_r15796012

--- Diff: core/src/main/scala/org/apache/spark/storage/BlockFetcherIterator.scala ---
@@ -131,7 +122,9 @@ object BlockFetcherIterator {
         val networkSize = blockMessage.getData.limit()
         results.put(new FetchResult(blockId, sizeMap(blockId),
           () => dataDeserialize(blockId, blockMessage.getData, serializer)))
-        _remoteBytesRead += networkSize
+        // TODO: race conditions can occur here with NettyBlockFetcherIterator
--- End diff --

Could you add a reference to the JIRA in the comment?
[GitHub] spark pull request: [SPARK-2856] Decrease initial buffer size for ...
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/1780 [SPARK-2856] Decrease initial buffer size for Kryo to 64KB. You can merge this pull request into a Git repository by running: $ git pull https://github.com/rxin/spark kryo-init-size Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/1780.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1780 commit 551b935c3db56cb214ebbea6922cc5b6d37d229a Author: Reynold Xin Date: 2014-08-05T05:52:05Z [SPARK-2856] Decrease initial buffer size for Kryo to 64KB.
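A small initial buffer is cheap because Kryo-style output buffers grow on demand, so starting at 64KB only costs extra allocations for the rare records that exceed it. A hedged sketch of that growth policy (the function name, doubling strategy, and cap are illustrative assumptions, not Kryo's or Spark's actual code):

```python
def grow_to_fit(initial_size, needed, max_size=2 * 1024 * 1024):
    """Double a buffer's capacity until it fits `needed` bytes, up to a cap.

    Illustrative only: models why a 64 KB starting size is safe — most records
    fit as-is, and larger ones trigger a handful of doublings.
    """
    size = initial_size
    while size < needed:
        if size >= max_size:
            raise BufferError("needed %d bytes exceeds max buffer %d" % (needed, max_size))
        size = min(size * 2, max_size)
    return size
```

For a typical small record nothing grows at all; a 200 KB record starting from 64 KB needs only two doublings to reach 256 KB.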
[GitHub] spark pull request: DIMSUM: Dimension Independent Matrix Square us...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1778#issuecomment-51152551 QA results for PR 1778:
- This patch FAILED unit tests.
- This patch merges cleanly.
- This patch adds no public classes.
For more information see test output: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17926/consoleFull
[GitHub] spark pull request: [SPARK-2157] Enable tight firewall rules for S...
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/1777#discussion_r15795950

--- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala ---
@@ -84,7 +84,8 @@ private[spark] class Executor(
   // Initialize Spark environment (using system properties read above)
   private val env = {
     if (!isLocal) {
-      val _env = SparkEnv.create(conf, executorId, slaveHostname, 0,
+      val port = conf.getInt("spark.executor.env.port", 0) // TODO: document this
--- End diff --

There is probably a better name for this. I just don't know what else to call it.
[GitHub] spark pull request: [SPARK-2857] Correct properties to set Master ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1779#issuecomment-51152374 QA tests have started for PR 1779. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17925/consoleFull
[GitHub] spark pull request: DIMSUM: Dimension Independent Matrix Square us...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1778#issuecomment-51152380 QA tests have started for PR 1778. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17926/consoleFull
[GitHub] spark pull request: SPARK-2685. Update ExternalAppendOnlyMap to av...
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/1773#issuecomment-51152366 I see, that makes sense. LGTM
[GitHub] spark pull request: [SPARK-1779] Throw an exception if memory frac...
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/714#issuecomment-51152316 test this please
[GitHub] spark pull request: [SPARK-2550][MLLIB][APACHE SPARK] Support regu...
Github user miccagiann commented on a diff in the pull request: https://github.com/apache/spark/pull/1775#discussion_r15795902

--- Diff: python/pyspark/mllib/classification.py ---
@@ -73,11 +73,36 @@ def predict(self, x):
 class LogisticRegressionWithSGD(object):
     @classmethod
-    def train(cls, data, iterations=100, step=1.0, miniBatchFraction=1.0, initialWeights=None):
-        """Train a logistic regression model on the given data."""
+    def train(cls, data, iterations=100, step=1.0, miniBatchFraction=1.0,
+              initialWeights=None, regParam=1.0, regType=None, intercept=False):
+        """
+        Train a logistic regression model on the given data.
+
+        @param data:              The training data.
+        @param iterations:        The number of iterations (default: 100).
+        @param step:              The step parameter used in SGD (default: 1.0).
+        @param miniBatchFraction: Fraction of data to be used for each SGD iteration.
+        @param initialWeights:    The initial weights (default: None).
+        @param regParam:          The regularizer parameter (default: 1.0).
+        @param regType:           The type of regularizer used for training our model.
+                                  Allowed values: "l1" for using L1Updater,
+                                  "l2" for using SquaredL2Updater,
+                                  "none" for no regularizer (default: "none").
+        @param intercept:         Boolean parameter which indicates whether to use the
+                                  augmented representation for training data (i.e.
+                                  whether bias features are activated or not).
+        """
         sc = data.context
+        if regType is None:
--- End diff --

Xiangrui suggested keeping the Scala code as simple as possible and only throwing the `IllegalArgumentException` from there. I tried pattern matching and creating enumerations, but the result was complicated and I ended up adding more classes to both the Scala and Python code.
[GitHub] spark pull request: [SPARK-2857] Correct properties to set Master ...
GitHub user andrewor14 opened a pull request: https://github.com/apache/spark/pull/1779 [SPARK-2857] Correct properties to set Master / Worker ports `master.ui.port` and `worker.ui.port` were never picked up by SparkConf, simply because they are not prefixed with "spark." Unfortunately, this is also currently the documented way of setting these values. You can merge this pull request into a Git repository by running: $ git pull https://github.com/andrewor14/spark master-worker-port Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/1779.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1779 commit 4db3d5d30b8a3ae6d186a4595ff6d52b39590200 Author: Andrew Or Date: 2014-08-05T05:44:17Z Stop using configs that don't actually work commit 8475e95ea2724918fc29c132384c2c4723acf0c4 Author: Andrew Or Date: 2014-08-05T05:44:32Z Update docs to reflect changes in configs
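The gotcha described above — a conf layer that only loads properties under the `spark.*` namespace silently drops keys like `master.ui.port` — can be sketched in a few lines. This is an illustrative Python model of the filtering behavior, not SparkConf's actual code:

```python
def load_spark_properties(system_props):
    """Keep only keys in the spark.* namespace; everything else is silently dropped.

    Models why `master.ui.port` never took effect while `spark.master.ui.port` would.
    """
    return {k: v for k, v in system_props.items() if k.startswith("spark.")}
```

The failure mode is quiet: the unprefixed key is neither rejected nor honored, which is exactly why the documented-but-nonfunctional names went unnoticed.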
[GitHub] spark pull request: DIMSUM: Dimension Independent Matrix Square us...
GitHub user rezazadeh opened a pull request: https://github.com/apache/spark/pull/1778 DIMSUM: Dimension Independent Matrix Square using MapReduce

# DIMSUM

Compute all pairs of similar vectors using a brute-force approach, and also the DIMSUM sampling approach. Laying down some notation: we are looking for all pairs of similar columns in an m x n matrix whose entries are denoted a_ij, with the i-th row denoted r_i and the j-th column denoted c_j. There is an oversampling parameter labeled γ that should be set to 4 log(n)/s to get provably correct results (with high probability), where s is the similarity threshold. The algorithm is stated with a Map and Reduce, with proofs of correctness and efficiency in published papers [1] [2]. The reducer is simply the summation reducer. The mapper is more interesting, and is also the heart of the scheme. As an exercise, you should try to see why, in expectation, the map-reduce below outputs cosine similarities. ![dimsumv2](https://cloud.githubusercontent.com/assets/3220351/3807272/d1d9514e-1c62-11e4-9f12-3cfdb1d78b3a.png)

[1] Bosagh-Zadeh, Reza and Carlsson, Gunnar (2013), Dimension Independent Matrix Square using MapReduce, arXiv:1304.1467
[2] Bosagh-Zadeh, Reza and Goel, Ashish (2012), Dimension Independent Similarity Computation, arXiv:1206.2082

# Testing

Tests for all invocations included. Added magnitude computation to MultivariateStatisticalSummary since it was needed. Added a test for this. Scaling it up now and will report back with results.
You can merge this pull request into a Git repository by running: $ git pull https://github.com/rezazadeh/spark dimsumv2 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/1778.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1778 commit 5b8cd7deb3f29d3c2533b01f496f41175471f023 Author: Reza Zadeh Date: 2014-08-04T02:19:45Z Initial files commit 6bebabb9364eb917dd86acbea4438a9e4d301f18 Author: Reza Zadeh Date: 2014-08-04T18:37:31Z remove changes to MatrixSuite commit 3726ca97ab184a8d5a9b3c0003d3afa6fd973890 Author: Reza Zadeh Date: 2014-08-04T20:47:57Z Remove MatrixAlgebra commit 654c4fb1136cfa856fc354b5ddb710758d38948f Author: Reza Zadeh Date: 2014-08-04T21:38:18Z default methods commit 502ce526fc8ec84fd2c1f3b2b9a74b07e76c2d65 Author: Reza Zadeh Date: 2014-08-04T22:02:36Z new interface commit 05e59b8e883fd126dc81707b90aaf1011a2d1ee5 Author: Reza Zadeh Date: 2014-08-04T22:59:55Z Add test commit 75edb257e33a23f87fa379be597483d12a421626 Author: Reza Zadeh Date: 2014-08-05T01:02:33Z All tests passing! commit 029aa9c3d71960cb63293d721b96eebb6bdfcfbf Author: Reza Zadeh Date: 2014-08-05T05:12:40Z javadoc and new test commit 139c8e1d20274322dfe1c513d6872e47f5eb5138 Author: Reza Zadeh Date: 2014-08-05T05:16:23Z Syntax changes
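For intuition about what the DIMSUM mapper approximates, here is the brute-force baseline the PR description mentions: cosine similarity between every pair of columns, which costs O(n^2) column pairs and is exactly what the γ-oversampling scheme avoids at scale. A self-contained sketch (illustrative only, not the PR's Scala implementation):

```python
import math


def column_cosine_similarities(matrix):
    """Brute-force cosine similarity for each pair of columns.

    matrix: list of m rows, each a list of n numeric entries (a_ij).
    Returns a dict mapping column-index pairs (i, j), i < j, to cos(c_i, c_j).
    """
    n = len(matrix[0])
    cols = [[row[j] for row in matrix] for j in range(n)]
    norms = [math.sqrt(sum(x * x for x in c)) for c in cols]
    sims = {}
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(a * b for a, b in zip(cols[i], cols[j]))
            if norms[i] > 0 and norms[j] > 0:
                sims[(i, j)] = dot / (norms[i] * norms[j])
    return sims
```

DIMSUM gets the same quantities in expectation by sampling entry pairs in the mapper with probability scaled by the column magnitudes, so the cost becomes independent of the matrix dimension m.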
[GitHub] spark pull request: [SPARK-2550][MLLIB][APACHE SPARK] Support regu...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1775#issuecomment-51152156 QA tests have started for PR 1775. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17924/consoleFull
[GitHub] spark pull request: [SPARK-2157] Enable tight firewall rules for S...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1777#issuecomment-51152157 QA tests have started for PR 1777. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17923/consoleFull
[GitHub] spark pull request: SPARK-2686 Add Length and OctetLen support to ...
Github user ueshin commented on the pull request: https://github.com/apache/spark/pull/1586#issuecomment-51152075 @javadba Thanks for the detail. Let me replay the sequence.
[GitHub] spark pull request: [SPARK-2550][MLLIB][APACHE SPARK] Support regu...
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/1775#discussion_r15795801

--- Diff: python/pyspark/mllib/classification.py ---
@@ -73,11 +73,36 @@ def predict(self, x):
 class LogisticRegressionWithSGD(object):
     @classmethod
-    def train(cls, data, iterations=100, step=1.0, miniBatchFraction=1.0, initialWeights=None):
-        """Train a logistic regression model on the given data."""
+    def train(cls, data, iterations=100, step=1.0, miniBatchFraction=1.0,
+              initialWeights=None, regParam=1.0, regType=None, intercept=False):
+        """
+        Train a logistic regression model on the given data.
+
+        @param data:              The training data.
+        @param iterations:        The number of iterations (default: 100).
+        @param step:              The step parameter used in SGD (default: 1.0).
+        @param miniBatchFraction: Fraction of data to be used for each SGD iteration.
+        @param initialWeights:    The initial weights (default: None).
+        @param regParam:          The regularizer parameter (default: 1.0).
+        @param regType:           The type of regularizer used for training our model.
+                                  Allowed values: "l1" for using L1Updater,
+                                  "l2" for using SquaredL2Updater,
+                                  "none" for no regularizer (default: "none").
+        @param intercept:         Boolean parameter which indicates whether to use the
+                                  augmented representation for training data (i.e.
+                                  whether bias features are activated or not).
+        """
         sc = data.context
+        if regType is None:
--- End diff --

As per above comment, you can just pass `regType` straight through if you then wrap the null in `Option` on the Scala/Java side.
[GitHub] spark pull request: [SPARK-2550][MLLIB][APACHE SPARK] Support regu...
Github user miccagiann commented on a diff in the pull request: https://github.com/apache/spark/pull/1775#discussion_r15795787

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala ---
@@ -341,16 +341,26 @@ class PythonMLLibAPI extends Serializable {
       stepSize: Double,
       regParam: Double,
       miniBatchFraction: Double,
-      initialWeightsBA: Array[Byte]): java.util.List[java.lang.Object] = {
+      initialWeightsBA: Array[Byte],
+      regType: String,
+      intercept: Boolean): java.util.List[java.lang.Object] = {
+    val SVMAlg = new SVMWithSGD()
+    SVMAlg.setIntercept(intercept)
+    SVMAlg.optimizer
+      .setNumIterations(numIterations)
+      .setRegParam(regParam)
+      .setStepSize(stepSize)
--- End diff --

Thanks! I am fixing it right now!
[GitHub] spark pull request: [SPARK-2157] Enable tight firewall rules for S...
GitHub user andrewor14 opened a pull request: https://github.com/apache/spark/pull/1777 [SPARK-2157] Enable tight firewall rules for Spark The goal of this PR is to allow users of Spark to write tight firewall rules for their clusters. This is currently not possible because Spark uses random ports in many places, notably the communication between executors and drivers. The changes in this PR are based on top of @ash211's changes in #1107. The list covered here may or may not be the complete set of port needed for Spark to operate perfectly. However, as of the latest commit there are no known sources of random ports (except in tests). I have not documented a few of the more obscure configs. You can merge this pull request into a Git repository by running: $ git pull https://github.com/andrewor14/spark configure-ports Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/1777.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1777 commit 1c0981a3f93c3bbb27425d76342c98c8c7d469cf Author: Andrew Ash Date: 2014-06-17T05:09:59Z Make port in HttpServer configurable commit 49ee29b49e1275b48d18aef5182dba2937c11358 Author: Andrew Ash Date: 2014-06-17T05:31:10Z SPARK-1174 Add port configuration for HttpFileServer Uses spark.fileserver.port commit f34115d59b83163d9542be09eb0c89a87ea89309 Author: Andrew Ash Date: 2014-06-17T05:31:52Z SPARK-1176 Add port configuration for HttpBroadcast Uses spark.broadcast.port commit 17c79bbd66708d24c093be5f43e60c61f504d19d Author: Andrew Ash Date: 2014-06-17T06:00:19Z Add a configuration option for spark-shell's class server spark.replClassServer.port commit b80d2fd8e9b27a4d49561d31f100ffbb75393685 Author: Andrew Ash Date: 2014-06-17T06:40:32Z Make Spark's block manager port configurable spark.blockManager.port commit c5a05684ace9332077dbf63848d08f39a8b91628 Author: Andrew Ash Date: 2014-06-17T08:10:21Z 
Fix ConnectionManager to retry with increment Fails when running master+worker+executor+shell on the same machine. I think the issue is that both the shell and the executor attempt to start a ConnectionManager, which causes port conflicts. Solution is to attempt and increment on BindExceptions commit cad16dacb1b7dbac1122b38c2b02fe35f1303a59 Author: Andrew Ash Date: 2014-06-17T16:45:59Z Add fallover increment logic for HttpServer commit 066dc7ac936cfbf268e6ca7adfa1388f5c4049d6 Author: Andrew Ash Date: 2014-06-17T17:08:49Z Fix up HttpServer port increments commit 5d84e0e9285aec53aa9c57d64313c0e513e41d30 Author: Andrew Ash Date: 2014-06-17T17:43:33Z Document new port configuration options - spark.fileserver.port - spark.broadcast.port - spark.replClassServer.port - spark.blockManager.port commit 9e4ad9628f7ff0f96a3881a1a5aaedcb8be6b80d Author: Andrew Ash Date: 2014-06-17T18:14:08Z Reformat for style checker commit 24a4c327c7441e6af6b82dbddacd71c57384dc04 Author: Andrew Ash Date: 2014-06-30T00:25:44Z Remove type on val to match surrounding style commit 0347aef2b686d1bcc1b8f5c230ba8ff99cbd0691 Author: Andrew Ash Date: 2014-06-30T05:26:48Z Unify port fallback logic to a single place commit 7c5bdc44df32fb550f375de3518b628fbb360d20 Author: Andrew Ash Date: 2014-06-30T05:34:47Z Fix style issue commit 038a579a26ffcfc1c5540f28176f236779eef12a Author: Andrew Ash Date: 2014-06-30T07:02:17Z Trust the server start function to report the port the service started on commit ec676f4f74b7a8402047fb849b9dca7172cd32f5 Author: Andrew Or Date: 2014-08-04T21:46:50Z Merge branch 'SPARK-2157' of github.com:ash211/spark into configure-ports commit 73fbe892794a6f7e4a051401f356c89f4aa7f81f Author: Andrew Or Date: 2014-08-04T22:39:01Z Move start service logic to Utils commit 6b550b0681ae8c0394685f6e929c4a14a48d10ec Author: Andrew Or Date: 2014-08-04T23:56:17Z Assorted fixes commit ba322807d2e5ed1ce69dae449238a1df16a74ae9 Author: Andrew Or Date: 2014-08-05T00:00:31Z Minor fixes commit 
1d7e40813e6ae98ee5cffb3e9e61807f3a01e941 Author: Andrew Or Date: 2014-08-05T00:40:27Z Treat 0 ports specially + return correct ConnectionManager port commit 470f38cf3c54941fbbcc358a358cc8a1fe2d6edd Author: Andrew Or Date: 2014-08-05T00:43:24Z Special case non-"Address already in use" exceptions commit e111d080b4a7c0103c30b3a6e29c058d0ac4c3d0 Author: Andrew Or Date: 2014-08-05T01:46:11Z Add names for UI services commit 3f8e51bbb82669b43d7d52ece09ac957b35e994e Author: Andrew Or Date: 2014-08-05T01:46:29Z Correct erroneous docs... commit 4d9e6f348cc408064173a91ecf9b509804eadf01 Author: Andrew Or Date: 2014-08-05T02:32:31Z Fix super subtle bug We were p
[GitHub] spark pull request: [SPARK-2550][MLLIB][APACHE SPARK] Support regu...
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/1775#discussion_r15795778

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala ---
@@ -341,16 +341,26 @@ class PythonMLLibAPI extends Serializable {
       stepSize: Double,
       regParam: Double,
       miniBatchFraction: Double,
-      initialWeightsBA: Array[Byte]): java.util.List[java.lang.Object] = {
+      initialWeightsBA: Array[Byte],
+      regType: String,
+      intercept: Boolean): java.util.List[java.lang.Object] = {
+    val SVMAlg = new SVMWithSGD()
+    SVMAlg.setIntercept(intercept)
+    SVMAlg.optimizer
--- End diff --

Also maybe prefer to do the pattern matching on `regType` before this, and do something like:

```
val updater = Option(regType) match {
  ...
}
optimizer
  .setUpdater(updater)
  .setNumIterations(...)
  ...
```
[GitHub] spark pull request: [SPARK-2550][MLLIB][APACHE SPARK] Support regu...
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/1775#discussion_r15795743

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala ---
@@ -341,16 +341,26 @@ class PythonMLLibAPI extends Serializable {
       stepSize: Double,
       regParam: Double,
       miniBatchFraction: Double,
-      initialWeightsBA: Array[Byte]): java.util.List[java.lang.Object] = {
+      initialWeightsBA: Array[Byte],
+      regType: String,
+      intercept: Boolean): java.util.List[java.lang.Object] = {
+    val SVMAlg = new SVMWithSGD()
+    SVMAlg.setIntercept(intercept)
+    SVMAlg.optimizer
+      .setNumIterations(numIterations)
+      .setRegParam(regParam)
+      .setStepSize(stepSize)
+    if (regType == "l2") {
--- End diff --

Py4j will pass through Python `None` as null (at least it should, if I recall), so on the Java side you can wrap that in an `Option` instead of making it "none". So you could do:

```
Option(regType) match {
  case Some("l1") => ...
  case Some("l2") => ...
  case None => ...
}
```
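The `Option(regType)` suggestion above treats a null coming from Py4j as "no regularizer" instead of forcing the Python side to send the string "none". A Python analog of that dispatch (illustrative only; "SimpleUpdater" as the no-regularizer fallback is my assumption, and the string return values stand in for the actual updater objects):

```python
def choose_updater(reg_type):
    """Map a possibly-None regType to an updater name, mirroring the Option match.

    None (null from Py4j) and "none" both mean no regularization; anything
    unrecognized is rejected, matching the IllegalArgumentException path.
    """
    if reg_type is None or reg_type == "none":
        return "SimpleUpdater"  # assumed name for the no-regularizer updater
    if reg_type == "l1":
        return "L1Updater"
    if reg_type == "l2":
        return "SquaredL2Updater"
    raise ValueError("Invalid regType parameter: %s" % reg_type)
```

Keeping this dispatch in one place, before the optimizer is configured, avoids the string-equality checks scattered through the diff above.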
[GitHub] spark pull request: [SPARK-2550][MLLIB][APACHE SPARK] Support regu...
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/1775#discussion_r15795654

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala ---
@@ -363,15 +373,27 @@ class PythonMLLibAPI extends Serializable {
       numIterations: Int,
       stepSize: Double,
       miniBatchFraction: Double,
-      initialWeightsBA: Array[Byte]): java.util.List[java.lang.Object] = {
+      initialWeightsBA: Array[Byte],
+      regParam: Double,
+      regType: String,
+      intercept: Boolean): java.util.List[java.lang.Object] = {
+    val LogRegAlg = new LogisticRegressionWithSGD()
+    LogRegAlg.setIntercept(intercept)
+    LogRegAlg.optimizer
+      .setNumIterations(numIterations)
+      .setRegParam(regParam)
+      .setStepSize(stepSize)
--- End diff --

miniBatchFraction is missing here too.
[GitHub] spark pull request: [SPARK-2550][MLLIB][APACHE SPARK] Support regu...
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/1775#discussion_r15795634

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala ---
@@ -341,16 +341,26 @@ class PythonMLLibAPI extends Serializable {
       stepSize: Double,
       regParam: Double,
       miniBatchFraction: Double,
-      initialWeightsBA: Array[Byte]): java.util.List[java.lang.Object] = {
+      initialWeightsBA: Array[Byte],
+      regType: String,
+      intercept: Boolean): java.util.List[java.lang.Object] = {
+    val SVMAlg = new SVMWithSGD()
+    SVMAlg.setIntercept(intercept)
+    SVMAlg.optimizer
+      .setNumIterations(numIterations)
+      .setRegParam(regParam)
+      .setStepSize(stepSize)
--- End diff --

You forgot to set miniBatchFraction here.
[GitHub] spark pull request: SPARK-2380: Support displaying accumulator val...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1309#issuecomment-51151466 QA tests have started for PR 1309. This patch DID NOT merge cleanly! View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17922/consoleFull
[GitHub] spark pull request: SPARK-2566. Update ShuffleWriteMetrics increme...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1481#issuecomment-51151464 QA tests have started for PR 1481. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17921/consoleFull
[GitHub] spark pull request: [SPARK-2583] ConnectionManager error reporting
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1758#issuecomment-51151458 QA tests have started for PR 1758. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17920/consoleFull
[GitHub] spark pull request: [SPARK-2505][MLlib] Weighted Regularizer for G...
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/1518#issuecomment-51151346 It's too late to get into 1.1, but I'll try to make it happen in 1.2. We'll use this in Alpine's implementation first.
[GitHub] spark pull request: [SPARK-2583] ConnectionManager error reporting
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/1758#issuecomment-51151333 Jenkins, retest this please.
[GitHub] spark pull request: SPARK-2566. Update ShuffleWriteMetrics increme...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/1481#issuecomment-51151290 Jenkins, retest this please.
[GitHub] spark pull request: [SPARK-2505][MLlib] Weighted Regularizer for G...
Github user MLnick commented on the pull request: https://github.com/apache/spark/pull/1518#issuecomment-51151194 This looks promising. FWIW, I support decoupling regularization from the raw gradient update and believe it is a good way to go - it will allow various update/learning rate schemes (adagrad, normalized adaptive gradient, etc) to be applied independent of the regularization.
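The decoupling argued for in the comment above can be sketched as two independent abstractions, so a learning-rate scheme (e.g. an AdaGrad-style schedule) composes freely with any regularizer. All names here are illustrative, not MLlib's actual API; this is a minimal, self-contained Scala sketch:

```scala
// Hypothetical sketch of decoupled SGD update steps. Names are
// illustrative only; MLlib's real Updater couples these concerns.
trait StepScheme {
  /** Turn the raw gradient into an update vector (learning-rate logic only). */
  def step(gradient: Array[Double], iter: Int): Array[Double]
}

trait Regularizer {
  /** Apply the shrinkage/proximal step after the gradient update. */
  def apply(weights: Array[Double], regParam: Double): Array[Double]
}

class ConstantDecayStep(rate: Double) extends StepScheme {
  def step(g: Array[Double], iter: Int): Array[Double] =
    g.map(_ * rate / math.sqrt(iter.toDouble))
}

object L2Regularizer extends Regularizer {
  def apply(w: Array[Double], regParam: Double): Array[Double] =
    w.map(_ * (1.0 - regParam)) // simple multiplicative shrinkage
}

// One SGD iteration composes the two pieces independently:
def update(w: Array[Double], g: Array[Double], iter: Int,
           scheme: StepScheme, reg: Regularizer,
           regParam: Double): Array[Double] = {
  val stepped = w.zip(scheme.step(g, iter)).map { case (wi, di) => wi - di }
  reg.apply(stepped, regParam)
}
```

Because `StepScheme` never sees `regParam` and `Regularizer` never sees the gradient, swapping in a different learning-rate scheme requires no change to the regularization code, which is the point of the decoupling.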
[GitHub] spark pull request: SPARK-2711. Create a ShuffleMemoryManager to t...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1707#issuecomment-51150303 QA tests have started for PR 1707. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17919/consoleFull
[GitHub] spark pull request: SPARK-2711. Create a ShuffleMemoryManager to t...
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/1707#issuecomment-51150135 test this please
[GitHub] spark pull request: [SPARK-2550][MLLIB][APACHE SPARK] Support regu...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1775#issuecomment-5114 QA results for PR 1775: This patch PASSES unit tests. This patch merges cleanly. This patch adds no public classes. For more information see test output: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17918/consoleFull
[GitHub] spark pull request: SPARK-2685. Update ExternalAppendOnlyMap to av...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1773#issuecomment-51149992 QA results for PR 1773: This patch PASSES unit tests. This patch merges cleanly. This patch adds no public classes. For more information see test output: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17917/consoleFull
[GitHub] spark pull request: SPARK-2686 Add Length and OctetLen support to ...
Github user javadba commented on the pull request: https://github.com/apache/spark/pull/1586#issuecomment-51149151 @ueshin I have git clone'd to a completely new area, and I reverted my last commit:

git clone https://github.com/javadba/spark.git strlen2
cd strlen2
git checkout strlen
git revert 22eddbce6a201c8f5b5c31859ceb972e60657377
mvn -DskipTests -Pyarn -Phive -Phadoop-2.3 clean compile package
mvn -Pyarn -Phive -Phadoop-2.3 test -DwildcardSuites=org.apache.spark.sql.hive.execution.HiveQuerySuite,org.apache.spark.sql.SQLQuerySuite,org.apache.spark.sql.catalyst.expressions.ExpressionEvaluationSuite

I get precisely the same error:

HiveQuerySuite:
21:03:31.120 WARN org.apache.spark.util.Utils: Your hostname, mithril resolves to a loopback address: 127.0.1.1; using 10.0.0.33 instead (on interface eth0)
21:03:31.121 WARN org.apache.spark.util.Utils: Set SPARK_LOCAL_IP if you need to bind to another address
21:03:37.294 WARN org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
21:03:40.045 WARN com.jolbox.bonecp.BoneCPConfig: Max Connections < 1. Setting to 20
21:03:49.464 WARN com.jolbox.bonecp.BoneCPConfig: Max Connections < 1. Setting to 20
21:03:49.487 WARN org.apache.hadoop.hive.metastore.ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 0.12.0
21:03:57.157 WARN com.jolbox.bonecp.BoneCPConfig: Max Connections < 1. Setting to 20
21:03:57.593 WARN com.jolbox.bonecp.BoneCPConfig: Max Connections < 1. Setting to 20
- single case - double case - case else null - having no references - boolean = number - CREATE TABLE AS runs once - between - div - division
*** RUN ABORTED ***
java.lang.StackOverflowError:
at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1.apply(Parsers.scala:254)
at scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:222)
at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1.apply(Parsers.scala:254)
(the same two frames repeat down the stack)

Now, let's revert the revert:

git log
commit db09cd132c2d7e995287eea54f3415726934138c
Author: Stephen Boesch
Date: Mon Aug 4 20:54:24 2014 -0700
Revert "Use Octet/Char_Len instead of Octet/Char_length due to apparent preexisting spark ParserCombinator bug."
This reverts commit 22eddbce6a201c8f5b5c31859ceb972e60657377.

git revert db09cd132c2d7e995287eea54f3415726934138c
mvn -Pyarn -Phive -Phadoop-2.3 test -DwildcardSuites=org.apache.spark.sql.hive.execution.HiveQuerySuite,org.apache.spark.sql.SQLQuerySuite,org.apache.spark.sql.catalyst.expressions.ExpressionEvaluationSuite

Now those three test suites pass again (specifically, HiveQuerySuite did not fail). And, just to be *extra* sure that we can toggle between pass/fail an arbitrary number of times:

commit 602adedc9ca58d99957eb12bd91098ffe904604c
Author: Stephen Boesch
Date: Mon Aug 4 21:18:53 2014 -0700
Revert "Revert "Use Octet/Char_Len instead of Octet/Char_length due to apparent preexisting spark ParserCombinator bug.""

git revert 602adedc9ca58d99957eb12bd91098ffe904604c

And once again HiveQuerySuite fails with the same error. So I have established clearly the following: the strlen branch on my fork fails with SOF if we roll back the commit that changes OCTET/CHAR_LENGTH -> OCTET/CHAR_LEN.
[GitHub] spark pull request: [SPARK-2323] Exception in accumulator update s...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/1772
[GitHub] spark pull request: [SPARK-1779] Throw an exception if memory frac...
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/714#issuecomment-51147893 hi @andrewor14, a unit test failed in Spark Streaming; we may need to retest this: [info] - flume polling test multiple hosts *** FAILED *** [info] org.jboss.netty.channel.ChannelException: Failed to bind to: localhost/127.0.0.1:56218 [info] at org.jboss.netty.bootstrap.ServerBootstrap.bind(ServerBootstrap.java:272)
[GitHub] spark pull request: [SPARK-2713] Executors of same application in ...
Github user li-zhihui commented on the pull request: https://github.com/apache/spark/pull/1616#issuecomment-51147815 Thanks @JoshRosen, sorry I missed the important operation (and I missed FileUtil.chmod(targetFile.getAbsolutePath, "a+x") too). I added a new commit.
[GitHub] spark pull request: [SPARK-2379] Fix the bug that streaming's rece...
Github user joyyoj commented on the pull request: https://github.com/apache/spark/pull/1694#issuecomment-51147129 @tdas Thanks for reminding me to add reportError; I didn't notice your reply before. It's a good idea to add it in this patch.
[GitHub] spark pull request: [SPARK-2585] Remove special handling of Hadoop...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/1648#discussion_r15794174 --- Diff: sql/hive/compatibility/src/test/scala/org/apache/spark/sql/hive/execution/HiveCompatibilitySuite.scala --- @@ -38,6 +39,7 @@ class HiveCompatibilitySuite extends HiveQueryFileTest with BeforeAndAfter { override def beforeAll() { TestHive.cacheTables = true +TestHive.set(SQLConf.SHUFFLE_PARTITIONS, "2") --- End diff -- Just a note: we need to remove this setting before merging it.
[GitHub] spark pull request: [SPARK-2550][MLLIB][APACHE SPARK] Support regu...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1775#issuecomment-51146931 QA tests have started for PR 1775. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17918/consoleFull
[GitHub] spark pull request: SPARK-2685. Update ExternalAppendOnlyMap to av...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1773#issuecomment-51146932 QA tests have started for PR 1773. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17917/consoleFull
[GitHub] spark pull request: [SPARK-2550][MLLIB][APACHE SPARK] Support regu...
Github user miccagiann commented on the pull request: https://github.com/apache/spark/pull/1775#issuecomment-51146861 Found the error. It was a typo. Let's see what Jenkins is going to say...
[GitHub] spark pull request: [SPARK-1986][GraphX]move lib.Analytics to org....
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1766#issuecomment-51146856 QA results for PR 1766: This patch PASSES unit tests. This patch merges cleanly. This patch adds no public classes. For more information see test output: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17914/consoleFull
[GitHub] spark pull request: [Spark 2381] stop the streaming application if...
Github user joyyoj commented on the pull request: https://github.com/apache/spark/pull/1693#issuecomment-51146846 Ok, I'll do it soon
[GitHub] spark pull request: [SPARK-2713] Executors of same application in ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1616#issuecomment-51146817 QA results for PR 1616: This patch PASSES unit tests. This patch merges cleanly. This patch adds no public classes. For more information see test output: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17915/consoleFull
[GitHub] spark pull request: [SPARK-2678][Core] Added "--" to prevent spark...
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/1715#issuecomment-51146747 Sure, that sounds good. The only thing is, if you use the first entry in `--jar`, I wouldn't automatically look for a main class in it (there's the part that checks the JAR manifest). Instead, make them use `--main-class` in that case. Otherwise it's a bit confusing that you give only JARs and it starts running some program.
[GitHub] spark pull request: SPARK-2685. Update ExternalAppendOnlyMap to av...
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/1773#issuecomment-51146679 test this please
[GitHub] spark pull request: SPARK-2685. Update ExternalAppendOnlyMap to av...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1773#issuecomment-51146519 QA results for PR 1773: This patch FAILED unit tests. This patch merges cleanly. This patch adds no public classes. For more information see test output: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17916/consoleFull
[GitHub] spark pull request: [SPARK-2585] Remove special handling of Hadoop...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1648#issuecomment-51146461 QA results for PR 1648: This patch FAILED unit tests. This patch merges cleanly. This patch adds no public classes. For more information see test output: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17906/consoleFull
[GitHub] spark pull request: SPARK-2686 Add Length and OctetLen support to ...
Github user javadba commented on the pull request: https://github.com/apache/spark/pull/1586#issuecomment-51146419 @ueshin I repeatably verified that simply changing "OCTET_LEN" to "OCTET_LENGTH" ended up causing the SOF. By "repeatably" I mean: set the 'constant' val OCTET_LENGTH="OCTET_LENGTH" and observe the error; change it to something like val OCTET_LENGTH="OCTET_LEN" or val OCTET_LENGTH="OCTET_LENG" and observe the error has gone away. Rinse, cleanse, repeat. I have been able to demonstrate this multiple times. Now the regression tests have been run against the modified and reliable code. Please re-run your tests in a fresh area; I will do the same. But I am hesitant to revert, because we have positive test results now with the latest commit (as well as my results of the problem before the commit).
[GitHub] spark pull request: [SPARK-2323] Exception in accumulator update s...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1772#issuecomment-51146442 Actually I merged it in master, branch-1.0, and branch-1.1.
[GitHub] spark pull request: [SPARK-2323] Exception in accumulator update s...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1772#issuecomment-51146388 Merging this in master. Thanks.
[GitHub] spark pull request: SPARK-2294: fix locality inversion bug in Task...
Github user CodingCat commented on the pull request: https://github.com/apache/spark/pull/1313#issuecomment-51146059 finally,
[GitHub] spark pull request: [SPARK-2678][Core] Added "--" to prevent spark...
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/1715#issuecomment-51145806 Thanks Matei, Patrick's last few comments have already convinced me to remove the "primary" notion from a user's perspective. And yes, `spark-internal` can be removed in this way; `spark-shell` and `pyspark-shell` can also be removed by checking `--class` in Spark scripts. Internally, to keep code that relies on `primaryResource` intact (one example is the cluster deploy mode in a standalone cluster), we can still pick the first entry in `--jar` for Java/Scala apps and `--main-file` for Python apps as primary.
[GitHub] spark pull request: SPARK-2686 Add Length and OctetLen support to ...
Github user ueshin commented on the pull request: https://github.com/apache/spark/pull/1586#issuecomment-51145793 Hi @javadba, I tested `org.apache.spark.sql.SQLQuerySuite` and `org.apache.spark.sql.hive.execution.HiveQuerySuite` locally, and they worked fine even if I reverted the last commit 22eddbc.
[GitHub] spark pull request: [SPARK-2550][MLLIB][APACHE SPARK] Support regu...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1775#issuecomment-51145763 QA results for PR 1775: This patch FAILED unit tests. This patch merges cleanly. This patch adds no public classes. For more information see test output: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17912/consoleFull
[GitHub] spark pull request: [SPARK-1022][Streaming] Add Kafka real unit te...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1751#issuecomment-51145617 QA results for PR 1751: This patch PASSES unit tests. This patch merges cleanly. This patch adds no public classes. For more information see test output: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17911/consoleFull
[GitHub] spark pull request: [SPARK-2323] Exception in accumulator update s...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1772#issuecomment-51145063
QA results for PR 1772:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes
For more information see test output: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17910/consoleFull
[GitHub] spark pull request: [SPARK-1779] add warning when memoryFraction i...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/714#issuecomment-51144509
QA results for PR 714:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds no public classes
For more information see test output: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17909/consoleFull
[GitHub] spark pull request: SPARK-2380: Support displaying accumulator val...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1309#issuecomment-51144330
QA results for PR 1309:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds the following public classes (experimental):
  class Accumulator[T](@transient initialValue: T, param: AccumulatorParam[T], name: Option[String])
  class AccumulableInfo (
For more information see test output: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17908/consoleFull
[GitHub] spark pull request: SPARK-2294: fix locality inversion bug in Task...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1313#issuecomment-51143952
QA results for PR 1313:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes
For more information see test output: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17907/consoleFull
[GitHub] spark pull request: SPARK-2686 Add Length and OctetLen support to ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1586#issuecomment-51143772
QA results for PR 1586:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds the following public classes (experimental):
  case class Length(child: Expression) extends UnaryExpression {
  case class OctetLength(child: Expression, encoding: Expression) extends UnaryExpression
For more information see test output: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17902/consoleFull
[GitHub] spark pull request: SPARK-2685. Update ExternalAppendOnlyMap to av...
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/1773#issuecomment-51143734 (Basically before, in the common case of one key per hash code, we allocated a whole new ArrayBuffer for each key, which is 2 Java objects and probably around 100 bytes.)
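The allocation-avoidance idea described above can be sketched as follows. This is a minimal illustration, not the actual ExternalAppendOnlyMap code from PR 1773; the class and member names here are hypothetical. The point is that in the common case of exactly one value per key, no ArrayBuffer (a wrapper object plus its backing array, roughly 100 bytes) needs to be allocated at all:

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical sketch: a per-key value holder that stores the first value
// inline and only allocates an ArrayBuffer when a second value arrives.
final class CompactValues[V] {
  private var single: V = _                   // inline slot for the first value
  private var overflow: ArrayBuffer[V] = null // allocated only on the second value
  private var count = 0

  def add(v: V): Unit = {
    if (count == 0) {
      single = v                              // common case: zero buffer allocations
    } else {
      if (overflow == null) overflow = new ArrayBuffer[V]
      overflow += v
    }
    count += 1
  }

  def toList: List[V] =
    if (count == 0) Nil
    else if (overflow == null) List(single)
    else single :: overflow.toList
}

object CompactValuesDemo {
  def main(args: Array[String]): Unit = {
    val vs = new CompactValues[Int]
    vs.add(1); vs.add(2); vs.add(3)
    println(vs.toList)                        // List(1, 2, 3)
  }
}
```

Under this scheme, a map with one value per key pays only for the inline field, and the buffer cost is incurred only by keys that actually collide.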
[GitHub] spark pull request: SPARK-2813: [SQL] Implement SQRT() directly in...
Github user willb commented on the pull request: https://github.com/apache/spark/pull/1750#issuecomment-51143752 @marmbrus I'll file a JIRA for that and am happy to put it at the front of my plate; sounds like a fun problem!
[GitHub] spark pull request: SPARK-2685. Update ExternalAppendOnlyMap to av...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1773#issuecomment-51143730 QA tests have started for PR 1773. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17916/consoleFull
[GitHub] spark pull request: SPARK-2685. Update ExternalAppendOnlyMap to av...
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/1773#issuecomment-51143669 @andrewor14 alright, I've pushed a new commit that updates the comments. I also made it reuse ArrayBuffers, which should avoid quite a bit of young gen GC.
[GitHub] spark pull request: [SPARK-1986][GraphX]move lib.Analytics to org....
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1766#issuecomment-51143483 QA tests have started for PR 1766. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17914/consoleFull
[GitHub] spark pull request: [SPARK-2627] [PySpark] have the build enforce ...
Github user nchammas commented on the pull request: https://github.com/apache/spark/pull/1744#issuecomment-51143491 I don't see how these test failures could be related to the changes introduced in this PR. I see that the issue @JoshRosen called out earlier has been [resolved](https://github.com/apache/spark/pull/1771), so that can't be it. More confusingly, the report ends with this:
```
[info] All tests passed.
[info] Passed: Total 797, Failed 0, Errors 0, Passed 797, Ignored 7
[error] (streaming-flume/test:test) sbt.TestsFailedException: Tests unsuccessful
[error] Total time: 3157 s, completed Aug 4, 2014 7:30:52 PM
```
@rxin - Any pointers?