[GitHub] spark pull request: [SPARK-9437][core] avoid overflow in SizeEstim...
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7750#issuecomment-126308741

Merged build finished. Test FAILed.

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
---

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9437][core] avoid overflow in SizeEstim...
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/7750#issuecomment-126308704

[Test build #162 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SlowSparkPullRequestBuilder/162/console) for PR 7750 at commit [`29493f1`](https://github.com/apache/spark/commit/29493f12720dd9f02e8f199046f98f7a548756ea).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-9149][ML][Examples] Add an example of s...
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/7697#discussion_r35861720

--- Diff: examples/src/main/java/org/apache/spark/examples/ml/JavaKMeansExample.java ---
@@ -0,0 +1,97 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.examples.ml;
+
+import java.util.regex.Pattern;
+
+import org.apache.spark.SparkConf;
+import org.apache.spark.api.java.JavaRDD;
+import org.apache.spark.api.java.JavaSparkContext;
+import org.apache.spark.api.java.function.Function;
+import org.apache.spark.ml.clustering.KMeansModel;
+import org.apache.spark.ml.clustering.KMeans;
+import org.apache.spark.mllib.linalg.Vector;
+import org.apache.spark.mllib.linalg.VectorUDT;
+import org.apache.spark.mllib.linalg.Vectors;
+import org.apache.spark.sql.DataFrame;
+import org.apache.spark.sql.Row;
+import org.apache.spark.sql.SQLContext;
+import org.apache.spark.sql.catalyst.expressions.GenericRow;
+import org.apache.spark.sql.types.Metadata;
+import org.apache.spark.sql.types.StructField;
+import org.apache.spark.sql.types.StructType;
+
+
+/**
+ * An example demonstrating a k-means clustering.
+ * Run with
+ * <pre>
+ * bin/run-example ml.JavaSimpleParamsExample <file> <k>
+ * </pre>
+ */
+public class JavaKMeansExample {
+
+  private static class ParsePoint implements Function<String, Row> {
+    final private static Pattern separator = Pattern.compile(" ");
--- End diff --

This is picking nits, and something we can fix on merge, but the normal order of modifiers is `private static final ...`
[GitHub] spark pull request: [SPARK-9104][CORE][WIP] expose Netty network l...
Github user squito commented on the pull request:

https://github.com/apache/spark/pull/7753#issuecomment-126283119

@jerryshao I'm not entirely sure I know what you mean by:

> A simple question, is it enough to only expose the maximum memory usage of Netty layer?

Can you elaborate? Obviously we'd always like more metrics, but are you saying this isn't useful?
[GitHub] spark pull request: [SPARK-9149][ML][Examples] Add an example of s...
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/7697#issuecomment-126282672

I think this is pretty fine, minus one thing I can fix on merge. Any more comments?
[GitHub] spark pull request: [SPARK-8979] Add a PID based rate estimator
Github user dragos commented on a diff in the pull request:

https://github.com/apache/spark/pull/7648#discussion_r35862416

--- Diff: streaming/src/main/scala/org/apache/spark/streaming/scheduler/rate/PIDRateEstimator.scala ---
@@ -0,0 +1,100 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.streaming.scheduler.rate
+
+/**
+ * Implements a proportional-integral-derivative (PID) controller which acts on
+ * the speed of ingestion of elements into Spark Streaming. A PID controller works
+ * by calculating an '''error''' between a measured output and a desired value. In the
+ * case of Spark Streaming the error is the difference between the measured processing
+ * rate (number of elements/processing delay) and the previous rate.
+ *
+ * @see https://en.wikipedia.org/wiki/PID_controller
+ *
+ * @param batchDurationMillis the batch duration, in milliseconds
+ * @param proportional how much the correction should depend on the current
+ *        error. This term usually provides the bulk of correction. A value too large would
+ *        make the controller overshoot the setpoint, while a small value would make the
+ *        controller too insensitive. The default value is -1.
+ * @param integral how much the correction should depend on the accumulation
+ *        of past errors. This term accelerates the movement towards the setpoint, but a large
+ *        value may lead to overshooting. The default value is -0.2.
+ * @param derivative how much the correction should depend on a prediction
+ *        of future errors, based on current rate of change. This term is not used very often,
+ *        as it impacts stability of the system. The default value is 0.
+ */
+private[streaming] class PIDRateEstimator(
+    batchIntervalMillis: Long,
+    proportional: Double = -1D,
+    integral: Double = -.2D,
+    derivative: Double = 0D)
+  extends RateEstimator {
+
+  private var firstRun: Boolean = true
+  private var latestTime: Long = -1L
+  private var latestRate: Double = -1D
+  private var latestError: Double = -1L
+
+  require(
+    batchIntervalMillis > 0,
+    s"Specified batch interval $batchIntervalMillis in PIDRateEstimator is invalid.")
+
+  def compute(time: Long, // in milliseconds
+      elements: Long,
+      processingDelay: Long, // in milliseconds
+      schedulingDelay: Long // in milliseconds
+    ): Option[Double] = {
+
+    this.synchronized {
+      if (time > latestTime && processingDelay > 0 && batchIntervalMillis > 0) {
+
+        // in seconds, should be close to batchDuration
+        val delaySinceUpdate = (time - latestTime).toDouble / 1000
+
+        // in elements/second
+        val processingRate = elements.toDouble / processingDelay * 1000
+
+        // in elements/second
+        val error = latestRate - processingRate
--- End diff --

Here I'd prefer to keep this as `error`, as I think most people reading this code would have more trouble mapping things to PID terminology than to Spark Streaming terminology, and all PID docs will mention error and correction.
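The Scaladoc quoted in this review describes three gain-weighted terms (proportional, integral, derivative) acting on an error between the previously requested rate and the measured processing rate. A minimal Python sketch of one such update step follows. It is not the code under review: the class name `PIDRateSketch`, the first-run handling, and in particular the way the three terms are combined (added to the previous rate, which is why the quoted defaults are negative) are assumptions made for illustration, since the diff excerpt stops before that part of `compute`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PIDRateSketch:
    """Hypothetical PID-style rate estimator, for illustration only."""
    batch_interval_ms: int
    proportional: float = -1.0   # default quoted in the Scaladoc
    integral: float = -0.2       # default quoted in the Scaladoc
    derivative: float = 0.0      # default quoted in the Scaladoc
    first_run: bool = True
    latest_time: int = -1
    latest_rate: float = -1.0
    latest_error: float = -1.0

    def compute(self, time_ms: int, elements: int,
                processing_delay_ms: int,
                scheduling_delay_ms: int) -> Optional[float]:
        # Ignore stale updates and batches without a usable processing delay.
        if time_ms <= self.latest_time or processing_delay_ms <= 0:
            return None

        # Measured rate, in elements/second.
        processing_rate = elements / processing_delay_ms * 1000.0

        # ASSUMPTION: the first update only seeds the state and yields no estimate.
        if self.first_run:
            self.first_run = False
            self.latest_time = time_ms
            self.latest_rate = processing_rate
            self.latest_error = 0.0
            return None

        # Seconds since the previous update; positive because of the guard above.
        delay_since_update = (time_ms - self.latest_time) / 1000.0

        # Same expressions as in the quoted diff.
        error = self.latest_rate - processing_rate
        sum_error = scheduling_delay_ms * processing_rate / self.batch_interval_ms

        # ASSUMPTION: derivative term taken as the change in error per second.
        d_error = (error - self.latest_error) / delay_since_update

        # ASSUMPTION: gain-weighted terms are added to the previous rate.
        new_rate = (self.latest_rate
                    + self.proportional * error
                    + self.integral * sum_error
                    + self.derivative * d_error)

        self.latest_time = time_ms
        self.latest_rate = new_rate
        self.latest_error = error
        return new_rate
```

Under these assumptions and the default gains, a previous rate of 2000 elements/second followed by a measured rate of 1000 elements/second with no scheduling delay gives 2000 + (-1)(1000) = 1000 elements/second as the new estimate.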
[GitHub] spark pull request: [SPARK-9308] [ML] ml.NaiveBayesModel support p...
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7672#issuecomment-126302698

Merged build triggered.
[GitHub] spark pull request: [SPARK-8564][Streaming]Add the Python API for ...
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/6955#issuecomment-126310637

[Test build #39038 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/39038/console) for PR 6955 at commit [`455f7ea`](https://github.com/apache/spark/commit/455f7ea47cd6bca3047b8023bab8ff0ed944c13e).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * `public static final class FloatPrefixComparator extends PrefixComparator`
   * `class KinesisUtils(object):`
   * `class InitialPositionInStream(object):`
   * `case class UnsafeExternalSort(`
[GitHub] spark pull request: [SPARK-8978][Streaming] Implements the DirectK...
GitHub user dragos opened a pull request:

    https://github.com/apache/spark/pull/7796

    [SPARK-8978][Streaming] Implements the DirectKafkaController

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/typesafehub/spark topic/streaming-bp/kafka-direct

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/7796.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #7796

commit f788b9b0dde3981982a118ec3d3bed42b89843f0
Author: François Garillot <franc...@garillot.net>
Date: 2015-07-14T14:53:03Z

    [SPARK-8978][Streaming] Implements the DirectKafkaController
[GitHub] spark pull request: [SPARK-8979] Add a PID based rate estimator
Github user dragos commented on a diff in the pull request:

https://github.com/apache/spark/pull/7648#discussion_r35862291

--- Diff: streaming/src/main/scala/org/apache/spark/streaming/scheduler/rate/PIDRateEstimator.scala ---
@@ -0,0 +1,100 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.streaming.scheduler.rate
+
+/**
+ * Implements a proportional-integral-derivative (PID) controller which acts on
+ * the speed of ingestion of elements into Spark Streaming. A PID controller works
+ * by calculating an '''error''' between a measured output and a desired value. In the
+ * case of Spark Streaming the error is the difference between the measured processing
+ * rate (number of elements/processing delay) and the previous rate.
+ *
+ * @see https://en.wikipedia.org/wiki/PID_controller
+ *
+ * @param batchDurationMillis the batch duration, in milliseconds
+ * @param proportional how much the correction should depend on the current
+ *        error. This term usually provides the bulk of correction. A value too large would
+ *        make the controller overshoot the setpoint, while a small value would make the
+ *        controller too insensitive. The default value is -1.
+ * @param integral how much the correction should depend on the accumulation
+ *        of past errors. This term accelerates the movement towards the setpoint, but a large
+ *        value may lead to overshooting. The default value is -0.2.
+ * @param derivative how much the correction should depend on a prediction
+ *        of future errors, based on current rate of change. This term is not used very often,
+ *        as it impacts stability of the system. The default value is 0.
+ */
+private[streaming] class PIDRateEstimator(
+    batchIntervalMillis: Long,
+    proportional: Double = -1D,
+    integral: Double = -.2D,
+    derivative: Double = 0D)
+  extends RateEstimator {
+
+  private var firstRun: Boolean = true
+  private var latestTime: Long = -1L
+  private var latestRate: Double = -1D
+  private var latestError: Double = -1L
+
+  require(
+    batchIntervalMillis > 0,
+    s"Specified batch interval $batchIntervalMillis in PIDRateEstimator is invalid.")
+
+  def compute(time: Long, // in milliseconds
+      elements: Long,
+      processingDelay: Long, // in milliseconds
+      schedulingDelay: Long // in milliseconds
+    ): Option[Double] = {
+
+    this.synchronized {
+      if (time > latestTime && processingDelay > 0 && batchIntervalMillis > 0) {
+
+        // in seconds, should be close to batchDuration
+        val delaySinceUpdate = (time - latestTime).toDouble / 1000
+
+        // in elements/second
+        val processingRate = elements.toDouble / processingDelay * 1000
+
+        // in elements/second
+        val error = latestRate - processingRate
+
+        // in elements/second
+        val sumError = schedulingDelay.toDouble * processingRate / batchIntervalMillis
--- End diff --

Carrying over conversation from previous thread that got lost due to rebase

It's hard to understand what sumError means in terms of the rates and all. Can you write down the physical interpretation of this sumError? And also make the name better accordingly? cc @huitseeker

@tdas tdas added a note 14 hours ago

So I am trying to understand this. (scheduling delay / batch interval) = approx the number of batches the system is delayed. Let's call it numDelayedBatches. Now you are multiplying numDelayedBatches X processingSpeed. So you are scaling the current processing rate with the number of batches that are delayed. Right?
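The reading of `sumError` proposed in this thread can be checked with a small worked example. The sketch below is illustrative only: the function name `sum_error` and the sample numbers are made up, and `num_delayed_batches` is the name suggested in the comment, not an identifier from the patch.

```python
def sum_error(scheduling_delay_ms: float,
              processing_rate: float,
              batch_interval_ms: float) -> float:
    """Scale the processing rate by the approximate number of delayed batches."""
    # Roughly how many batch intervals the system is running behind.
    num_delayed_batches = scheduling_delay_ms / batch_interval_ms
    # Dimensionless factor times elements/second stays in elements/second.
    return num_delayed_batches * processing_rate


# Hypothetical numbers: 3 s of scheduling delay, 1 s batches, 500 elements/s.
backlog_term = sum_error(3000, 500.0, 1000)

# Same value as the expression order used in the quoted diff.
same_as_diff = 3000 * 500.0 / 1000
```

With these numbers the system is about three batches behind, so the term is three times the current processing rate.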
[GitHub] spark pull request: [SPARK-9202] capping maximum number of executo...
Github user CodingCat commented on the pull request:

https://github.com/apache/spark/pull/7714#issuecomment-126315274

finally @srowen, @JoshRosen, @sarutak more comments?
[GitHub] spark pull request: [SPARK-8979] Add a PID based rate estimator
Github user dragos commented on a diff in the pull request:

https://github.com/apache/spark/pull/7648#discussion_r35862261

--- Diff: streaming/src/main/scala/org/apache/spark/streaming/scheduler/rate/PIDRateEstimator.scala ---
@@ -0,0 +1,100 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.streaming.scheduler.rate
+
+/**
+ * Implements a proportional-integral-derivative (PID) controller which acts on
+ * the speed of ingestion of elements into Spark Streaming. A PID controller works
+ * by calculating an '''error''' between a measured output and a desired value. In the
+ * case of Spark Streaming the error is the difference between the measured processing
+ * rate (number of elements/processing delay) and the previous rate.
+ *
+ * @see https://en.wikipedia.org/wiki/PID_controller
+ *
+ * @param batchDurationMillis the batch duration, in milliseconds
+ * @param proportional how much the correction should depend on the current
+ *        error. This term usually provides the bulk of correction. A value too large would
+ *        make the controller overshoot the setpoint, while a small value would make the
+ *        controller too insensitive. The default value is -1.
+ * @param integral how much the correction should depend on the accumulation
+ *        of past errors. This term accelerates the movement towards the setpoint, but a large
+ *        value may lead to overshooting. The default value is -0.2.
+ * @param derivative how much the correction should depend on a prediction
+ *        of future errors, based on current rate of change. This term is not used very often,
+ *        as it impacts stability of the system. The default value is 0.
+ */
+private[streaming] class PIDRateEstimator(
+    batchIntervalMillis: Long,
+    proportional: Double = -1D,
+    integral: Double = -.2D,
+    derivative: Double = 0D)
+  extends RateEstimator {
+
+  private var firstRun: Boolean = true
+  private var latestTime: Long = -1L
+  private var latestRate: Double = -1D
+  private var latestError: Double = -1L
+
+  require(
+    batchIntervalMillis > 0,
+    s"Specified batch interval $batchIntervalMillis in PIDRateEstimator is invalid.")
+
+  def compute(time: Long, // in milliseconds
+      elements: Long,
+      processingDelay: Long, // in milliseconds
+      schedulingDelay: Long // in milliseconds
+    ): Option[Double] = {
+
+    this.synchronized {
+      if (time > latestTime && processingDelay > 0 && batchIntervalMillis > 0) {
+
+        // in seconds, should be close to batchDuration
+        val delaySinceUpdate = (time - latestTime).toDouble / 1000
+
+        // in elements/second
+        val processingRate = elements.toDouble / processingDelay * 1000
+
+        // in elements/second
+        val error = latestRate - processingRate
--- End diff --

Carrying over conversation from previous thread that got lost due to rebase

Could you make the names more semantically meaningful? How about: error -- changeInRate?

@tdas tdas added a note 14 hours ago

Why is the latestRate considered as the set point (that's my assumption since the error is calculated between the observed value and the set point, according to PID theory)? @huitseeker

@dragos dragos added a note 2 hours ago

Since @huitseeker seems to be away, I'll answer this. The latestRate is what we considered the desired value at the previous batch update. With the new information we got for the last batch interval, we compute a current rate and compare it to what we asked for; that constitutes our error that needs correction.
[GitHub] spark pull request: [SPARK-8979] Add a PID based rate estimator
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7648#issuecomment-126284928

Merged build triggered.
[GitHub] spark pull request: [WIP] [SPARK-6885] [ML] decision tree support ...
Github user yanboliang commented on the pull request:

https://github.com/apache/spark/pull/7694#issuecomment-126292059

Jenkins, test this please.
[GitHub] spark pull request: [SPARK-9248][SparkR] Closing curly-braces shou...
GitHub user yu-iskw opened a pull request:

    https://github.com/apache/spark/pull/7795

    [SPARK-9248][SparkR] Closing curly-braces should always be on their own line

    ### JIRA
    [[SPARK-9248] Closing curly-braces should always be on their own line - ASF JIRA](https://issues.apache.org/jira/browse/SPARK-9248)

    ## The result of `dev/lint-r`
    [The result of `dev/lint-r` for SPARK-9248 at the revision:6175d6cfe795fbd88e3ee713fac375038a3993a8](https://gist.github.com/yu-iskw/96cadcea4ce664c41f81)

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/yu-iskw/spark SPARK-9248

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/7795.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #7795

commit c8eccd3ce0c11ee1b8df36b666017c7bbfbf811f
Author: Yuu ISHIKAWA <yuu.ishik...@gmail.com>
Date: 2015-07-30T11:32:01Z

    [SPARK-9248][SparkR] Closing curly-braces should always be on their own line
[GitHub] spark pull request: [SPARK-6485] [MLlib] [Python] Add CoordinateMa...
Github user dusenberrymw commented on a diff in the pull request:

https://github.com/apache/spark/pull/7554#discussion_r35873295

--- Diff: python/pyspark/mllib/linalg.py ---
@@ -1152,9 +1156,416 @@ def sparse(numRows, numCols, colPtrs, rowIndices, values):
         return SparseMatrix(numRows, numCols, colPtrs, rowIndices, values)
 
+class DistributedMatrix(object):
+    """
+    Represents a distributively stored matrix backed by one or
+    more RDDs.
+
+    """
+    def numRows(self):
+        """Get or compute the number of rows."""
+        raise NotImplementedError
+
+    def numCols(self):
+        """Get or compute the number of cols."""
+        raise NotImplementedError
+
+
+class RowMatrix(DistributedMatrix):
+    """
+    .. note:: Experimental
+
+    Represents a row-oriented distributed Matrix with no meaningful
+    row indices.
+
+    :param rows: An RDD of vectors.
+    :param numRows: Number of rows in the matrix. A non-positive
+                    value means unknown, at which point the number
+                    of rows will be determined by the number of
+                    records in the `rows` RDD.
+    :param numCols: Number of columns in the matrix. A non-positive
+                    value means unknown, at which point the number
+                    of columns will be determined by the size of
+                    the first row.
+    """
+    def __init__(self, rows, numRows=0, numCols=0):
+        """Create a wrapper over a Java RowMatrix."""
+        if not isinstance(rows, RDD):
+            raise TypeError("rows should be an RDD of vectors, got %s" % type(rows))
--- End diff --

Yeah the argument doesn't have to be an RDD of actual `Vector` objects, but it should still be an RDD of _vectors_, which could be NumPy arrays, Python lists, `Vector`s, etc. for PySpark. The Spark MLlib Data Types guide makes this distinction for the end-user, so I think it is helpful to use it in the error message as well.
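The point made in this comment, that only the container type is checked while the error message still speaks of "vectors" in the loose sense, can be shown with a small stand-alone sketch. `StandInRDD` and `check_rows` are hypothetical names introduced so the example runs without PySpark; they are not part of the patch, which checks against the real `RDD` class.

```python
class StandInRDD(object):
    """Hypothetical stand-in for pyspark.rdd.RDD so the sketch runs without Spark."""

    def __init__(self, records):
        self.records = list(records)


def check_rows(rows, rdd_type=StandInRDD):
    """Reject anything that is not an RDD; element types are deliberately not inspected."""
    if not isinstance(rows, rdd_type):
        raise TypeError("rows should be an RDD of vectors, got %s" % type(rows))
    return rows


# Accepted: the records are plain Python lists, i.e. vectors in the loose sense.
ok = check_rows(StandInRDD([[1.0, 2.0], [3.0, 4.0]]))

# Rejected: a bare list is not an RDD, whatever it contains.
try:
    check_rows([[1.0, 2.0], [3.0, 4.0]])
    message = None
except TypeError as exc:
    message = str(exc)
```

The check passes for any RDD-like container, so the wording "RDD of vectors" in the message documents the expectation for the elements rather than enforcing it.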
[GitHub] spark pull request: [SPARK-8625] [Core] Propagate user exceptions ...
Github user squito commented on the pull request:

https://github.com/apache/spark/pull/7014#issuecomment-126283659

@aarondav are you OK with this now? I think tom addressed all your concerns
[GitHub] spark pull request: [WIP] [SPARK-6885] [ML] decision tree support ...
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7694#issuecomment-126293359

Build triggered.
[GitHub] spark pull request: [SPARK-8564][Streaming]Add the Python API for ...
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6955#issuecomment-126310815

Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-9214] [ML] [PySpark] support ml.NaiveBa...
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7568#issuecomment-126311047

Merged build triggered.
[GitHub] spark pull request: [SPARK-9202] capping maximum number of executo...
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/7714#issuecomment-126314864

[Test build #39041 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/39041/console) for PR 7714 at commit [`23977fb`](https://github.com/apache/spark/commit/23977fb3bc590f58e9d4d44cfcce78ce0a49baca).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-9202] capping maximum number of executo...
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7714#issuecomment-126314974

Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-8979] Add a PID based rate estimator
Github user dragos commented on a diff in the pull request: https://github.com/apache/spark/pull/7648#discussion_r35869487
--- Diff: streaming/src/main/scala/org/apache/spark/streaming/scheduler/rate/PIDRateEstimator.scala ---
@@ -0,0 +1,100 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.streaming.scheduler.rate
+
+/**
+ * Implements a proportional-integral-derivative (PID) controller which acts on
+ * the speed of ingestion of elements into Spark Streaming. A PID controller works
+ * by calculating an '''error''' between a measured output and a desired value. In the
+ * case of Spark Streaming the error is the difference between the measured processing
+ * rate (number of elements/processing delay) and the previous rate.
+ *
+ * @see https://en.wikipedia.org/wiki/PID_controller
+ *
+ * @param batchDurationMillis the batch duration, in milliseconds
+ * @param proportional how much the correction should depend on the current
+ *        error. This term usually provides the bulk of correction. A value too large would
+ *        make the controller overshoot the setpoint, while a small value would make the
+ *        controller too insensitive. The default value is -1.
+ * @param integral how much the correction should depend on the accumulation
+ *        of past errors. This term accelerates the movement towards the setpoint, but a large
+ *        value may lead to overshooting. The default value is -0.2.
+ * @param derivative how much the correction should depend on a prediction
+ *        of future errors, based on current rate of change. This term is not used very often,
+ *        as it impacts stability of the system. The default value is 0.
+ */
+private[streaming] class PIDRateEstimator(
+    batchIntervalMillis: Long,
+    proportional: Double = -1D,
+    integral: Double = -.2D,
+    derivative: Double = 0D)
+  extends RateEstimator {
+
+  private var firstRun: Boolean = true
+  private var latestTime: Long = -1L
+  private var latestRate: Double = -1D
+  private var latestError: Double = -1L
+
+  require(
+    batchIntervalMillis > 0,
+    s"Specified batch interval $batchIntervalMillis in PIDRateEstimator is invalid.")
+
+  def compute(time: Long, // in milliseconds
+      elements: Long,
+      processingDelay: Long, // in milliseconds
+      schedulingDelay: Long // in milliseconds
+    ): Option[Double] = {
+
+    this.synchronized {
+      if (time > latestTime && processingDelay > 0 && batchIntervalMillis > 0) {
+
+        // in seconds, should be close to batchDuration
+        val delaySinceUpdate = (time - latestTime).toDouble / 1000
+
+        // in elements/second
+        val processingRate = elements.toDouble / processingDelay * 1000
+
+        // in elements/second
+        val error = latestRate - processingRate
+
+        // in elements/second
+        val sumError = schedulingDelay.toDouble * processingRate / batchIntervalMillis
--- End diff --

Here's the gist of it:

- we consider `schedulingDelay` as an indication of accumulated error, which corresponds to the integral part in a PID controller.
Intuitively it makes sense: the fact that there is a delay means we had too many elements in previous batches, and the system can't process them in the given batch interval.

The challenge is to transform this indication from *time* to a rate, which is the quantity that our PID is measuring (and controlling). Here's the reasoning:

- a scheduling delay `s` corresponds to `s * processingRate` *overflowing* elements. Those are elements that couldn't be processed in previous batches, leading to this delay. This assumes the `processingRate` didn't change too much (since it's mostly a measure of the cluster performance, with small variations like checkpointing), which is only an approximation, but a good one
- from the number of overflowing elements we can calculate the
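
The time-to-rate conversion discussed in this comment can be tried out in isolation. The following is only a minimal sketch based on the two expressions visible in the quoted diff (`processingRate` and `sumError`); the object name `SchedulingDelayToRate` and the method names are made up for illustration and are not part of the PR.

```scala
// Standalone sketch; not the PR's class. Names here are hypothetical.
object SchedulingDelayToRate {

  /** Processing rate in elements/second, from a batch size and its processing delay in ms. */
  def processingRate(elements: Long, processingDelayMs: Long): Double =
    elements.toDouble / processingDelayMs * 1000

  /**
   * Accumulated ("integral") error in elements/second.
   * A scheduling delay of s milliseconds corresponds to roughly
   * (s / 1000) * rate overflowing elements; spreading them over one batch
   * interval of (batchIntervalMs / 1000) seconds gives a rate again,
   * and the two factors of 1000 cancel.
   */
  def historicalError(schedulingDelayMs: Long, rate: Double, batchIntervalMs: Long): Double =
    schedulingDelayMs.toDouble * rate / batchIntervalMs

  def main(args: Array[String]): Unit = {
    // 1000 elements processed in 500 ms
    val rate = processingRate(elements = 1000L, processingDelayMs = 500L)
    // 200 ms of scheduling delay with a 1000 ms batch interval
    val err = historicalError(schedulingDelayMs = 200L, rate = rate, batchIntervalMs = 1000L)
    println(s"processingRate=$rate elements/s, historicalError=$err elements/s")
  }
}
```

With no scheduling delay the accumulated error is zero, so under this reading the integral term only contributes once batches start queuing up.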
[GitHub] spark pull request: [SPARK-8979] Add a PID based rate estimator
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7648#issuecomment-126336586 Merged build triggered.
[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/3543#issuecomment-126342142 Added subtasks, changed the title of https://github.com/apache/spark/pull/7772 to refer to the streaming subtask JIRA ID. Let me know if you see anything there that needs tweaking before the 1.5 freeze date.
[GitHub] spark pull request: [WIP] [SPARK-6885] [ML] decision tree support ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7694#issuecomment-126293796 Build triggered.
[GitHub] spark pull request: [SPARK-9248][SparkR] Closing curly-braces shou...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7795#issuecomment-126293792 Merged build triggered.
[GitHub] spark pull request: [SPARK-8979] Add a PID based rate estimator
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7648#issuecomment-126320237 Merged build triggered.
[GitHub] spark pull request: [SPARK-9104][CORE][WIP] expose Netty network l...
Github user squito commented on the pull request: https://github.com/apache/spark/pull/7753#issuecomment-126282507 Since we'll eventually want to add more metrics, can you put all the netty metrics into another case class inside `ExecutorMetrics`? Also, I'm wondering if we want to use netty in the name -- I think most users won't know or care about netty in particular. It should just be named network or transport, and the nio implementation should indicate that metrics are missing. I guess altogether this means doing something like:

```scala
class ExecutorMetrics {
  var transportMetrics: Option[TransportMetrics] = ...
}
```
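
To make the suggestion above concrete, here is a rough sketch of how an optional `TransportMetrics` case class might look and how a consumer could handle a transport that reports nothing. The field names `onHeapBytes` and `offHeapBytes` and the helper object are assumptions for illustration only; the thread does not show which metrics the PR actually collects.

```scala
// Hypothetical fields; the real metrics in the PR are not shown in this thread.
case class TransportMetrics(onHeapBytes: Long, offHeapBytes: Long)

class ExecutorMetrics {
  // None means the transport implementation (for example nio) reports no metrics.
  var transportMetrics: Option[TransportMetrics] = None
}

object TransportMetricsExample {
  /** Renders the metrics, making the "missing" case explicit. */
  def describe(metrics: ExecutorMetrics): String = metrics.transportMetrics match {
    case Some(t) => s"on-heap=${t.onHeapBytes} off-heap=${t.offHeapBytes}"
    case None => "transport metrics unavailable"
  }

  def main(args: Array[String]): Unit = {
    val metrics = new ExecutorMetrics
    println(describe(metrics))
    metrics.transportMetrics = Some(TransportMetrics(onHeapBytes = 10L, offHeapBytes = 20L))
    println(describe(metrics))
  }
}
```

Using `Option` keeps "no metrics available" distinct from "metrics are zero", which seems to be the point of asking the nio implementation to indicate that metrics are missing.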
[GitHub] spark pull request: [SPARK-5155] [PySpark] [Streaming] Mqtt stream...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4229#issuecomment-126307047
[Test build #39036 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/39036/console) for PR 4229 at commit [`126608a`](https://github.com/apache/spark/commit/126608a02b55287684762811b0ade99dbce7d109).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class MQTTUtils(object):`
[GitHub] spark pull request: [SPARK-5155] [PySpark] [Streaming] Mqtt stream...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4229#issuecomment-126307123 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-9437][core] avoid overflow in SizeEstim...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/7750#issuecomment-126315107
[Test build #39039 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/39039/console) for PR 7750 at commit [`29493f1`](https://github.com/apache/spark/commit/29493f12720dd9f02e8f199046f98f7a548756ea).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-9437][core] avoid overflow in SizeEstim...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7750#issuecomment-126315194 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-8978][Streaming] Implements the DirectK...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7796#issuecomment-126334650 Merged build triggered.
[GitHub] spark pull request: [SPARK-8978][Streaming] Implements the DirectK...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7796#issuecomment-126363832 Merged build started.
[GitHub] spark pull request: [SPARK-9248][SparkR] Closing curly-braces shou...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7795#issuecomment-126363833 Merged build started.
[GitHub] spark pull request: [SPARK-8862][SPARK-8862][SQL][WIP] Add basic i...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7774#issuecomment-126363834 Merged build started.
[GitHub] spark pull request: [SPARK-9104][CORE][WIP] expose Netty network l...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7753#issuecomment-126363839 Build started.
[GitHub] spark pull request: [SPARK-8998][MLlib] Distribute PrefixSpan comp...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/7783#discussion_r35881077
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/fpm/PrefixSpan.scala ---
@@ -78,81 +97,153 @@ class PrefixSpan private (
    * the value of pair is the pattern's count.
    */
   def run(sequences: RDD[Array[Int]]): RDD[(Array[Int], Long)] = {
+    val sc = sequences.sparkContext
+
     if (sequences.getStorageLevel == StorageLevel.NONE) {
       logWarning("Input data is not cached.")
     }
-    val minCount = getMinCount(sequences)
-    val lengthOnePatternsAndCounts =
-      getFreqItemAndCounts(minCount, sequences).collect()
-    val prefixAndProjectedDatabase = getPrefixAndProjectedDatabase(
-      lengthOnePatternsAndCounts.map(_._1), sequences)
-    val groupedProjectedDatabase = prefixAndProjectedDatabase
-      .map(x => (x._1.toSeq, x._2))
-      .groupByKey()
-      .map(x => (x._1.toArray, x._2.toArray))
-    val nextPatterns = getPatternsInLocal(minCount, groupedProjectedDatabase)
-    val lengthOnePatternsAndCountsRdd =
-      sequences.sparkContext.parallelize(
-        lengthOnePatternsAndCounts.map(x => (Array(x._1), x._2)))
-    val allPatterns = lengthOnePatternsAndCountsRdd ++ nextPatterns
-    allPatterns
+
+    // Convert min support to a min number of transactions for this dataset
+    val minCount = if (minSupport == 0) 0L else math.ceil(sequences.count() * minSupport).toLong
+
+    // (Frequent items -> number of occurrences, all items here satisfy the `minSupport` threshold
+    val freqItemCounts = sequences
+      .flatMap(seq => seq.distinct.map(item => (item, 1L)))
+      .reduceByKey(_ + _)
+      .filter(_._2 >= minCount)
+      .collect()
+
+    // Pairs of (length 1 prefix, suffix consisting of frequent items)
+    val itemSuffixPairs = {
+      val freqItems = freqItemCounts.map(_._1).toSet
+      sequences.flatMap { seq =>
+        val filteredSeq = seq.filter(freqItems.contains(_))
+        freqItems.flatMap { item =>
+          val candidateSuffix = LocalPrefixSpan.getSuffix(item, filteredSeq)
+          candidateSuffix match {
+            case suffix if !suffix.isEmpty => Some((List(item), suffix))
+            case _ => None
+          }
+        }
+      }
+    }
+
+    // Accumulator for the computed results to be returned, initialized to the frequent items (i.e.
+    // frequent length-one prefixes)
+    var resultsAccumulator = freqItemCounts.map(x => (List(x._1), x._2))
+
+    // Remaining work to be locally and distributively processed respectfully
+    var (pairsForLocal, pairsForDistributed) = partitionByProjDBSize(itemSuffixPairs)
+
+    // Continue processing until no pairs for distributed processing remain (i.e. all prefixes have
+    // projected database sizes <= `maxLocalProjDBSize`)
+    while (pairsForDistributed.count() != 0) {
+      val (nextPatternAndCounts, nextPrefixSuffixPairs) =
+        extendPrefixes(minCount, pairsForDistributed)
+      pairsForDistributed.unpersist()
+      val (smallerPairsPart, largerPairsPart) = partitionByProjDBSize(nextPrefixSuffixPairs)
+      pairsForDistributed = largerPairsPart
+      pairsForDistributed.persist(StorageLevel.MEMORY_AND_DISK)
+      pairsForLocal ++= smallerPairsPart
+      resultsAccumulator ++= nextPatternAndCounts.collect()
--- End diff --

That is the worst case. We should assume that the number of frequent patterns is small. Having 1 billion frequent patterns doesn't provide any useful insights. So users should start with a high `minSupport` and collect just enough frequent patterns.
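
The effect of `minSupport` on how many items survive the first pass can be seen on a tiny in-memory dataset. This is only a local sketch that mirrors the `minCount` and `freqItemCounts` expressions from the quoted diff using plain Scala collections instead of RDDs; `MinCountExample` and its method names are hypothetical.

```scala
// Local-collections sketch of the first pass in the diff; not Spark code.
object MinCountExample {

  /** Converts a relative minimum support into an absolute sequence count. */
  def minCount(numSequences: Long, minSupport: Double): Long =
    if (minSupport == 0) 0L else math.ceil(numSequences * minSupport).toLong

  /** Counts, per item, the number of sequences containing it, keeping only frequent items. */
  def freqItemCounts(sequences: Seq[Array[Int]], minSupport: Double): Map[Int, Long] = {
    val threshold = minCount(sequences.size.toLong, minSupport)
    sequences
      .flatMap(seq => seq.distinct.toSeq.map(item => (item, 1L))) // one vote per sequence
      .groupBy(_._1)                                              // local stand-in for reduceByKey
      .map { case (item, votes) => (item, votes.map(_._2).sum) }
      .filter(_._2 >= threshold)
  }

  def main(args: Array[String]): Unit = {
    val sequences = Seq(Array(1, 2, 3), Array(1, 2), Array(1, 1, 4))
    println(freqItemCounts(sequences, minSupport = 0.5))
    println(freqItemCounts(sequences, minSupport = 1.0))
  }
}
```

In this toy data a support of 0.5 keeps items 1 and 2, while 1.0 keeps only item 1, which illustrates why starting with a high `minSupport` keeps the collected result small.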
[GitHub] spark pull request: [SPARK-8979] Add a PID based rate estimator
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/7648#issuecomment-126366590 [Test build #39047 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/39047/consoleFull) for PR 7648 at commit [`26cfd78`](https://github.com/apache/spark/commit/26cfd78c339e58e71c138e424952002f13595389).
[GitHub] spark pull request: [SPARK-9214] [ML] [PySpark] support ml.NaiveBa...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7568#issuecomment-126367139 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-9277] [MLLIB] SparseVector constructor ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/7794#issuecomment-126367052
[Test build #39046 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/39046/console) for PR 7794 at commit [`6ffe34a`](https://github.com/apache/spark/commit/6ffe34a560829ac0e1f85b92f958ab394b1dda7a).
* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-9277] [MLLIB] SparseVector constructor ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7794#issuecomment-126367061 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-8862][SPARK-8862][SQL][WIP] Add basic i...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/7774#issuecomment-126365441 [Test build #39043 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/39043/consoleFull) for PR 7774 at commit [`23abf73`](https://github.com/apache/spark/commit/23abf73cafac3af0363486bdae91d737e235a197).
[GitHub] spark pull request: [SPARK-9104][CORE][WIP] expose Netty network l...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/7753#issuecomment-126366504 [Test build #39044 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/39044/consoleFull) for PR 7753 at commit [`17e5b97`](https://github.com/apache/spark/commit/17e5b978618a5a6adfa3ff621e37eeecaa0b2b0c).
[GitHub] spark pull request: [SPARK-9308] [ML] ml.NaiveBayesModel support p...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/7672#issuecomment-126366480 [Test build #39050 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/39050/consoleFull) for PR 7672 at commit [`3ee56d6`](https://github.com/apache/spark/commit/3ee56d68cc0404f8700641da2cf34c9a79fe2ba4).
[GitHub] spark pull request: [SPARK-9308] [ML] ml.NaiveBayesModel support p...
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/7672#discussion_r35880303
--- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/NaiveBayes.scala ---
@@ -129,29 +129,49 @@ class NaiveBayesModel private[ml] (
       throw new UnknownError(s"Invalid modelType: ${$(modelType)}.")
   }

-  override protected def predict(features: Vector): Double = {
+  override val numClasses: Int = pi.size
+
+  private def posteriorProbabilities(logProb: DenseVector) = {
--- End diff --

Yes, posteriorProbabilities is easy to reuse, but it is not easy to directly reuse multinomialCalculation and bernoulliCalculation, because mllib.NaiveBayesModel and ml.NaiveBayesModel have different model parameters.

```scala
class NaiveBayesModel private[ml] (
    override val uid: String,
    val pi: Vector,
    val theta: Matrix)
```
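
The body of `posteriorProbabilities` is not visible in the quoted diff. For readers unfamiliar with the step, one common way to turn per-class log-probabilities into normalized posteriors is to shift by the maximum before exponentiating. The sketch below shows that general technique on a plain array; it is an assumption about the approach, not the PR's code, and `PosteriorExample` is a made-up name.

```scala
// Generic max-shifted normalization; NOT taken from the PR.
object PosteriorExample {

  /** Converts unnormalized log-probabilities into probabilities that sum to 1. */
  def posteriorProbabilities(logProb: Array[Double]): Array[Double] = {
    val maxLog = logProb.max
    // Subtracting the max keeps exp() from underflowing to 0 for very negative inputs.
    val scaled = logProb.map(lp => math.exp(lp - maxLog))
    val total = scaled.sum
    scaled.map(_ / total)
  }

  def main(args: Array[String]): Unit = {
    val posteriors = posteriorProbabilities(Array(math.log(1.0), math.log(3.0)))
    println(posteriors.mkString(", "))
  }
}
```

Because this step depends only on the vector of log-probabilities and not on `pi` or `theta`, it is the kind of helper that can be shared regardless of how the two model classes store their parameters.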
[GitHub] spark pull request: [SPARK-9408] [PySpark] [MLlib] Refactor linalg...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7746#issuecomment-126371744 Merged build triggered.
[GitHub] spark pull request: [SPARK-9408] [PySpark] [MLlib] Refactor linalg...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7746#issuecomment-126371787 Merged build started.
[GitHub] spark pull request: [SPARK-9277] [MLLIB] SparseVector constructor ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7794#issuecomment-126372904 Merged build started.
[GitHub] spark pull request: [WIP] [SPARK-6885] [ML] decision tree support ...
Github user yanboliang commented on the pull request: https://github.com/apache/spark/pull/7694#issuecomment-126357332 @jkbradley I created a new version of InformationGainStats called [ImpurityStats](https://github.com/apache/spark/pull/7694/files#diff-5770a6f8f5b1a8386ec0592a59bd74d2R81). It stores information gain, impurity, and prediction-related data all in one data structure, which simplifies LearningNode. It also simplifies and optimizes the binsToBestSplit function. I will fix some trivial issues after your review.
[GitHub] spark pull request: [SPARK-9277] [MLLIB] SparseVector constructor ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7794#issuecomment-126363830 Merged build started.
[GitHub] spark pull request: [SPARK-9104][CORE][WIP] expose Netty network l...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7753#issuecomment-126363827 Build started.
[GitHub] spark pull request: [SPARK-9471] [ML] Multilayer Perceptron
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7621#issuecomment-126363850 Merged build started.
[GitHub] spark pull request: [SPARK-8862][SPARK-8862][SQL][WIP] Add basic i...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7774#issuecomment-126363846 Merged build started.
[GitHub] spark pull request: [SPARK-8979] Add a PID based rate estimator
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7648#issuecomment-126363835 Merged build started.
[GitHub] spark pull request: SPARK-8064, build against Hive 1.2.1
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/7191#issuecomment-126362290 Thanks @steveloughran, I can take a crack at publishing to Maven. Since that might take a day or so, one thing you can do is put the forked Hive jars in your people.apache.org web space and then add that as a repository to the build (a Maven repository is just anything that supports HTTP downloading of the jars). In the meantime I can try to get publishing up and running.
[GitHub] spark pull request: [SPARK-8979] Add a PID based rate estimator
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/7648#issuecomment-126364248 [Test build #39054 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/39054/consoleFull) for PR 7648 at commit [`93b74f8`](https://github.com/apache/spark/commit/93b74f884ea17da65297e47ff9a20b53d93225d1).
[GitHub] spark pull request: [SPARK-9308] [ML] ml.NaiveBayesModel support p...
Github user yanboliang commented on the pull request: https://github.com/apache/spark/pull/7672#issuecomment-126366729 @jkbradley I have replied to your comments inline and updated this patch.
[GitHub] spark pull request: [SPARK-8979] Add a PID based rate estimator
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/7648#issuecomment-126366528 [Test build #39052 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/39052/consoleFull) for PR 7648 at commit [`7975b0c`](https://github.com/apache/spark/commit/7975b0c9703696653563d2b457b4ef071f30bfe9).
[GitHub] spark pull request: [SPARK-9214] [ML] [PySpark] support ml.NaiveBa...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/7568#issuecomment-126366493 [Test build #39051 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/39051/consoleFull) for PR 7568 at commit [`f9c94d1`](https://github.com/apache/spark/commit/f9c94d1015e0e328aa265b86c9b95ec8185f9ba6).
[GitHub] spark pull request: [SPARK-9248][SparkR] Closing curly-braces shou...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7795#issuecomment-126368860 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-9248][SparkR] Closing curly-braces shou...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/7795#issuecomment-126368754 [Test build #39048 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/39048/console) for PR 7795 at commit [`c8eccd3`](https://github.com/apache/spark/commit/c8eccd3ce0c11ee1b8df36b666017c7bbfbf811f).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request: [Spark-] [MLlib] minor fix on tokenizer doc
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/7791
[GitHub] spark pull request: [SPARK-9277] [MLLIB] SparseVector constructor ...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/7794#discussion_r35883632
--- Diff: mllib/src/test/scala/org/apache/spark/mllib/linalg/VectorsSuite.scala ---
@@ -57,6 +57,21 @@ class VectorsSuite extends SparkFunSuite with Logging {
     assert(vec.values === values)
   }
+  test("sparse vector construction with mismatched indices/values array") {
+    intercept[IllegalArgumentException] {
+      Vectors.sparse(4, Array(1,2,3), Array(3.0, 5.0, 7.0, 9.0))
--- End diff --
space after `,`
[GitHub] spark pull request: [SPARK-9408] [PySpark] [MLlib] Refactor linalg...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/7746#issuecomment-126371427 test this please
[GitHub] spark pull request: [SPARK-9277] [MLLIB] SparseVector constructor ...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/7794#discussion_r35884108
--- Diff: mllib/src/test/scala/org/apache/spark/mllib/linalg/VectorsSuite.scala ---
@@ -57,6 +57,21 @@ class VectorsSuite extends SparkFunSuite with Logging {
     assert(vec.values === values)
   }
+  test("sparse vector construction with mismatched indices/values array") {
+    intercept[IllegalArgumentException] {
+      Vectors.sparse(4, Array(1,2,3), Array(3.0, 5.0, 7.0, 9.0))
--- End diff --
Oh duh, fix coming. Every time I think I can't possibly need to run scalastyle as well as the test ...
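The test quoted above expects the constructor to reject index and value arrays of different lengths. A minimal sketch of that kind of check follows, assuming a hypothetical `SimpleSparseVector` class (not MLlib's `SparseVector`) and a hypothetical helper `throwsIllegalArgument`; it relies only on the fact that Scala's `require` throws `IllegalArgumentException` when its condition is false.

```scala
object SparseVectorSketch {
  // Hypothetical minimal sparse vector for illustration; this is NOT MLlib's SparseVector.
  final class SimpleSparseVector(
      val size: Int,
      val indices: Array[Int],
      val values: Array[Double]) {
    // `require` throws IllegalArgumentException when the condition is false.
    require(indices.length == values.length,
      s"indices (${indices.length}) and values (${values.length}) must have the same length")
  }

  // Hypothetical helper: returns true only if evaluating `body` throws IllegalArgumentException.
  def throwsIllegalArgument(body: => Any): Boolean =
    try { body; false } catch { case _: IllegalArgumentException => true }
}
```

With this sketch, constructing a vector from three indices and four values is rejected, while matching lengths are accepted.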
[GitHub] spark pull request: [SPARK-9277] [MLLIB] SparseVector constructor ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/7794#issuecomment-126373365 [Test build #39061 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/39061/consoleFull) for PR 7794 at commit [`e8dc31e`](https://github.com/apache/spark/commit/e8dc31e899148989027b3a72e47a803a368a9881).
[GitHub] spark pull request: [SPARK-8735] [WIP] [SQL] Expose memory usage f...
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/7770#issuecomment-126373212 The information exposed in this patch will be tied to accumulators on the SQL tab introduced in #7774
[GitHub] spark pull request: [SPARK-6485] [MLlib] [Python] Add CoordinateMa...
Github user dusenberrymw commented on the pull request: https://github.com/apache/spark/pull/7554#issuecomment-126347627 Thanks, @MechCoder! I say we go ahead and optimize the conversions now though while this is still open. I'm thinking that adding an optional `java_matrix` parameter to the constructors will be the way to go. Then, if an argument for that is present, we can just store that internally, rather than create a new Java matrix. cc @mengxr
[GitHub] spark pull request: [WIP] [SPARK-6885] [ML] decision tree support ...
Github user yanboliang commented on the pull request: https://github.com/apache/spark/pull/7694#issuecomment-126350949 Jenkins, test this please.
[GitHub] spark pull request: [WIP] [SPARK-6885] [ML] decision tree support ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7694#issuecomment-126351527 Build triggered.
[GitHub] spark pull request: [SPARK-7368][MLlib] Add QR decomposition for R...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/5909#issuecomment-126355388 LGTM. Merged into master. Thanks! Sorry for the long delay on code review!
[GitHub] spark pull request: [SPARK-9308] [ML] ml.NaiveBayesModel support p...
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/7672#discussion_r35879500
--- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/NaiveBayes.scala ---
@@ -129,29 +129,49 @@ class NaiveBayesModel private[ml] (
       throw new UnknownError(s"Invalid modelType: ${$(modelType)}.")
   }
-  override protected def predict(features: Vector): Double = {
+  override val numClasses: Int = pi.size
+
+  private def posteriorProbabilities(logProb: DenseVector) = {
+    val logProbArray = logProb.toArray
+    val maxLog = logProbArray.max
+    val scaledProbs = logProbArray.map(lp => math.exp(lp - maxLog))
+    val probSum = scaledProbs.sum
+    new DenseVector(scaledProbs.map(_ / probSum))
+  }
+
+  private def multinomialCalculation(testData: Vector) = {
+    val prob = theta.multiply(testData)
+    BLAS.axpy(1.0, pi, prob)
+    prob
+  }
+
+  private def bernoulliCalculation(testData: Vector) = {
+    testData.foreachActive((_, value) =>
+      if (value != 0.0 && value != 1.0) {
+        throw new SparkException(
+          s"Bernoulli naive Bayes requires 0 or 1 feature values but found $testData.")
+      }
+    )
+    val prob = thetaMinusNegTheta.get.multiply(testData)
+    BLAS.axpy(1.0, pi, prob)
+    BLAS.axpy(1.0, negThetaSum.get, prob)
+    prob
+  }
+
+  override protected def predictRaw(features: Vector): Vector = {
--- End diff --
Agree, done.
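The `posteriorProbabilities` helper in the diff above subtracts the largest log-probability before exponentiating, which keeps `math.exp` from underflowing to zero for very negative inputs. The following is a minimal sketch of the same normalization on plain arrays; the object name `PosteriorSketch` and the use of `Array[Double]` instead of `DenseVector` are assumptions made so that it runs without Spark.

```scala
object PosteriorSketch {
  // Turns unnormalized log-probabilities into probabilities that sum to 1.
  // Assumes a non-empty input array (`max` throws on an empty one).
  def posteriorProbabilities(logProb: Array[Double]): Array[Double] = {
    val maxLog = logProb.max
    // Shifting by the maximum makes the largest exponent exactly 0, so exp(...) = 1.
    val scaledProbs = logProb.map(lp => math.exp(lp - maxLog))
    val probSum = scaledProbs.sum
    scaledProbs.map(_ / probSum)
  }
}
```

For inputs such as -1000.0 and -1001.0, `math.exp` applied directly returns 0.0 for both, whereas the shifted version still yields a valid distribution because only the difference between the log values matters.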
[GitHub] spark pull request: [SPARK-5561] [mllib] Generalized PeriodicCheck...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/7728#issuecomment-126359650 LGTM. Merged into master. Thanks! Btw, it is not necessary to specify the item type of RDD or Graph. Checkpointing doesn't care about the item type. Maybe we can try `RDD[_]` and `Graph[_, _]`, which might simplify the code a little bit (if it compiles).
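The wildcard suggestion can be illustrated without Spark. The sketch below assumes a hypothetical `Dataset[T]` container and a hypothetical `Checkpointer` class; neither is the class from the pull request, and whether the real code compiles with `RDD[_]` is left open, as the comment itself notes.

```scala
object WildcardSketch {
  // Hypothetical stand-in for an RDD-like container; NOT Spark's RDD.
  final class Dataset[T](val items: Seq[T]) {
    var checkpointed: Boolean = false
    def checkpoint(): Unit = { checkpointed = true }
  }

  // The element type is irrelevant to checkpointing, so the wildcard `Dataset[_]`
  // lets this class avoid carrying a type parameter of its own.
  final class Checkpointer(interval: Int) {
    private var updates = 0
    def update(data: Dataset[_]): Unit = {
      updates += 1
      if (updates % interval == 0) data.checkpoint()
    }
  }
}
```

A single `Checkpointer` instance can then be fed containers of different element types, which is the simplification the comment hints at.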
[GitHub] spark pull request: [SPARK-9214] [ML] [PySpark] support ml.NaiveBa...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/7568#issuecomment-126367129 [Test build #39051 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/39051/console) for PR 7568 at commit [`f9c94d1`](https://github.com/apache/spark/commit/f9c94d1015e0e328aa265b86c9b95ec8185f9ba6).
 * This patch **fails Python style tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * `class NaiveBayes(JavaEstimator, HasFeaturesCol, HasLabelCol, HasPredictionCol):`
   * `class NaiveBayesModel(JavaModel):`
[GitHub] spark pull request: [SPARK-8998][MLlib] Distribute PrefixSpan comp...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/7783#issuecomment-126367159 LGTM. Merged into master. Thanks!
[GitHub] spark pull request: [WIP] [SPARK-6885] [ML] decision tree support ...
Github user yanboliang commented on the pull request: https://github.com/apache/spark/pull/7694#issuecomment-126367420 It looks like an unrelated failure.
[GitHub] spark pull request: [SPARK-9277] [MLLIB] SparseVector constructor ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/7794#issuecomment-126366497 [Test build #39046 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/39046/consoleFull) for PR 7794 at commit [`6ffe34a`](https://github.com/apache/spark/commit/6ffe34a560829ac0e1f85b92f958ab394b1dda7a).
[GitHub] spark pull request: [WIP] [SPARK-6885] [ML] decision tree support ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/7694#issuecomment-126366153 [Test build #39049 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/39049/console) for PR 7694 at commit [`fbbe2ec`](https://github.com/apache/spark/commit/fbbe2ecd463dc8d219080fdd8649f92b9fdf38c5).
 * This patch **fails Python style tests**.
 * This patch **does not merge cleanly**.
 * This patch adds the following public classes _(experimental)_:
   * `class ImpurityStats(`
[GitHub] spark pull request: [WIP] [SPARK-6885] [ML] decision tree support ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/7694#issuecomment-126366037 [Test build #165 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SlowSparkPullRequestBuilder/165/console) for PR 7694 at commit [`fbbe2ec`](https://github.com/apache/spark/commit/fbbe2ecd463dc8d219080fdd8649f92b9fdf38c5).
 * This patch **fails Python style tests**.
 * This patch **does not merge cleanly**.
 * This patch adds the following public classes _(experimental)_:
   * `class ImpurityStats(`
[GitHub] spark pull request: [SPARK-8998][MLlib] Distribute PrefixSpan comp...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/7783#discussion_r35881343
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/fpm/PrefixSpan.scala ---
@@ -78,81 +97,153 @@ class PrefixSpan private (
    * the value of pair is the pattern's count.
    */
   def run(sequences: RDD[Array[Int]]): RDD[(Array[Int], Long)] = {
+    val sc = sequences.sparkContext
+
     if (sequences.getStorageLevel == StorageLevel.NONE) {
       logWarning("Input data is not cached.")
     }
-    val minCount = getMinCount(sequences)
-    val lengthOnePatternsAndCounts =
-      getFreqItemAndCounts(minCount, sequences).collect()
-    val prefixAndProjectedDatabase = getPrefixAndProjectedDatabase(
-      lengthOnePatternsAndCounts.map(_._1), sequences)
-    val groupedProjectedDatabase = prefixAndProjectedDatabase
-      .map(x => (x._1.toSeq, x._2))
-      .groupByKey()
-      .map(x => (x._1.toArray, x._2.toArray))
-    val nextPatterns = getPatternsInLocal(minCount, groupedProjectedDatabase)
-    val lengthOnePatternsAndCountsRdd =
-      sequences.sparkContext.parallelize(
-        lengthOnePatternsAndCounts.map(x => (Array(x._1), x._2)))
-    val allPatterns = lengthOnePatternsAndCountsRdd ++ nextPatterns
-    allPatterns
+
+    // Convert min support to a min number of transactions for this dataset
+    val minCount = if (minSupport == 0) 0L else math.ceil(sequences.count() * minSupport).toLong
+
+    // (Frequent items -> number of occurrences, all items here satisfy the `minSupport` threshold
+    val freqItemCounts = sequences
+      .flatMap(seq => seq.distinct.map(item => (item, 1L)))
+      .reduceByKey(_ + _)
+      .filter(_._2 >= minCount)
+      .collect()
+
+    // Pairs of (length 1 prefix, suffix consisting of frequent items)
+    val itemSuffixPairs = {
+      val freqItems = freqItemCounts.map(_._1).toSet
+      sequences.flatMap { seq =>
+        val filteredSeq = seq.filter(freqItems.contains(_))
+        freqItems.flatMap { item =>
+          val candidateSuffix = LocalPrefixSpan.getSuffix(item, filteredSeq)
+          candidateSuffix match {
+            case suffix if !suffix.isEmpty => Some((List(item), suffix))
+            case _ => None
+          }
+        }
+      }
+    }
+
+    // Accumulator for the computed results to be returned, initialized to the frequent items (i.e.
+    // frequent length-one prefixes)
+    var resultsAccumulator = freqItemCounts.map(x => (List(x._1), x._2))
+
+    // Remaining work to be locally and distributively processed respectfully
+    var (pairsForLocal, pairsForDistributed) = partitionByProjDBSize(itemSuffixPairs)
+
+    // Continue processing until no pairs for distributed processing remain (i.e. all prefixes have
+    // projected database sizes <= `maxLocalProjDBSize`)
+    while (pairsForDistributed.count() != 0) {
+      val (nextPatternAndCounts, nextPrefixSuffixPairs) =
+        extendPrefixes(minCount, pairsForDistributed)
+      pairsForDistributed.unpersist()
+      val (smallerPairsPart, largerPairsPart) = partitionByProjDBSize(nextPrefixSuffixPairs)
+      pairsForDistributed = largerPairsPart
+      pairsForDistributed.persist(StorageLevel.MEMORY_AND_DISK)
+      pairsForLocal ++= smallerPairsPart
+      resultsAccumulator ++= nextPatternAndCounts.collect()
+    }
+
+    // Process the small projected databases locally
+    val remainingResults = getPatternsInLocal(
+      minCount, sc.parallelize(pairsForLocal, 1).groupByKey())
+
+    (sc.parallelize(resultsAccumulator, 1) ++ remainingResults)
+      .map { case (pattern, count) => (pattern.toArray, count) }
   }
+
   /**
-   * Get the minimum count (sequences count * minSupport).
-   * @param sequences input data set, contains a set of sequences,
-   * @return minimum count,
+   * Partitions the prefix-suffix pairs by projected database size.
+   * @param prefixSuffixPairs prefix (length n) and suffix pairs,
+   * @return prefix-suffix pairs partitioned by whether their projected database size is <= or
+   *         greater than [[maxLocalProjDBSize]]
    */
-  private def getMinCount(sequences: RDD[Array[Int]]): Long = {
-    if (minSupport == 0) 0L else math.ceil(sequences.count() * minSupport).toLong
+  private def partitionByProjDBSize(prefixSuffixPairs: RDD[(List[Int], Array[Int])])
+    : (Array[(List[Int], Array[Int])], RDD[(List[Int], Array[Int])]) = {
+    val prefixToSuffixSize = prefixSuffixPairs
+      .aggregateByKey(0)(
+        seqOp = { case (count, suffix) => count + suffix.length },
+        combOp = { _ + _ })
+    val
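The first steps of `run` in the diff above (converting `minSupport` into a minimum count, then counting each item at most once per sequence) can be tried on local collections. This is a sketch under stated assumptions: the object `FreqItemsSketch` and its method signatures are hypothetical, and `groupBy` on a `Seq` stands in for the `reduceByKey` call on an RDD.

```scala
object FreqItemsSketch {
  // Same formula as in the diff: a support fraction becomes a minimum number of sequences.
  def minCount(numSequences: Long, minSupport: Double): Long =
    if (minSupport == 0) 0L else math.ceil(numSequences * minSupport).toLong

  // Counts, per item, the number of sequences that contain it, and keeps the frequent items.
  def freqItemCounts(sequences: Seq[Array[Int]], minSupport: Double): Map[Int, Long] = {
    val threshold = minCount(sequences.size.toLong, minSupport)
    sequences
      // `distinct` makes each item count once per sequence, however often it repeats.
      .flatMap(seq => seq.distinct.map(item => (item, 1L)))
      // Local stand-in for reduceByKey(_ + _).
      .groupBy(_._1)
      .map { case (item, pairs) => (item, pairs.map(_._2).sum) }
      .filter { case (_, count) => count >= threshold }
  }
}
```

With three sequences and a `minSupport` of 0.5, the threshold is `ceil(1.5) = 2`, so an item must appear in at least two of the three sequences to be kept.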
[GitHub] spark pull request: [SPARK-9308] [ML] ml.NaiveBayesModel support p...
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/7672#discussion_r35881182
--- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/NaiveBayesSuite.scala ---
@@ -46,6 +51,44 @@ class NaiveBayesSuite extends SparkFunSuite with MLlibTestSparkContext {
     assert(model.theta.map(math.exp) ~== thetaData.map(math.exp) absTol 0.05, "theta mismatch")
   }
+  def expectedMultinomialProbabilities(model: NaiveBayesModel, feature: Vector): Vector = {
--- End diff --
Like above, the ml.NaiveBayesModel parameters are all based on Vector and Matrix, which differs from the old model, so I just want to write some helper test functions for this kind of model rather than converting it to the old-style model.
[GitHub] spark pull request: [SPARK-9359][SQL] Support IntervalType for Par...
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/7793#issuecomment-126369508 retest this please.
[GitHub] spark pull request: [SPARK-9359][SQL] Support IntervalType for Par...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7793#issuecomment-126370367 Merged build triggered.
[GitHub] spark pull request: [SPARK-9359][SQL] Support IntervalType for Par...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7793#issuecomment-126370384 Merged build started.
[GitHub] spark pull request: [SPARK-9359][SQL] Support IntervalType for Par...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/7793#issuecomment-126370562 [Test build #39058 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/39058/consoleFull) for PR 7793 at commit [`ad46986`](https://github.com/apache/spark/commit/ad4698629f005b113a0f02c2f8a1faa32a8f8aaa).
[GitHub] spark pull request: [SPARK-5561] [mllib] Generalized PeriodicCheck...
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/7728#issuecomment-126372592 @jkbradley thanks, this is actually not affected by the recent checkpointing changes since we keep the old code path. In the future you can switch to calling `rdd.localCheckpoint()` and suddenly everything will be a little faster.
[GitHub] spark pull request: [SPARK-9479][Streaming][Tests]Fix ReceiverTrac...
GitHub user zsxwing opened a pull request: https://github.com/apache/spark/pull/7797
[SPARK-9479][Streaming][Tests]Fix ReceiverTrackerSuite failure
See https://issues.apache.org/jira/browse/SPARK-9479 for the failure cause.
The PR includes the following changes:
1. Make ReceiverTrackerSuite create StreamingContext in the test body.
2. Fix places that don't stop StreamingContext. I verified no SparkContext was stopped in the shutdown hook locally after this fix.
3. Fix an issue that `ReceiverTracker.endpoint` may be null.
4. Make sure stopping SparkContext in non-main thread won't fail other tests.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/zsxwing/spark fix-ReceiverTrackerSuite
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/7797.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
This closes #7797
commit d7497df154ac8f44662e5511c70d43fd79f9eabb
Author: zsxwing zsxw...@gmail.com
Date: 2015-07-30T15:16:53Z
Fix ReceiverTrackerSuite; make sure StreamingContext in tests is closed
[GitHub] spark pull request: [SPARK-9408] [PySpark] [MLlib] Refactor linalg...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7746#issuecomment-126354452 Merged build triggered.
[GitHub] spark pull request: [SPARK-7937][SQL] Support comparison on Struct...
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/6519#issuecomment-126357418 ping @rxin any further comments?
[GitHub] spark pull request: [SPARK-5561] [mllib] Generalized PeriodicCheck...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/7728
[GitHub] spark pull request: [SPARK-9471] [ML] Multilayer Perceptron
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/7621#issuecomment-126360987 Thanks! The branch name doesn't matter:) I will make another pass today.
[GitHub] spark pull request: [SPARK-9277] [MLLIB] SparseVector constructor ...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/7794#issuecomment-126360771 LGTM. I think it is useful to add the same check to Python. @MechCoder could you add it after #7746 ? Thanks!
[GitHub] spark pull request: [SPARK-9308] [ML] ml.NaiveBayesModel support p...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7672#issuecomment-126363829 Merged build started.
[GitHub] spark pull request: [SPARK-8979] Add a PID based rate estimator
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7648#issuecomment-126363831 Merged build started.
[GitHub] spark pull request: [SPARK-9104][CORE][WIP] expose Netty network l...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/7753#issuecomment-126364386 [Test build #164 has started](https://amplab.cs.berkeley.edu/jenkins/job/SlowSparkPullRequestBuilder/164/consoleFull) for PR 7753 at commit [`17e5b97`](https://github.com/apache/spark/commit/17e5b978618a5a6adfa3ff621e37eeecaa0b2b0c).
[GitHub] spark pull request: [WIP] [SPARK-6885] [ML] decision tree support ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/7694#issuecomment-126364431 [Test build #165 has started](https://amplab.cs.berkeley.edu/jenkins/job/SlowSparkPullRequestBuilder/165/consoleFull) for PR 7694 at commit [`fbbe2ec`](https://github.com/apache/spark/commit/fbbe2ecd463dc8d219080fdd8649f92b9fdf38c5).
[GitHub] spark pull request: [SPARK-8979] Add a PID based rate estimator
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7648#issuecomment-126363847 Merged build started.
[GitHub] spark pull request: [SPARK-8862][SPARK-8862][SQL][WIP] Add basic i...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/7774#issuecomment-126364218 [Test build #163 has started](https://amplab.cs.berkeley.edu/jenkins/job/SlowSparkPullRequestBuilder/163/consoleFull) for PR 7774 at commit [`23abf73`](https://github.com/apache/spark/commit/23abf73cafac3af0363486bdae91d737e235a197).