[GitHub] spark pull request: [SPARK-7970] Skip closure cleaning for SQL ope...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9253#issuecomment-150644300 Merged build triggered. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7970] Skip closure cleaning for SQL ope...
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/9253#issuecomment-150643877 ok to test
[GitHub] spark pull request: [SPARK-11271][SPARK-11016][Core] Use Spark Bit...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/9243#issuecomment-150643375 @lemire any comment on this thread? Looks like we are having some trouble with the roaring bitmap.
[GitHub] spark pull request: [SPARK-11253][SQL] reset all accumulators in p...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/9215#discussion_r42892965
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/metric/SQLMetrics.scala ---
@@ -62,7 +67,19 @@ private[sql] trait SQLMetricValue[T] extends Serializable {
 private[sql] class LongSQLMetricValue(private var _value : Long) extends SQLMetricValue[Long] {
   def add(incr: Long): LongSQLMetricValue = {
-    _value += incr
+    // Some LongSQLMetric will use -1 as initial value, so if the accumulator is never updated,
+    // we can filter it out later. However, when `add` is called, the accumulator is valid, we
+    // should turn -1 to 0.
+    if (_value < 0) {
+      _value = 0
+    }
+
+    // Some LongSQLMetric will use -1 as initial value, when we merge accumulator updates at driver
+    // side, we should ignore these -1 values.
+    if (incr > 0) {
+      _value += incr
+    }
+
--- End diff --

@cloud-fan sorry. I just realized that this method is in the critical path (when we calculate numRows). How about we remove this change and document clearly that those negative initial values will have a small impact on the sum of memory consumption?
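To make the trade-off in this comment concrete, here is a minimal sketch of the sentinel handling that the quoted hunk proposes. `LongMetricValue` is a hypothetical stand-in written for illustration, not Spark's `LongSQLMetricValue`, and returning `this` from `add` is an assumption because the hunk does not show the end of the method. The two branches in `add` are the extra work the comment worries about on the critical path.

```scala
// Hypothetical stand-in for the metric value in the quoted hunk; not Spark's class.
final class LongMetricValue(private var _value: Long) {

  // Mirrors the proposed change: a negative initial value is a "never updated"
  // sentinel, so the first add resets it to 0, and non-positive increments
  // (for example a merged -1 sentinel) are ignored.
  def add(incr: Long): LongMetricValue = {
    if (_value < 0) {
      _value = 0
    }
    if (incr > 0) {
      _value += incr
    }
    this // assumption: the real method's return expression is not shown in the hunk
  }

  def value: Long = _value
}

object LongMetricValueDemo {
  def main(args: Array[String]): Unit = {
    val untouched = new LongMetricValue(-1L)
    val updated = new LongMetricValue(-1L).add(5L).add(-1L).add(3L)
    println(untouched.value) // -1: still recognisable as "never updated"
    println(updated.value)   // 8: the sentinel and the -1 increment are both ignored
  }
}
```

Without the two checks, the same sequence would sum to 6 (-1 + 5 - 1 + 3), which is the small skew the comment suggests documenting instead.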
[GitHub] spark pull request: [SPARK-11253][SQL] reset all accumulators in p...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9215#issuecomment-150643054 **[Test build #44237 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44237/consoleFull)** for PR 9215 at commit [`4ff8912`](https://github.com/apache/spark/commit/4ff891205979a06abbf229f115bbbf99bda3ba1f).
[GitHub] spark pull request: [SPARK-11253][SQL] reset all accumulators in p...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9215#issuecomment-150640729 Merged build triggered.
[GitHub] spark pull request: [SPARK-11253][SQL] reset all accumulators in p...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9215#issuecomment-150640759 Merged build started.
[GitHub] spark pull request: [SPARK-10947] [SQL] With schema inference from...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9249#issuecomment-150640384 **[Test build #44236 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44236/consoleFull)** for PR 9249 at commit [`18d2861`](https://github.com/apache/spark/commit/18d28619264dbaf10f1e27576f5c4275cbc4ef72).
[GitHub] spark pull request: [SPARK-8029][core][wip] first successful shuff...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9214#issuecomment-150640491 **[Test build #44235 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44235/consoleFull)** for PR 9214 at commit [`4145651`](https://github.com/apache/spark/commit/4145651724eb99fb440cc8509df154d8f8095b47).
[GitHub] spark pull request: [SPARK-10947] [SQL] With schema inference from...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/9249#issuecomment-150639887 Thank you @stephend-realitymine for working on it! Overall, the change in infer schema looks good. I left a comment on the test part.
[GitHub] spark pull request: [SPARK-10947] [SQL] With schema inference from...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9249#issuecomment-150639689 Merged build triggered.
[GitHub] spark pull request: [SPARK-10947] [SQL] With schema inference from...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9249#issuecomment-150639713 Merged build started.
[GitHub] spark pull request: [SPARK-10947] [SQL] With schema inference from...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/9249#issuecomment-150639251 ok to test
[GitHub] spark pull request: [SPARK-10947] [SQL] With schema inference from...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/9249#discussion_r42890747
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/InferSchema.scala ---
@@ -103,11 +107,14 @@ private[sql] object InferSchema {
         // the type as we pass through all JSON objects.
         var elementType: DataType = NullType
         while (nextUntil(parser, END_ARRAY)) {
-          elementType = compatibleType(elementType, inferField(parser))
+          elementType = compatibleType(elementType, inferField(parser, primitivesAsString))
         }
         ArrayType(elementType)
+      case (VALUE_NUMBER_INT | VALUE_NUMBER_FLOAT) if primitivesAsString => StringType
+      case (VALUE_TRUE | VALUE_FALSE) if primitivesAsString => StringType
--- End diff --

Add a newline between these two cases?
[GitHub] spark pull request: [SPARK-8029][core][wip] first successful shuff...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9214#issuecomment-150638330 Merged build started.
[GitHub] spark pull request: [SPARK-8029][core][wip] first successful shuff...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9214#issuecomment-150638306 Merged build triggered.
[GitHub] spark pull request: [SPARK-10947] [SQL] With schema inference from...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/9249#discussion_r42890562
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala ---
@@ -1262,4 +1299,4 @@ class JsonSuite extends QueryTest with SharedSQLContext with TestJsonData {
       )
     }
   }
-}
+}
--- End diff --

Add a newline.
[GitHub] spark pull request: [SPARK-10947] [SQL] With schema inference from...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/9249#discussion_r42890545
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala ---
@@ -632,6 +632,39 @@ class JsonSuite extends QueryTest with SharedSQLContext with TestJsonData {
     )
   }
 
+  test("Loading a JSON dataset primitivesAsString returns schema with primitive types as strings") {
+    val dir = Utils.createTempDir()
+    dir.delete()
+    val path = dir.getCanonicalPath
+    primitiveFieldAndType.map(record => record.replaceAll("\n", " ")).saveAsTextFile(path)
+    val jsonDF = sqlContext.read.option("primitivesAsString", "true").json(path)
+
+    val expectedSchema = StructType(
+      StructField("bigInteger", DecimalType(20, 0), true) ::
+      StructField("boolean", BooleanType, true) ::
+      StructField("double", DoubleType, true) ::
+      StructField("integer", LongType, true) ::
+      StructField("long", LongType, true) ::
+      StructField("null", StringType, true) ::
+      StructField("string", StringType, true) :: Nil)
--- End diff --

Looks like we need to change all of these data types to `StringType`, right? Also, can you add a test with complex types (`StructType` and `ArrayType`) to make sure we preserve the structure?
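The behaviour this comment asks the tests to pin down can be sketched outside Spark. Everything below is a hypothetical, simplified model written for illustration: `JsonValue`, `SchemaType`, and `InferSketch.infer` are not Spark or Jackson classes, arrays take their element type from the first element only, and nulls stay `NullT`. The point is just that, with the flag on, numbers and booleans become strings while structs and arrays keep their shape.

```scala
// Hypothetical, simplified JSON model; not Jackson's token types.
sealed trait JsonValue
case class JNum(v: Double) extends JsonValue
case class JBool(v: Boolean) extends JsonValue
case class JStr(v: String) extends JsonValue
case object JNull extends JsonValue
case class JArr(items: List[JsonValue]) extends JsonValue
case class JObj(fields: List[(String, JsonValue)]) extends JsonValue

// Hypothetical, simplified schema model; not Spark's DataType hierarchy.
sealed trait SchemaType
case object StringT extends SchemaType
case object DoubleT extends SchemaType
case object BooleanT extends SchemaType
case object NullT extends SchemaType
case class ArrayT(element: SchemaType) extends SchemaType
case class StructT(fields: List[(String, SchemaType)]) extends SchemaType

object InferSketch {
  def infer(value: JsonValue, primitivesAsString: Boolean): SchemaType = value match {
    // Guarded case, in the spirit of the cases added to InferSchema in this PR.
    case JNum(_) | JBool(_) if primitivesAsString => StringT
    case JNum(_) => DoubleT
    case JBool(_) => BooleanT
    case JStr(_) => StringT
    case JNull => NullT
    // Simplification: use the first element's type instead of merging all elements.
    case JArr(items) =>
      ArrayT(items.headOption.map(infer(_, primitivesAsString)).getOrElse(NullT))
    // The struct shape is preserved; only the leaf types change with the flag.
    case JObj(fields) =>
      StructT(fields.map { case (name, v) => name -> infer(v, primitivesAsString) })
  }
}
```

For an object with a numeric field, an array of booleans, and a nested object, `infer(doc, primitivesAsString = true)` yields a struct of the same shape whose leaves are all `StringT`.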
[GitHub] spark pull request: [SPARK-10641][WIP][SQL] Add Skewness and Kurto...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9003#issuecomment-150637284 **[Test build #44234 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44234/consoleFull)** for PR 9003 at commit [`fd3f4d6`](https://github.com/apache/spark/commit/fd3f4d6f9ba5124406a7078c9e7991bf91abdad6).
[GitHub] spark pull request: [SPARK-11194] [SQL] [BRANCH-1.5] [WIP] Use Mut...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/9171#discussion_r42889962
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/commands.scala ---
@@ -100,7 +99,7 @@ case class AddJar(path: String) extends RunnableCommand {
     // returns the value of a thread local variable and its HiveConf may not be the HiveConf
     // associated with `executionHive.state` (for example, HiveContext is created in one thread
     // and then add jar is called from another thread).
-    hiveContext.executionHive.state.getConf.setClassLoader(newClassLoader)
+    hiveContext.executionHive.state.getConf.setClassLoader(hiveContext.libraryClassLoader)
--- End diff --

I have changed the classloader to a non-closable mutable URL class loader.
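As a rough illustration of what a "non-closable mutable URL class loader" could look like, here is a minimal sketch built only on `java.net.URLClassLoader`. The class name `NonClosableMutableURLClassLoader` is hypothetical and chosen for this example; the PR's actual class is not shown in this message. The sketch widens the protected `addURL` so jars can be appended after construction, and makes `close()` a no-op so a caller that closes the loader cannot invalidate it for other threads.

```scala
import java.net.{URL, URLClassLoader}

// Hypothetical sketch, not the class from the PR.
class NonClosableMutableURLClassLoader(parent: ClassLoader)
  extends URLClassLoader(Array.empty[URL], parent) {

  // URLClassLoader.addURL is protected; expose it so that jars can be
  // added to the same loader instance after it has been created.
  override def addURL(url: URL): Unit = super.addURL(url)

  // Deliberately ignore close(): the loader stays usable even if some
  // caller tries to close it.
  override def close(): Unit = ()
}

object ClassLoaderDemo {
  def main(args: Array[String]): Unit = {
    val loader = new NonClosableMutableURLClassLoader(ClassLoader.getSystemClassLoader)
    loader.addURL(new java.io.File("first.jar").toURI.toURL)
    loader.close() // no-op by design
    loader.addURL(new java.io.File("second.jar").toURI.toURL)
    println(loader.getURLs.length)
  }
}
```

Keeping one long-lived loader and appending to it means every holder of the reference sees newly added jars, instead of each `add jar` call swapping in a fresh loader.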
[GitHub] spark pull request: [SPARK-10641][WIP][SQL] Add Skewness and Kurto...
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/9003#discussion_r42889941 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/functions.scala --- @@ -930,3 +930,327 @@ object HyperLogLogPlusPlus { ) // scalastyle:on } + +/** + * A central moment is the expected value of a specified power of the deviation of a random + * variable from the mean. Central moments are often used to characterize the properties of about + * the shape of a distribution. + * + * This class implements online, one-pass algorithms for computing the central moments of a set of + * points. + * + * References: + * - Xiangrui Meng. "Simpler Online Updates for Arbitrary-Order Central Moments." + * 2015. http://arxiv.org/abs/1510.04923 + * + * @see [[https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance + * Algorithms for calculating variance (Wikipedia)]] + * + * @param child to compute central moments of. + */ +abstract class CentralMomentAgg(child: Expression) extends ImperativeAggregate with Serializable { + + /** + * The maximum central moment order to be computed. + */ + protected def momentOrder: Int + + /** + * Array of sufficient moments need to compute the aggregate statistic. + */ + protected def sufficientMoments: Array[Int] + + override def children: Seq[Expression] = Seq(child) + + override def nullable: Boolean = false + + override def dataType: DataType = DoubleType + + // Expected input data type. + // TODO: Right now, we replace old aggregate functions (based on AggregateExpression1) to the + // new version at planning time (after analysis phase). For now, NullType is added at here + // to make it resolved when we have cases like `select avg(null)`. + // We can use our analyzer to cast NullType to the default data type of the NumericType once + // we remove the old aggregate functions. Then, we will not need NullType at here. 
+ override def inputTypes: Seq[AbstractDataType] = Seq(TypeCollection(NumericType, NullType)) + + override def aggBufferSchema: StructType = StructType.fromAttributes(aggBufferAttributes) + + /** + * The number of central moments to store in the buffer. + */ + private[this] val numMoments = 5 + + override val aggBufferAttributes: Seq[AttributeReference] = Seq.tabulate(numMoments) { i => +AttributeReference(s"M$i", DoubleType)() + } + + // Note: although this simply copies aggBufferAttributes, this common code can not be placed + // in the superclass because that will lead to initialization ordering issues. + override val inputAggBufferAttributes: Seq[AttributeReference] = +aggBufferAttributes.map(_.newInstance()) + + /** + * Initialize all moments to zero. + */ + override def initialize(buffer: MutableRow): Unit = { +for (aggIndex <- 0 until numMoments) { + buffer.setDouble(mutableAggBufferOffset + aggIndex, 0.0) +} + } + + // frequently used values for online updates + private[this] var delta = 0.0 + private[this] var deltaN = 0.0 + private[this] var delta2 = 0.0 + private[this] var deltaN2 = 0.0 + + /** + * Update the central moments buffer. 
+ */ + override def update(buffer: MutableRow, input: InternalRow): Unit = { +val v = Cast(child, DoubleType).eval(input) +if (v != null) { + val updateValue = v match { +case d: Double => d +case _ => 0.0 + } + var n = buffer.getDouble(mutableAggBufferOffset) + var mean = buffer.getDouble(mutableAggBufferOffset + 1) + var m2 = 0.0 + var m3 = 0.0 + var m4 = 0.0 + + n += 1.0 + delta = updateValue - mean + deltaN = delta / n + mean += deltaN + buffer.setDouble(mutableAggBufferOffset, n) + buffer.setDouble(mutableAggBufferOffset + 1, mean) + + if (momentOrder >= 2) { +m2 = buffer.getDouble(mutableAggBufferOffset + 2) +m2 += delta * (delta - deltaN) +buffer.setDouble(mutableAggBufferOffset + 2, m2) + } + + if (momentOrder >= 3) { +delta2 = delta * delta +deltaN2 = deltaN * deltaN +m3 = buffer.getDouble(mutableAggBufferOffset + 3) +m3 += -3.0 * deltaN * m2 + delta * (delta2 - deltaN2) +buffer.setDouble(mutableAggBufferOffset + 3, m3) + } + + if (momentOrder >= 4) { +m4 = buffer.getDouble(mutableAggBufferOffset + 4) +m4 += -4.0 * deltaN * m3 - 6.0 * deltaN2 * m2 + + delta * (delta * d
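For reference, the one-pass update that the class comment describes can be sketched in isolation. The version below follows the incremental higher-order formulas on the Wikipedia page linked in the `@see` tag; it is not the PR's code, whose fourth-moment line is truncated in this message, and the names `Moments`, `update`, and `empty` are hypothetical. Each field holds an unnormalised central moment, that is, the sum of `(x - mean)^k` over the points seen so far.

```scala
// Hypothetical sketch of an online central-moment accumulator (orders 2 to 4).
final case class Moments(n: Double, mean: Double, m2: Double, m3: Double, m4: Double) {

  // Fold one observation into the running moments. All right-hand sides use the
  // previous m2 and m3, so the order of evaluation does not matter here.
  def update(x: Double): Moments = {
    val n1 = n                        // count before this observation
    val nNew = n + 1.0
    val delta = x - mean
    val deltaN = delta / nNew
    val deltaN2 = deltaN * deltaN
    val term1 = delta * deltaN * n1
    Moments(
      n = nNew,
      mean = mean + deltaN,
      m2 = m2 + term1,
      m3 = m3 + term1 * deltaN * (nNew - 2.0) - 3.0 * deltaN * m2,
      m4 = m4 + term1 * deltaN2 * (nNew * nNew - 3.0 * nNew + 3.0) +
        6.0 * deltaN2 * m2 - 4.0 * deltaN * m3
    )
  }
}

object Moments {
  val empty: Moments = Moments(0.0, 0.0, 0.0, 0.0, 0.0)
}

object MomentsDemo {
  def main(args: Array[String]): Unit = {
    val result = Seq(1.0, 2.0, 3.0, 4.0).foldLeft(Moments.empty)((acc, x) => acc.update(x))
    println(result)
  }
}
```

For the points 1, 2, 3, 4 the direct sums are mean = 2.5, M2 = 5.0, M3 = 0.0 and M4 = 10.25, which gives a quick check on the recurrences.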
[GitHub] spark pull request: [SPARK-11194] [SQL] [BRANCH-1.5] [WIP] Use Mut...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9171#issuecomment-150637021 [Test build #44233 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44233/consoleFull) for PR 9171 at commit [`7951df1`](https://github.com/apache/spark/commit/7951df1a91271826c8405f19d0ce12873faff21b).
[GitHub] spark pull request: [SPARK-10641][WIP][SQL] Add Skewness and Kurto...
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/9003#discussion_r42889772 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/functions.scala --- @@ -930,3 +930,327 @@ object HyperLogLogPlusPlus { ) // scalastyle:on } + +/** + * A central moment is the expected value of a specified power of the deviation of a random + * variable from the mean. Central moments are often used to characterize the properties of about + * the shape of a distribution. + * + * This class implements online, one-pass algorithms for computing the central moments of a set of + * points. + * + * References: + * - Xiangrui Meng. "Simpler Online Updates for Arbitrary-Order Central Moments." + * 2015. http://arxiv.org/abs/1510.04923 + * + * @see [[https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance + * Algorithms for calculating variance (Wikipedia)]] + * + * @param child to compute central moments of. + */ +abstract class CentralMomentAgg(child: Expression) extends ImperativeAggregate with Serializable { + + /** + * The maximum central moment order to be computed. + */ + protected def momentOrder: Int + + /** + * Array of sufficient moments need to compute the aggregate statistic. + */ + protected def sufficientMoments: Array[Int] + + override def children: Seq[Expression] = Seq(child) + + override def nullable: Boolean = false + + override def dataType: DataType = DoubleType + + // Expected input data type. + // TODO: Right now, we replace old aggregate functions (based on AggregateExpression1) to the + // new version at planning time (after analysis phase). For now, NullType is added at here + // to make it resolved when we have cases like `select avg(null)`. + // We can use our analyzer to cast NullType to the default data type of the NumericType once + // we remove the old aggregate functions. Then, we will not need NullType at here. 
+ override def inputTypes: Seq[AbstractDataType] = Seq(TypeCollection(NumericType, NullType)) + + override def aggBufferSchema: StructType = StructType.fromAttributes(aggBufferAttributes) + + /** + * The number of central moments to store in the buffer. + */ + private[this] val numMoments = 5 + + override val aggBufferAttributes: Seq[AttributeReference] = Seq.tabulate(numMoments) { i => +AttributeReference(s"M$i", DoubleType)() + } + + // Note: although this simply copies aggBufferAttributes, this common code can not be placed + // in the superclass because that will lead to initialization ordering issues. + override val inputAggBufferAttributes: Seq[AttributeReference] = +aggBufferAttributes.map(_.newInstance()) + + /** + * Initialize all moments to zero. + */ + override def initialize(buffer: MutableRow): Unit = { +for (aggIndex <- 0 until numMoments) { + buffer.setDouble(mutableAggBufferOffset + aggIndex, 0.0) +} + } + + // frequently used values for online updates + private[this] var delta = 0.0 + private[this] var deltaN = 0.0 + private[this] var delta2 = 0.0 + private[this] var deltaN2 = 0.0 + + /** + * Update the central moments buffer. + */ + override def update(buffer: MutableRow, input: InternalRow): Unit = { +val v = Cast(child, DoubleType).eval(input) +if (v != null) { + val updateValue = v match { +case d: Double => d +case _ => 0.0 + } + var n = buffer.getDouble(mutableAggBufferOffset) + var mean = buffer.getDouble(mutableAggBufferOffset + 1) + var m2 = 0.0 --- End diff -- Added as `private[this]` vars
[GitHub] spark pull request: [SPARK-10641][WIP][SQL] Add Skewness and Kurto...
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/9003#discussion_r42889712 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/functions.scala --- @@ -930,3 +930,327 @@ object HyperLogLogPlusPlus { ) // scalastyle:on } + +/** + * A central moment is the expected value of a specified power of the deviation of a random + * variable from the mean. Central moments are often used to characterize the properties of about + * the shape of a distribution. + * + * This class implements online, one-pass algorithms for computing the central moments of a set of + * points. + * + * References: + * - Xiangrui Meng. "Simpler Online Updates for Arbitrary-Order Central Moments." + * 2015. http://arxiv.org/abs/1510.04923 + * + * @see [[https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance + * Algorithms for calculating variance (Wikipedia)]] + * + * @param child to compute central moments of. + */ +abstract class CentralMomentAgg(child: Expression) extends ImperativeAggregate with Serializable { + + /** + * The maximum central moment order to be computed. + */ + protected def momentOrder: Int + + /** + * Array of sufficient moments need to compute the aggregate statistic. + */ + protected def sufficientMoments: Array[Int] + + override def children: Seq[Expression] = Seq(child) + + override def nullable: Boolean = false + + override def dataType: DataType = DoubleType + + // Expected input data type. + // TODO: Right now, we replace old aggregate functions (based on AggregateExpression1) to the + // new version at planning time (after analysis phase). For now, NullType is added at here + // to make it resolved when we have cases like `select avg(null)`. + // We can use our analyzer to cast NullType to the default data type of the NumericType once + // we remove the old aggregate functions. Then, we will not need NullType at here. 
+ override def inputTypes: Seq[AbstractDataType] = Seq(TypeCollection(NumericType, NullType)) + + override def aggBufferSchema: StructType = StructType.fromAttributes(aggBufferAttributes) + + /** + * The number of central moments to store in the buffer. + */ + private[this] val numMoments = 5 + + override val aggBufferAttributes: Seq[AttributeReference] = Seq.tabulate(numMoments) { i => +AttributeReference(s"M$i", DoubleType)() + } + + // Note: although this simply copies aggBufferAttributes, this common code can not be placed + // in the superclass because that will lead to initialization ordering issues. + override val inputAggBufferAttributes: Seq[AttributeReference] = +aggBufferAttributes.map(_.newInstance()) + + /** + * Initialize all moments to zero. + */ + override def initialize(buffer: MutableRow): Unit = { +for (aggIndex <- 0 until numMoments) { + buffer.setDouble(mutableAggBufferOffset + aggIndex, 0.0) +} + } + + // frequently used values for online updates + private[this] var delta = 0.0 + private[this] var deltaN = 0.0 + private[this] var delta2 = 0.0 + private[this] var deltaN2 = 0.0 + + /** + * Update the central moments buffer. + */ + override def update(buffer: MutableRow, input: InternalRow): Unit = { +val v = Cast(child, DoubleType).eval(input) +if (v != null) { + val updateValue = v match { +case d: Double => d +case _ => 0.0 --- End diff -- Looking at the code, `Cast.eval` should return the correct type or null, so this extra case statement has been removed.
[GitHub] spark pull request: [SPARK-10641][WIP][SQL] Add Skewness and Kurto...
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/9003#discussion_r42889596

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/functions.scala ---
@@ -930,3 +930,327 @@ object HyperLogLogPlusPlus {
   )
   // scalastyle:on
 }
+
+/**
+ * A central moment is the expected value of a specified power of the deviation of a random
+ * variable from the mean. Central moments are often used to characterize the shape of a
+ * distribution.
+ *
+ * This class implements online, one-pass algorithms for computing the central moments of a set of
+ * points.
+ *
+ * References:
+ *  - Xiangrui Meng. "Simpler Online Updates for Arbitrary-Order Central Moments."
+ *    2015. http://arxiv.org/abs/1510.04923
+ *
+ * @see [[https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance
+ *     Algorithms for calculating variance (Wikipedia)]]
+ *
+ * @param child to compute central moments of.
+ */
+abstract class CentralMomentAgg(child: Expression) extends ImperativeAggregate with Serializable {
+
+  /**
+   * The maximum central moment order to be computed.
+   */
+  protected def momentOrder: Int
+
+  /**
+   * Array of sufficient moments needed to compute the aggregate statistic.
+   */
+  protected def sufficientMoments: Array[Int]
+
+  override def children: Seq[Expression] = Seq(child)
+
+  override def nullable: Boolean = false
+
+  override def dataType: DataType = DoubleType
+
+  // Expected input data type.
+  // TODO: Right now, we replace old aggregate functions (based on AggregateExpression1) with the
+  // new version at planning time (after analysis phase). For now, NullType is added here
+  // to make it resolved when we have cases like `select avg(null)`.
+  // We can use our analyzer to cast NullType to the default data type of the NumericType once
+  // we remove the old aggregate functions. Then, we will not need NullType here.
+  override def inputTypes: Seq[AbstractDataType] = Seq(TypeCollection(NumericType, NullType))
+
+  override def aggBufferSchema: StructType = StructType.fromAttributes(aggBufferAttributes)
+
+  /**
+   * The number of central moments to store in the buffer.
+   */
+  private[this] val numMoments = 5
+
+  override val aggBufferAttributes: Seq[AttributeReference] = Seq.tabulate(numMoments) { i =>
+    AttributeReference(s"M$i", DoubleType)()
+  }
+
+  // Note: although this simply copies aggBufferAttributes, this common code can not be placed
+  // in the superclass because that will lead to initialization ordering issues.
+  override val inputAggBufferAttributes: Seq[AttributeReference] =
+    aggBufferAttributes.map(_.newInstance())
+
+  /**
+   * Initialize all moments to zero.
+   */
+  override def initialize(buffer: MutableRow): Unit = {
+    for (aggIndex <- 0 until numMoments) {
+      buffer.setDouble(mutableAggBufferOffset + aggIndex, 0.0)
+    }
+  }
+
+  // frequently used values for online updates
+  private[this] var delta = 0.0
--- End diff --

done.

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
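The Scaladoc quoted in this diff describes online, one-pass computation of central moments. As a rough illustration of that idea, here is a minimal standalone sketch using the well-known streaming update formulas for the second, third and fourth central-moment sums (as listed on the Wikipedia page the Scaladoc links to). It is not the code from the pull request: the class name `OnlineMoments` and all of its members are hypothetical, and it does not use Spark's aggregation buffer at all.

```scala
// Illustrative sketch only; NOT the implementation under review in the PR.
// Maintains, in a single pass over the data:
//   mean, M2 = sum((x - mean)^2), M3 = sum((x - mean)^3), M4 = sum((x - mean)^4)
final class OnlineMoments {
  private var n = 0L
  private var mean = 0.0
  private var m2 = 0.0
  private var m3 = 0.0
  private var m4 = 0.0

  def update(x: Double): Unit = {
    val nPrev = n.toDouble
    n += 1
    val nNow = n.toDouble
    val delta = x - mean
    val deltaN = delta / nNow
    val deltaN2 = deltaN * deltaN
    val term1 = delta * deltaN * nPrev
    mean += deltaN
    // Update the highest order first: each line reads the OLD lower-order sums.
    m4 += term1 * deltaN2 * (nNow * nNow - 3 * nNow + 3) + 6 * deltaN2 * m2 - 4 * deltaN * m3
    m3 += term1 * deltaN * (nNow - 2) - 3 * deltaN * m2
    m2 += term1
  }

  def count: Long = n
  def average: Double = mean
  def sumSq: Double = m2
  def sumCube: Double = m3
  def sumFourth: Double = m4

  // Population skewness and excess kurtosis; both are NaN when m2 is zero.
  def skewness: Double = math.sqrt(n.toDouble) * m3 / math.pow(m2, 1.5)
  def excessKurtosis: Double = n * m4 / (m2 * m2) - 3.0
}
```

Usage would look like `val m = new OnlineMoments; data.foreach(m.update)`. Keeping the raw sums rather than the normalized moments means derived statistics such as skewness only need a division at the end.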
[GitHub] spark pull request: [SPARK-10641][WIP][SQL] Add Skewness and Kurto...
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/9003#discussion_r42889558

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/functions.scala ---
@@ -930,3 +930,327 @@ object HyperLogLogPlusPlus {
   )
   // scalastyle:on
 }
+
+/**
+ * A central moment is the expected value of a specified power of the deviation of a random
+ * variable from the mean. Central moments are often used to characterize the shape of a
+ * distribution.
+ *
+ * This class implements online, one-pass algorithms for computing the central moments of a set of
+ * points.
+ *
+ * References:
+ *  - Xiangrui Meng. "Simpler Online Updates for Arbitrary-Order Central Moments."
+ *    2015. http://arxiv.org/abs/1510.04923
+ *
+ * @see [[https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance
+ *     Algorithms for calculating variance (Wikipedia)]]
+ *
+ * @param child to compute central moments of.
+ */
+abstract class CentralMomentAgg(child: Expression) extends ImperativeAggregate with Serializable {
+
+  /**
+   * The maximum central moment order to be computed.
+   */
+  protected def momentOrder: Int
+
+  /**
+   * Array of sufficient moments needed to compute the aggregate statistic.
+   */
+  protected def sufficientMoments: Array[Int]
--- End diff --

Removed this def and instead pass all moments up to the maximum moment to the `getStatistic` function.
[GitHub] spark pull request: [SPARK-11194] [SQL] [BRANCH-1.5] [WIP] Use Mut...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9171#issuecomment-150636241 Merged build triggered.
[GitHub] spark pull request: [SPARK-11194] [SQL] [BRANCH-1.5] [WIP] Use Mut...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9171#issuecomment-150636270 Merged build started.
[GitHub] spark pull request: [SPARK-10641][WIP][SQL] Add Skewness and Kurto...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9003#issuecomment-150636269 Merged build started.
[GitHub] spark pull request: [SPARK-10641][WIP][SQL] Add Skewness and Kurto...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9003#issuecomment-150636243 Merged build triggered.
[GitHub] spark pull request: [SPARK-11274][SQL] Text data source support fo...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/9240#discussion_r42889239

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/text/DefaultSource.scala ---
@@ -0,0 +1,160 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.text
+
+import com.google.common.base.Objects
+import org.apache.hadoop.fs.{Path, FileStatus}
+import org.apache.hadoop.io.{NullWritable, Text, LongWritable}
+import org.apache.hadoop.mapred.{TextInputFormat, JobConf}
+import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat
+import org.apache.hadoop.mapreduce.{RecordWriter, TaskAttemptContext, Job}
+import org.apache.hadoop.mapreduce.lib.input.FileInputFormat
+
+import org.apache.spark.deploy.SparkHadoopUtil
+import org.apache.spark.mapred.SparkHadoopMapRedUtil
+import org.apache.spark.rdd.RDD
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.GenericMutableRow
+import org.apache.spark.sql.{AnalysisException, Row, SQLContext}
+import org.apache.spark.sql.execution.datasources.PartitionSpec
+import org.apache.spark.sql.sources._
+import org.apache.spark.sql.types.{StringType, StructType}
+import org.apache.spark.unsafe.types.UTF8String
+
+/**
+ * A data source for reading text files.
+ */
+class DefaultSource extends HadoopFsRelationProvider with DataSourceRegister {
+
+  override def createRelation(
+      sqlContext: SQLContext,
+      paths: Array[String],
+      dataSchema: Option[StructType],
+      partitionColumns: Option[StructType],
+      parameters: Map[String, String]): HadoopFsRelation = {
+    dataSchema.foreach(verifySchema)
+    new TextRelation(None, partitionColumns, paths)(sqlContext)
+  }
+
+  override def shortName(): String = "text"
+
+  private def verifySchema(schema: StructType): Unit = {
+    if (schema.size != 1) {
+      throw new AnalysisException(
+        s"Text data source supports only a single column, and you have ${schema.size} columns.")
+    }
+    val tpe = schema(0).dataType
+    if (tpe != StringType) {
+      throw new AnalysisException(
+        s"Text data source supports only a string column, but you have ${tpe.simpleString}.")
+    }
+  }
+}
+
+private[sql] class TextRelation(
+    val maybePartitionSpec: Option[PartitionSpec],
+    override val userDefinedPartitionColumns: Option[StructType],
+    override val paths: Array[String] = Array.empty[String])
+    (@transient val sqlContext: SQLContext)
+  extends HadoopFsRelation(maybePartitionSpec) {
+
+  /** Data schema is always a single column, named "text". */
+  override def dataSchema: StructType = new StructType().add("text", StringType)
+
+  /** This is an internal data source that outputs internal row format. */
+  override val needConversion: Boolean = false
+
+  /** Read path. */
+  override def buildScan(inputPaths: Array[FileStatus]): RDD[Row] = {
+    val job = new Job(sqlContext.sparkContext.hadoopConfiguration)
+    val conf = SparkHadoopUtil.get.getConfigurationFromJobContext(job)
+    val paths = inputPaths.map(_.getPath).sorted
--- End diff --

I got a compile error:
```
[error] /Users/yhuai/Projects/Spark/yin-spark-1/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/text/DefaultSource.scala:86: No implicit Ordering defined for org.apache.hadoop.fs.Path.
[error]     val paths = inputPaths.map(_.getPath).sorted
[error]                                           ^
```
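The compile error reported above says that `.sorted` cannot find an implicit `Ordering` for the element type. The sketch below shows two common ways around that kind of error in plain Scala. It deliberately avoids Hadoop: `FakePath` is a hypothetical stand-in for `org.apache.hadoop.fs.Path`, and nothing here is taken from the fix that eventually landed in the pull request.

```scala
import java.net.URI

// Hypothetical stand-in for a path type that has no implicit Ordering in scope.
final case class FakePath(uri: URI) {
  def toUri: URI = uri
}

object PathSorting {
  // Option 1: sort by a key whose type already has an Ordering (here, String).
  def sortByString(paths: Seq[FakePath]): Seq[FakePath] =
    paths.sortBy(_.toUri.toString)

  // Option 2: build an Ordering once and pass it to `sorted` explicitly.
  val byUriString: Ordering[FakePath] = Ordering.by((p: FakePath) => p.toUri.toString)

  def sortedExplicit(paths: Seq[FakePath]): Seq[FakePath] =
    paths.sorted(byUriString)
}
```

Either variant gives a deterministic order. Whether comparing string forms is the right order for real file paths is a separate question this sketch does not settle.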
[GitHub] spark pull request: [SPARK-11184] [MLLIB] Declare most of .mllib c...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9169#issuecomment-150632553 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-11184] [MLLIB] Declare most of .mllib c...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9169#issuecomment-150632406

**[Test build #44228 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44228/consoleFull)** for PR 9169 at commit [`22c6277`](https://github.com/apache/spark/commit/22c62774b04a3f845d4253dc0412ade2b8d8c7ce).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-11184] [MLLIB] Declare most of .mllib c...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9169#issuecomment-150632554 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44228/ Test PASSed.
[GitHub] spark pull request: [SPARK-11125] [SQL] Uninformative exception wh...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9134#issuecomment-150632179 **[Test build #44232 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44232/consoleFull)** for PR 9134 at commit [`5b6e651`](https://github.com/apache/spark/commit/5b6e6510dd3825910659cac95784f56c3ae9df51).
[GitHub] spark pull request: [SPARK-11125] [SQL] Uninformative exception wh...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9134#issuecomment-150630263 Merged build triggered.
[GitHub] spark pull request: [SPARK-11209][SPARKR] Add window functions int...
Github user shivaram commented on a diff in the pull request: https://github.com/apache/spark/pull/9193#discussion_r42886920

--- Diff: R/pkg/R/functions.R ---
@@ -2008,3 +2008,101 @@ setMethod("ifelse",
                                      "otherwise", no)
             column(jc)
           })
+
+## Window functions##
+
+#' cumeDist
+#'
+#' Window function: returns the cumulative distribution of values within a window partition,
+#' i.e. the fraction of rows that are below the current row.
+#'
+#'   N = total number of rows in the partition
+#'   cumeDist(x) = number of values before (and including) x / N
+#'
+#' This is equivalent to the CUME_DIST function in SQL.
+#'
+#' @rdname cumeDist
+#' @name cumeDist
+#' @family window_funcs
+#' @export
+#' @examples \dontrun{cumeDist()}
+setMethod("cumeDist",
+          signature(x = "missing"),
+          function() {
+            jc <- callJStatic("org.apache.spark.sql.functions", "cumeDist")
+            column(jc)
+          })
+
+#' lag
+#'
+#' Window function: returns the value that is `offset` rows before the current row, and
+#' `defaultValue` if there are fewer than `offset` rows before the current row. For example,
+#' an `offset` of one will return the previous row at any given point in the window partition.
+#'
+#' This is equivalent to the LAG function in SQL.
+#'
+#' @rdname lag
+#' @name lag
+#' @family window_funcs
+#' @export
+#' @examples \dontrun{lag(df$c)}
+setMethod("lag",
--- End diff --

There is a method called `lag` in base R that this would conflict with. Could we try to use the same argument names as that function?
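The roxygen comments quoted in this diff define `cumeDist` and `lag` in words. The following sketch spells out those two definitions on an in-memory, already ordered partition using plain Scala collections. It is only meant to make the semantics concrete: the object `WindowSketch` and its method signatures are hypothetical and unrelated to how Spark SQL or SparkR evaluate window functions.

```scala
// Illustrative sketch of the window-function semantics described in the doc comments.
object WindowSketch {
  // cumeDist(x) = (number of values before and including x) / N,
  // where N is the number of rows in the partition.
  def cumeDist(partition: Seq[Double]): Seq[Double] = {
    val n = partition.length.toDouble
    partition.map(x => partition.count(_ <= x) / n)
  }

  // lag: the value `offset` rows before the current row, or `default`
  // when fewer than `offset` rows precede it.
  def lag[A](partition: Seq[A], offset: Int, default: A): Seq[A] = {
    require(offset >= 0, "offset must be non-negative in this sketch")
    partition.indices.map(i => if (i - offset >= 0) partition(i - offset) else default)
  }
}
```

For example, `WindowSketch.lag(Seq("a", "b", "c"), 1, "-")` shifts each value down by one row and fills the first slot with the default.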
[GitHub] spark pull request: [SPARK-11125] [SQL] Uninformative exception wh...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9134#issuecomment-150630290 Merged build started.
[GitHub] spark pull request: [SPARK-11125] [SQL] Uninformative exception wh...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/9134#issuecomment-150630038 test this please
[GitHub] spark pull request: [SPARK-11125] [SQL] Uninformative exception wh...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/9134#issuecomment-150630025 ok to test
[GitHub] spark pull request: [SPARK-8029][core][wip] first successful shuff...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9214#issuecomment-150626639 Merged build triggered.
[GitHub] spark pull request: SPARK-11286 Make Outbox stopped exception sing...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9254#issuecomment-150627414 **[Test build #44230 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44230/consoleFull)** for PR 9254 at commit [`79c6e00`](https://github.com/apache/spark/commit/79c6e00dae89de815de73e9ce66f47d9e0cb6bdd).
[GitHub] spark pull request: [SPARK-8029][core][wip] first successful shuff...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9214#issuecomment-150627404 **[Test build #44231 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44231/consoleFull)** for PR 9214 at commit [`89063dd`](https://github.com/apache/spark/commit/89063dd3654066e076e0e4f13250d25c414e88c3).
[GitHub] spark pull request: [SPARK-8029][core][wip] first successful shuff...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9214#issuecomment-150626674 Merged build started.
[GitHub] spark pull request: [SPARK-8029][core][wip] first successful shuff...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9214#issuecomment-150626295 **[Test build #44229 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44229/consoleFull)** for PR 9214 at commit [`4a19702`](https://github.com/apache/spark/commit/4a19702bf1de0af003c2cbc58bf2ec61c79d17b9).
[GitHub] spark pull request: Make Outbox stopped exception singleton
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9254#issuecomment-150625601 Merged build started.
[GitHub] spark pull request: Make Outbox stopped exception singleton
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9254#issuecomment-150625571 Merged build triggered.
[GitHub] spark pull request: Make Outbox stopped exception singleton
GitHub user tedyu opened a pull request: https://github.com/apache/spark/pull/9254

Make Outbox stopped exception singleton

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tedyu/spark master

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/9254.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #9254

commit 79c6e00dae89de815de73e9ce66f47d9e0cb6bdd
Author: tedyu
Date: 2015-10-23T16:25:15Z

    Make Outbox stopped exception singleton
[GitHub] spark pull request: [SPARK-10562][SQL] support mixed case partitio...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9251#issuecomment-150624445 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44226/ Test PASSed.
[GitHub] spark pull request: [SPARK-9319][SPARKR] Add support for setting c...
Github user shivaram commented on a diff in the pull request: https://github.com/apache/spark/pull/9218#discussion_r42884395

--- Diff: R/pkg/R/DataFrame.R ---
@@ -276,6 +276,57 @@ setMethod("names<-",
             }
           })
+#' @rdname columns
+#' @name colnames
+setMethod("colnames",
+          signature(x = "DataFrame"),
+          function(x) {
+            columns(x)
+          })
+
+#' @rdname columns
+#' @name colnames<-
+setMethod("colnames<-",
+          signature(x = "DataFrame", value = "character"),
+          function(x, value) {
+            sdf <- callJMethod(x@sdf, "toDF", as.list(value))
+            dataFrame(sdf)
+          })
+
+#' coltypes
+#'
+#' Set the column types of a DataFrame.
+#'
+#' @name coltypes
+#' @param x (DataFrame)
+#' @return value (character) A character vector with the target column types for the given DataFrame
+#' @rdname coltypes
+#' @aliases coltypes
+#' @export
+#' @examples
+#'\dontrun{
+#' sc <- sparkR.init()
+#' sqlContext <- sparkRSQL.init(sc)
+#' path <- "path/to/file.json"
+#' df <- jsonFile(sqlContext, path)
+#' coltypes(df) <- c("string", "integer")
+#'}
+setMethod("coltypes<-",
--- End diff --

So this is a little tricky. In #8984 we are converting the SparkSQL types to R types. For consistency, we should therefore take in R types here (i.e., character, numeric, etc.) and convert them to SparkSQL types.
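The review comment above suggests accepting R type names and converting them to SparkSQL type names. A lookup-table version of that idea might look like the sketch below. Everything in it is an assumption made for illustration: the object `ColTypeSketch`, the function `toSqlTypes`, and the specific name pairs in the table are hypothetical and are not taken from SparkR or from #8984.

```scala
// Illustrative sketch only; the mapping entries are assumed, not SparkR's actual table.
object ColTypeSketch {
  // Hypothetical R type name -> SQL type name lookup.
  private val rToSql: Map[String, String] = Map(
    "character" -> "string",
    "numeric" -> "double",
    "integer" -> "integer",
    "logical" -> "boolean")

  // Convert every requested R type, or report the first unsupported name.
  def toSqlTypes(rTypes: Seq[String]): Either[String, Seq[String]] =
    rTypes.find(t => !rToSql.contains(t)) match {
      case Some(bad) => Left(s"Unsupported R type: $bad")
      case None => Right(rTypes.map(rToSql))
    }
}
```

Returning an `Either` keeps the failure case explicit, so a caller can surface an unsupported type name to the user instead of passing it through unchanged.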
[GitHub] spark pull request: [SPARK-8029][core][wip] first successful shuff...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9214#issuecomment-150624419 Merged build started.
[GitHub] spark pull request: [SPARK-10562][SQL] support mixed case partitio...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9251#issuecomment-15062 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-8029][core][wip] first successful shuff...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9214#issuecomment-150624389 Merged build triggered.
[GitHub] spark pull request: [SPARK-10562][SQL] support mixed case partitio...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9251#issuecomment-150624292

**[Test build #44226 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44226/consoleFull)** for PR 9251 at commit [`520f008`](https://github.com/apache/spark/commit/520f008138153b532f93d9144180e4ab9654d2ce).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request: SPARK-11258 Converting a Spark DataFrame into ...
Github user FRosner commented on a diff in the pull request: https://github.com/apache/spark/pull/9222#discussion_r42884105

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/api/r/SQLUtils.scala ---
@@ -130,16 +130,18 @@ private[r] object SQLUtils {
   }

   def dfToCols(df: DataFrame): Array[Array[Any]] = {
-    // localDF is Array[Row]
-    val localDF = df.collect()
+    val localDF: Array[Row] = df.collect()
     val numCols = df.columns.length
+    val numRows = localDF.length

-    // result is Array[Array[Any]]
-    (0 until numCols).map { colIdx =>
-      localDF.map { row =>
-        row(colIdx)
+    val colArray = new Array[Array[Any]](numCols)
+    for (colNo <- 0 until numCols) {
--- End diff --

Yeah, we might give this a try, but I don't think the loop itself is the problem; I suspect it is rather the work done in `map` and the subsequent `.toArray`. I will investigate a bit more over the weekend.
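The diff above replaces nested `map` calls with preallocated arrays when turning collected rows into columns. A standalone sketch of that row-to-column transposition, using plain arrays instead of Spark's `Row` and `DataFrame`, could look like this. The object `TransposeSketch` and the function `rowsToCols` are hypothetical names, and the sketch makes no claim about which variant is faster in practice.

```scala
// Illustrative sketch only; operates on plain arrays rather than Spark Rows.
object TransposeSketch {
  // Turn row-major data (one inner array per row) into column-major data.
  def rowsToCols(rows: Array[Array[Any]], numCols: Int): Array[Array[Any]] = {
    val numRows = rows.length
    // Allocate every column up front so no intermediate collections are built.
    val cols = Array.ofDim[Any](numCols, numRows)
    var c = 0
    while (c < numCols) {
      var r = 0
      while (r < numRows) {
        cols(c)(r) = rows(r)(c)
        r += 1
      }
      c += 1
    }
    cols
  }
}
```

Preallocating the result avoids the intermediate collections that chained `map` and `.toArray` calls create, which is the overhead the comment above suspects.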
[GitHub] spark pull request: [SPARK-5569] [STREAMING] fix ObjectInputStream...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8955#issuecomment-150623296 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44227/ Test PASSed.
[GitHub] spark pull request: [SPARK-5569] [STREAMING] fix ObjectInputStream...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8955#issuecomment-150623294 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-5569] [STREAMING] fix ObjectInputStream...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8955#issuecomment-150623018 **[Test build #44227 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44227/consoleFull)** for PR 8955 at commit [`d2d0404`](https://github.com/apache/spark/commit/d2d0404f3b68ae3a85d3592b3536feca68e2d22b). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: SPARK-11258 Remove quadratic runtime complexit...
Github user shivaram commented on a diff in the pull request: https://github.com/apache/spark/pull/9222#discussion_r42883859

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/api/r/SQLUtils.scala ---
    @@ -130,16 +130,18 @@ private[r] object SQLUtils {
       }
       def dfToCols(df: DataFrame): Array[Array[Any]] = {
    -    // localDF is Array[Row]
    -    val localDF = df.collect()
    +    val localDF: Array[Row] = df.collect()
         val numCols = df.columns.length
    +    val numRows = localDF.length
    -    // result is Array[Array[Any]]
    -    (0 until numCols).map { colIdx =>
    -      localDF.map { row =>
    -        row(colIdx)
    +    val colArray = new Array[Array[Any]](numCols)
    +    for (colNo <- 0 until numCols) {
    --- End diff --

Using a while loop here instead of a for loop should also help with performance.
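The while-loop suggestion can be sketched without Spark. This is a minimal illustration, not the code from the PR: it assumes rows are plain `Array[Any]` values rather than Spark `Row` objects, and the names `ColumnTranspose` and `rowsToCols` are hypothetical. It preallocates every column array and fills it by index, so no intermediate collections are built by `map` or `.toArray`.

```scala
object ColumnTranspose {
  // Convert row-major data into column-major arrays.
  // Assumption: every row has at least numCols elements.
  def rowsToCols(rows: Array[Array[Any]], numCols: Int): Array[Array[Any]] = {
    val numRows = rows.length
    val cols = new Array[Array[Any]](numCols)
    var colNo = 0
    while (colNo < numCols) {
      // Preallocate the column, then copy one cell per row by index.
      val col = new Array[Any](numRows)
      var rowNo = 0
      while (rowNo < numRows) {
        col(rowNo) = rows(rowNo)(colNo)
        rowNo += 1
      }
      cols(colNo) = col
      colNo += 1
    }
    cols
  }
}
```

Whether the while loop is measurably faster than a `for` comprehension here would need benchmarking; the thread itself leaves that open.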
[GitHub] spark pull request: Ignore NoClassDefFoundError in obtainTokenForH...
Github user tedyu commented on the pull request: https://github.com/apache/spark/pull/9213#issuecomment-150622695 Should be covered by SPARK-11265
[GitHub] spark pull request: [SPARK-11184] [MLLIB] Declare most of .mllib c...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9169#issuecomment-150622726 **[Test build #44228 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44228/consoleFull)** for PR 9169 at commit [`22c6277`](https://github.com/apache/spark/commit/22c62774b04a3f845d4253dc0412ade2b8d8c7ce).
[GitHub] spark pull request: Ignore NoClassDefFoundError in obtainTokenForH...
Github user tedyu closed the pull request at: https://github.com/apache/spark/pull/9213
[GitHub] spark pull request: [SPARK-10382] Make example code in user guide ...
Github user yinxusen commented on the pull request: https://github.com/apache/spark/pull/9109#issuecomment-150622432 @mengxr Sure I can do that.
[GitHub] spark pull request: [SPARK-11184] [MLLIB] Declare most of .mllib c...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9169#issuecomment-150621772 Merged build started.
[GitHub] spark pull request: [SPARK-11184] [MLLIB] Declare most of .mllib c...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9169#issuecomment-150621704 Merged build triggered.
[GitHub] spark pull request: [SPARK-8992] [SQL] Add pivot to dataframe api
Github user aray commented on the pull request: https://github.com/apache/spark/pull/7841#issuecomment-150620321

@rxin here is my summary of other frameworks' APIs.

I'm going to use an example dataset from the pandas docs for all the examples (as df):

|A|B|C|D|
|---|---|---|---|
|foo|one|small|1|
|foo|one|large|2|
|foo|one|large|2|
|foo|two|small|3|
|foo|two|small|3|
|bar|one|large|4|
|bar|one|small|5|
|bar|two|small|6|
|bar|two|large|7|

This API
--------

```scala
scala> df.groupBy("A", "B").pivot("C", "small", "large").sum("D").show
+---+---+-----+-----+
|  A|  B|small|large|
+---+---+-----+-----+
|foo|two|    6| null|
|bar|two|    6|    7|
|foo|one|    1|    4|
|bar|one|    5|    4|
+---+---+-----+-----+

scala> df.groupBy("A", "B").pivot("C", "small", "large").agg(sum("D"), avg("D")).show
+---+---+------------+------------+------------+------------+
|  A|  B|small sum(D)|small avg(D)|large sum(D)|large avg(D)|
+---+---+------------+------------+------------+------------+
|foo|two|           6|         3.0|        null|        null|
|bar|two|           6|         6.0|           7|         7.0|
|foo|one|           1|         1.0|           4|         2.0|
|bar|one|           5|         5.0|           4|         4.0|
+---+---+------------+------------+------------+------------+

scala> df.pivot(Seq($"A", $"B"), $"C", Seq("small", "large"), sum($"D")).show
+---+---+-----+-----+
|  A|  B|small|large|
+---+---+-----+-----+
|foo|two|    6| null|
|bar|two|    6|    7|
|foo|one|    1|    4|
|bar|one|    5|    4|
+---+---+-----+-----+
```

We require a list of values for the pivot column because the output columns of the operator must be known ahead of time. Pandas and reshape2 do not require this, but the comparable SQL operators do. We also allow multiple aggregations, which not all implementations allow.

pandas
------

The comparable method for pandas is `pivot_table(data, values=None, index=None, columns=None, aggfunc='mean', fill_value=None, margins=False, dropna=True)`

Example

```python
>>> pivot_table(df, values='D', index=['A', 'B'], columns=['C'], aggfunc=np.sum)
          small  large
foo  one      1      4
     two      6    NaN
bar  one      5      4
     two      6      7
```

Pandas also allows multiple aggregations:

```python
>>> pivot_table(df, values='D', index=['A', 'B'], columns=['C'], aggfunc=[np.sum, np.average])
          sum        average
C       large small   large small
A   B
bar one     4     5       4     5
    two     7     6       7     6
foo one     4     1       2     1
    two   NaN     6     NaN     3
```

References
- http://pandas.pydata.org/pandas-docs/stable/reshaping.html
- http://pandas.pydata.org/pandas-docs/stable/generated/pandas.pivot_table.html

See also: `pivot`, `stack`, `unstack`.

reshape2 (R)
------------

The comparable method for reshape2 is `dcast(data, formula, fun.aggregate = NULL, ..., margins = NULL, subset = NULL, fill = NULL, drop = TRUE, value.var = guess_value(data))`

```r
> dcast(df, A + B ~ C, sum)
Using D as value column: use value.var to override.
    A   B large small
1 bar one     4     5
2 bar two     7     6
3 foo one     4     1
4 foo two     0     6
```

Note that by default cast fills with the value obtained by applying fun.aggregate to a zero-length vector.

References
- https://cran.r-project.org/web/packages/reshape2/reshape2.pdf
- http://seananderson.ca/2013/10/19/reshape.html
- http://www.inside-r.org/packages/cran/reshape2/docs/cast

See also: `melt`.

MS SQL Server
-------------

```sql
SELECT * FROM df pivot (sum(D) for C in ([small], [large])) p
```

http://sqlfiddle.com/#!3/cf887/3/0

References
- http://sqlhints.com/2014/03/10/pivot-and-unpivot-in-sql-server/

Oracle 11g
----------

```sql
SELECT * FROM df pivot (sum(D) for C in ('small', 'large')) p
```

http://sqlfiddle.com/#!4/29bc5/3/0

Oracle also allows multiple aggregations, with output similar to this API:

```sql
SELECT * FROM df pivot (sum(D) as sum, avg(D) as avg for C in ('small', 'large')) p
```

http://sqlfiddle.com/#!4/29bc5/5/0

References
- http://www.oracle.com/technetwork/articles/sql/11g-pivot-097235.html
- http://docs.oracle.com/cd/B28359_01/server.111/b28286/statements_10002.htm#CHDCEJJE
- http://www.techonthenet.com/oracle/pivot.php

---

Let me know if I can do anything else to help this along. Als
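To make the semantics of the proposed `groupBy(...).pivot(...).sum(...)` call concrete, here is a small sketch over plain Scala collections. It is not the Spark implementation: the names `PivotSketch`, `Rec`, and `pivotSum` are hypothetical, and `None` stands in for the `null` cells in the output shown in the comment. Like the proposed API, it takes the pivot values up front so the set of output columns is known before any data is read.

```scala
object PivotSketch {
  // One row of the example dataset: columns A, B, C, D.
  final case class Rec(a: String, b: String, c: String, d: Int)

  // Group by (A, B), then produce one summed value of D per requested pivot value of C.
  // A group with no rows for a pivot value yields None.
  def pivotSum(rows: Seq[Rec], pivotValues: Seq[String]): Map[(String, String), Seq[Option[Int]]] =
    rows.groupBy(r => (r.a, r.b)).map { case (key, group) =>
      val sums = pivotValues.map { v =>
        val matching = group.filter(_.c == v)
        if (matching.isEmpty) None else Some(matching.map(_.d).sum)
      }
      key -> sums
    }

  // The example dataset from the comment.
  val df: Seq[Rec] = Seq(
    Rec("foo", "one", "small", 1), Rec("foo", "one", "large", 2), Rec("foo", "one", "large", 2),
    Rec("foo", "two", "small", 3), Rec("foo", "two", "small", 3), Rec("bar", "one", "large", 4),
    Rec("bar", "one", "small", 5), Rec("bar", "two", "small", 6), Rec("bar", "two", "large", 7))
}
```

Calling `PivotSketch.pivotSum(PivotSketch.df, Seq("small", "large"))` gives the same four groups and sums as the first `show` output, for example `("foo", "two") -> Seq(Some(6), None)`.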
[GitHub] spark pull request: [SPARK-11264] ./bin/spark-class can't find ass...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/9231#issuecomment-150619514 I was not even aware of `GREP_OPTIONS` until this PR. I presume this is pretty safe, since I don't see why we would need to support custom grep behavior. LGTM
[GitHub] spark pull request: SPARK-11265 hive tokens
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/9232#discussion_r42881320

    --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala ---
    @@ -322,8 +322,10 @@ private[spark] class Client(
         // multiple times, YARN will fail to launch containers for the app with an internal
         // error.
         val distributedUris = new HashSet[String]
    -    obtainTokenForHiveMetastore(sparkConf, hadoopConf, credentials)
    -    obtainTokenForHBase(sparkConf, hadoopConf, credentials)
    +    if (isClusterMode) {
    --- End diff --

Why is this cluster mode only? I can run the Spark shell to access Hive or HBase, and this won't get tokens for those to ship to executors?
[GitHub] spark pull request: Fix typos
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/9250#issuecomment-150615913 If it's a handful of typos I wouldn't bother with a JIRA. I think the operating theory is JIRA = what and PR = how, and if those are virtually the same there's no point in duplicating. I'd focus on the docs, I suppose, since they are read much more. In fact I don't know if you could reasonably search the Scala source for typos because of all the false positives, but searching the generated scaladoc might be reasonable. Still, it sounds possibly too noisy to search.
[GitHub] spark pull request: [SPARK-7970] Skip closure cleaning for SQL ope...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9253#issuecomment-150614390 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-7970] Skip closure cleaning for SQL ope...
GitHub user nitin2goyal opened a pull request: https://github.com/apache/spark/pull/9253

[SPARK-7970] Skip closure cleaning for SQL operations

Also introduces a new Spark-private API in RDD.scala named 'mapPartitionsInternal', which does not closure-clean the RDD elements.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/nitin2goyal/spark master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/9253.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #9253

commit 4ee8058447b5e7eff242960aae6fbd56631b
Author: nitin.goyal
Date: 2015-10-19T06:51:42Z

    SPARK-11179: Push filters through aggregate if filters are subset of 'group by' attribute set

commit 3b016b73c239ce9cdc85a5edb1a2127c1f67433a
Author: nitin goyal
Date: 2015-10-20T07:19:53Z

    SPARK-11179: Push filters through aggregate if filters are subset of 'group by' attribute set

commit 671fbb31d7c908668526bdc146e0168ffb3014a8
Author: nitin goyal
Date: 2015-10-20T10:17:41Z

    SPARK-11179: Push filters through aggregate if filters are subset of 'group by' attribute set

commit f422aa81e10ad01762847c71e678c3b2ef85a926
Author: nitin goyal
Date: 2015-10-20T18:32:47Z

    [SPARK-11179] [SQL] Push filters through aggregate

    Push conjunctive predicates though Aggregate operators when their references are a subset of the groupingExpressions.

    Query plan before optimisation :-
    Filter ((c#138L = 2) && (a#0 = 3))
     Aggregate [a#0], [a#0,count(b#1) AS c#138L]
      Project [a#0,b#1]
       LocalRelation [a#0,b#1,c#2]

    Query plan after optimisation :-
    Filter (c#138L = 2)
     Aggregate [a#0], [a#0,count(b#1) AS c#138L]
      Filter (a#0 = 3)
       Project [a#0,b#1]
        LocalRelation [a#0,b#1,c#2]

commit 82fc386675ea2bcd5123d3abd83f6565669fcd69
Author: nitin goyal
Date: 2015-10-21T04:39:56Z

    [SPARK-11179] [SQL] Push filters through aggregate

    Push conjunctive predicates though Aggregate operators when their references are a subset of the groupingExpressions.

    Query plan before optimisation :-
    Filter ((c#138L = 2) && (a#0 = 3))
     Aggregate [a#0], [a#0,count(b#1) AS c#138L]
      Project [a#0,b#1]
       LocalRelation [a#0,b#1,c#2]

    Query plan after optimisation :-
    Filter (c#138L = 2)
     Aggregate [a#0], [a#0,count(b#1) AS c#138L]
      Filter (a#0 = 3)
       Project [a#0,b#1]
        LocalRelation [a#0,b#1,c#2]

commit 20cf7226f80707bfb6c4164effab50edbea4dce2
Author: nitin goyal
Date: 2015-10-23T15:19:35Z

    Merge remote-tracking branch 'upstream/master'

commit ca487cbae6ba4eb2d14d7b007eb54ccc4dd3ee3a
Author: nitin goyal
Date: 2015-10-23T15:26:33Z

    [SPARK-7970] Skip closure cleaning for SQL operations

    Also introduces new spark private API in RDD.scala with name 'mapPartitionsInternal' which doesn't closure cleans the RDD elements.
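The SPARK-11179 commit messages describe splitting a conjunctive filter so that predicates referencing only grouping attributes move below the aggregate. A minimal sketch of that split, assuming a predicate can be modelled by the set of attribute names it references: `FilterPushdownSketch`, `Predicate`, and `split` are hypothetical names, and this is not the Catalyst optimizer rule, which also has to handle details such as attribute aliases.

```scala
object FilterPushdownSketch {
  // A predicate is modelled only by its text and the attribute names it references.
  final case class Predicate(text: String, references: Set[String])

  // Split conjuncts into (pushable below the aggregate, must stay above it).
  // A conjunct is pushable when every attribute it references is a grouping attribute.
  def split(conjuncts: Seq[Predicate], groupingAttrs: Set[String]): (Seq[Predicate], Seq[Predicate]) =
    conjuncts.partition(p => p.references.subsetOf(groupingAttrs))
}
```

For the plan in the commit message, the conjuncts are `c = 2` (referencing the aggregate output `c`) and `a = 3` (referencing the grouping attribute `a`), so with grouping attributes `Set("a")` only `a = 3` is pushed down and `c = 2` stays above the aggregate.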
[GitHub] spark pull request: [SPARK-7970] Skip closure cleaning for SQL ope...
Github user nitin2goyal commented on the pull request: https://github.com/apache/spark/pull/9253#issuecomment-150613740 cc @andrewor14
[GitHub] spark pull request: [SPARK-6723] [MLLIB] Model import/export for C...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/6785
[GitHub] spark pull request: [SPARK-6723] [MLLIB] Model import/export for C...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/6785#issuecomment-150612720 Merged into master. Thanks!
[GitHub] spark pull request: [SPARK-10277][MLlib][PySpark] Add @since annot...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/8684
[GitHub] spark pull request: [SPARK-10277][MLlib][PySpark] Add @since annot...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/8684#issuecomment-150612354 Merged into master. Thanks!
[GitHub] spark pull request: [SPARK-10277][MLlib][PySpark] Add @since annot...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/8684#issuecomment-150612150 @noel-smith @yu-iskw Mixing multiple features in a PR generally delays the code review. For example, we are now discussing the `reStructuredText` format for comments in a PR titled "Add @since annotation to pyspark.mllib.regression". I'm going to merge this. @yu-iskw Could you make a follow-up PR to fix the syntax in the comments? Also pay attention to the line width in docstrings.
[GitHub] spark pull request: [SPARK-10641][WIP][SQL] Add Skewness and Kurto...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/9003#discussion_r42879337

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/functions.scala ---
    @@ -930,3 +930,327 @@ object HyperLogLogPlusPlus {
       )
       // scalastyle:on
     }
    +
    +/**
    + * A central moment is the expected value of a specified power of the deviation of a random
    + * variable from the mean. Central moments are often used to characterize the properties of about
    + * the shape of a distribution.
    + *
    + * This class implements online, one-pass algorithms for computing the central moments of a set of
    + * points.
    + *
    + * References:
    + *  - Xiangrui Meng. "Simpler Online Updates for Arbitrary-Order Central Moments."
    + *      2015. http://arxiv.org/abs/1510.04923
    + *
    + * @see [[https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance
    + *     Algorithms for calculating variance (Wikipedia)]]
    + *
    + * @param child to compute central moments of.
    + */
    +abstract class CentralMomentAgg(child: Expression) extends ImperativeAggregate with Serializable {
    +
    +  /**
    +   * The maximum central moment order to be computed.
    +   */
    +  protected def momentOrder: Int
    +
    +  /**
    +   * Array of sufficient moments need to compute the aggregate statistic.
    +   */
    +  protected def sufficientMoments: Array[Int]
    +
    +  override def children: Seq[Expression] = Seq(child)
    +
    +  override def nullable: Boolean = false
    +
    +  override def dataType: DataType = DoubleType
    +
    +  // Expected input data type.
    +  // TODO: Right now, we replace old aggregate functions (based on AggregateExpression1) to the
    +  // new version at planning time (after analysis phase). For now, NullType is added at here
    +  // to make it resolved when we have cases like `select avg(null)`.
    +  // We can use our analyzer to cast NullType to the default data type of the NumericType once
    +  // we remove the old aggregate functions. Then, we will not need NullType at here.
    +  override def inputTypes: Seq[AbstractDataType] = Seq(TypeCollection(NumericType, NullType))
    +
    +  override def aggBufferSchema: StructType = StructType.fromAttributes(aggBufferAttributes)
    +
    +  /**
    +   * The number of central moments to store in the buffer.
    +   */
    +  private[this] val numMoments = 5
    +
    +  override val aggBufferAttributes: Seq[AttributeReference] = Seq.tabulate(numMoments) { i =>
    +    AttributeReference(s"M$i", DoubleType)()
    +  }
    +
    +  // Note: although this simply copies aggBufferAttributes, this common code can not be placed
    +  // in the superclass because that will lead to initialization ordering issues.
    +  override val inputAggBufferAttributes: Seq[AttributeReference] =
    +    aggBufferAttributes.map(_.newInstance())
    +
    +  /**
    +   * Initialize all moments to zero.
    +   */
    +  override def initialize(buffer: MutableRow): Unit = {
    +    for (aggIndex <- 0 until numMoments) {
    +      buffer.setDouble(mutableAggBufferOffset + aggIndex, 0.0)
    +    }
    +  }
    +
    +  // frequently used values for online updates
    +  private[this] var delta = 0.0
    +  private[this] var deltaN = 0.0
    +  private[this] var delta2 = 0.0
    +  private[this] var deltaN2 = 0.0
    +
    +  /**
    +   * Update the central moments buffer.
    +   */
    +  override def update(buffer: MutableRow, input: InternalRow): Unit = {
    +    val v = Cast(child, DoubleType).eval(input)
    +    if (v != null) {
    +      val updateValue = v match {
    +        case d: Double => d
    +        case _ => 0.0
    +      }
    +      var n = buffer.getDouble(mutableAggBufferOffset)
    +      var mean = buffer.getDouble(mutableAggBufferOffset + 1)
    +      var m2 = 0.0
    +      var m3 = 0.0
    +      var m4 = 0.0
    +
    +      n += 1.0
    +      delta = updateValue - mean
    +      deltaN = delta / n
    +      mean += deltaN
    +      buffer.setDouble(mutableAggBufferOffset, n)
    +      buffer.setDouble(mutableAggBufferOffset + 1, mean)
    --- End diff --

I don't think we are going to support arbitrary-order moments. Kurtosis should be sufficient :)
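For readers following the diff, a one-pass update of the count, mean, and second to fourth central moment sums can be sketched as below. This uses the standard update formulas given on the Wikipedia page that the diff links to; it is not necessarily the formulation used in the PR or in the referenced paper, and the names `CentralMomentsSketch`, `Moments`, `update`, and `of` are hypothetical. The five fields correspond to the five buffer slots the diff allocates (`numMoments = 5`).

```scala
object CentralMomentsSketch {
  // Running state: count, mean, and the sums of 2nd, 3rd and 4th powers of deviations from the mean.
  final case class Moments(n: Double = 0.0, mean: Double = 0.0,
                           m2: Double = 0.0, m3: Double = 0.0, m4: Double = 0.0) {
    // Fold one value into the state. The right-hand sides read the OLD n, m2 and m3.
    def update(x: Double): Moments = {
      val n1 = n + 1.0            // new count
      val delta = x - mean
      val deltaN = delta / n1
      val deltaN2 = deltaN * deltaN
      val term1 = delta * deltaN * n
      Moments(
        n = n1,
        mean = mean + deltaN,
        m2 = m2 + term1,
        m3 = m3 + term1 * deltaN * (n1 - 2.0) - 3.0 * deltaN * m2,
        m4 = m4 + term1 * deltaN2 * (n1 * n1 - 3.0 * n1 + 3.0) +
          6.0 * deltaN2 * m2 - 4.0 * deltaN * m3)
    }

    // Excess kurtosis from the accumulated sums (undefined when m2 is zero).
    def excessKurtosis: Double = n * m4 / (m2 * m2) - 3.0
  }

  def of(xs: Seq[Double]): Moments = xs.foldLeft(Moments())((acc, x) => acc.update(x))
}
```

Since kurtosis needs only moments up to order four, a fixed-size state like this is enough, which is consistent with the review comment that arbitrary-order moments are not needed.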
[GitHub] spark pull request: [SPARK-10382] Make example code in user guide ...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/9109#issuecomment-150609532 LGTM. Merged into master. Thanks! This makes the example code much easier to check. Could you make one JIRA and submit a PR to replace some example code using this? Then we can create more JIRAs and ask the community to help.
[GitHub] spark pull request: [SPARK-10382] Make example code in user guide ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/9109
[GitHub] spark pull request: [SPARK-11284] [ML] ALS produces float predicti...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/9252#issuecomment-150607991 @dahlem Any issue with float precision? The ratings do not have high precision anyway. Changing it to double precision increases the shuffle size by a lot. If you want to make the `RegressionEvaluator` work with `ALS`, you can cast the label type to Double in `RegressionEvaluator`.
[GitHub] spark pull request: [SPARK-5569] [STREAMING] fix ObjectInputStream...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8955#issuecomment-150607120 **[Test build #44227 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44227/consoleFull)** for PR 8955 at commit [`d2d0404`](https://github.com/apache/spark/commit/d2d0404f3b68ae3a85d3592b3536feca68e2d22b).
[GitHub] spark pull request: Fix typos
Github user jaceklaskowski commented on the pull request: https://github.com/apache/spark/pull/9250#issuecomment-150606925 Ok, deal. I can run a spell-checker and see what I can fix within a half-an-hour timeframe. Should I go and create a JIRA task for it? Any particular module/package to look at during the timeframe? Thanks @srowen for the help!
[GitHub] spark pull request: [SPARK-5569] [STREAMING] fix ObjectInputStream...
Github user maxwellzdm commented on the pull request: https://github.com/apache/spark/pull/8955#issuecomment-150605939 @tdas I have added a unit test that would not pass without this patch. Please review it when you have time.
[GitHub] spark pull request: [SPARK-5569] [STREAMING] fix ObjectInputStream...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8955#issuecomment-150605267 Merged build started.
[GitHub] spark pull request: [SPARK-5569] [STREAMING] fix ObjectInputStream...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8955#issuecomment-150605236 Merged build triggered.
[GitHub] spark pull request: [SPARK-11098][Core]Add Outbox to cache the sen...
Github user tedyu commented on a diff in the pull request: https://github.com/apache/spark/pull/9197#discussion_r42876278
--- Diff: core/src/main/scala/org/apache/spark/rpc/netty/Outbox.scala ---
@@ -0,0 +1,222 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.rpc.netty
+
+import java.util.concurrent.Callable
+import javax.annotation.concurrent.GuardedBy
+
+import scala.util.control.NonFatal
+
+import org.apache.spark.SparkException
+import org.apache.spark.network.client.{RpcResponseCallback, TransportClient}
+import org.apache.spark.rpc.RpcAddress
+
+private[netty] case class OutboxMessage(content: Array[Byte], callback: RpcResponseCallback)
+
+private[netty] class Outbox(nettyEnv: NettyRpcEnv, val address: RpcAddress) {
+
+  outbox => // Give this an alias so we can use it more clearly in closures.
+
+  @GuardedBy("this")
+  private val messages = new java.util.LinkedList[OutboxMessage]
+
+  @GuardedBy("this")
+  private var client: TransportClient = null
+
+  /**
+   * connectFuture points to the connect task. If there is no connect task, connectFuture will be
+   * null.
+   */
+  @GuardedBy("this")
+  private var connectFuture: java.util.concurrent.Future[Unit] = null
+
+  @GuardedBy("this")
+  private var stopped = false
+
+  /**
+   * If there is any thread draining the message queue
+   */
+  @GuardedBy("this")
+  private var draining = false
+
+  /**
+   * Send a message. If there is no active connection, cache it and launch a new connection. If
+   * [[Outbox]] is stopped, the sender will be notified with a [[SparkException]].
+   */
+  def send(message: OutboxMessage): Unit = {
+    val dropped = synchronized {
+      if (stopped) {
+        true
+      } else {
+        messages.add(message)
+        false
+      }
+    }
+    if (dropped) {
+      message.callback.onFailure(new SparkException("Message is dropped because Outbox is stopped"))
+    } else {
+      drainOutbox()
+    }
+  }
+
+  /**
+   * Drain the message queue. If there is other draining thread, just exit. If the connection has
+   * not been established, launch a task in the `nettyEnv.clientConnectionExecutor` to setup the
+   * connection.
+   */
+  private def drainOutbox(): Unit = {
+    var message: OutboxMessage = null
+    synchronized {
+      if (stopped) {
+        return
+      }
+      if (connectFuture != null) {
+        // We are connecting to the remote address, so just exit
+        return
+      }
+      if (client == null) {
+        // There is no connect task but client is null, so we need to launch the connect task.
+        launchConnectTask()
+        return
+      }
+      if (draining) {
+        // There is some thread draining, so just exit
+        return
+      }
+      message = messages.poll()
+      if (message == null) {
+        return
+      }
+      draining = true
+    }
+    while (true) {
+      try {
+        val _client = synchronized { client }
+        if (_client != null) {
+          _client.sendRpc(message.content, message.callback)
+        } else {
+          assert(stopped == true)
+        }
+      } catch {
+        case NonFatal(e) =>
+          handleNetworkFailure(e)
+          return
+      }
+      synchronized {
+        if (stopped) {
+          return
+        }
+        message = messages.poll()
+        if (message == null) {
+          draining = false
+          return
+        }
+      }
+    }
+  }
+
+  private def launchConnectTask(): Unit = {
+    connectFuture = nettyEnv.clientConnectionExecutor.submit(new Callable[Unit] {
+
+      override def call(): Unit = {
+        try {
+          val _client = nettyEnv
[GitHub] spark pull request: Fix typos
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/9250#issuecomment-150594861 Disregard the failure, it's unrelated
[GitHub] spark pull request: Fix typos
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9250#issuecomment-150594621 **[Test build #1945 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/1945/consoleFull)** for PR 9250 at commit [`7d1b20d`](https://github.com/apache/spark/commit/7d1b20d2346b42dac9268bdba6b1ef8933489a3c). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-8029][core][wip] first successful shuff...
Github user squito commented on the pull request: https://github.com/apache/spark/pull/9214#issuecomment-150592921 this isn't quite ready yet ... still working through test failures. I think the remaining changes are to the tests, but need to work through those and then some cleanup ...
[GitHub] spark pull request: [SPARK-11086][SPARKR] Use dropFactors column-w...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9099#issuecomment-150589746 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44224/ Test FAILed.
[GitHub] spark pull request: Stevel/patches/spark 11265 hive tokens
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9232#issuecomment-150589711 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44223/ Test PASSed.
[GitHub] spark pull request: [SPARK-11086][SPARKR] Use dropFactors column-w...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9099#issuecomment-150589742 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-11086][SPARKR] Use dropFactors column-w...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9099#issuecomment-150589689 **[Test build #44224 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44224/consoleFull)** for PR 9099 at commit [`36ccdc7`](https://github.com/apache/spark/commit/36ccdc702580e6dc92bb65749028396f5ade010d). * This patch **fails SparkR unit tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * ` # class of POSIXlt is c("POSIXlt" "POSIXt")`
[GitHub] spark pull request: Stevel/patches/spark 11265 hive tokens
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9232#issuecomment-150589707 Merged build finished. Test PASSed.
[GitHub] spark pull request: Stevel/patches/spark 11265 hive tokens
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9232#issuecomment-150589550 **[Test build #44223 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44223/consoleFull)** for PR 9232 at commit [`9630a9d`](https://github.com/apache/spark/commit/9630a9d80bc33f738dad6ebc841cb4aea058056d). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-10562][SQL] support mixed case partitio...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9251#issuecomment-150587942 **[Test build #44226 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44226/consoleFull)** for PR 9251 at commit [`520f008`](https://github.com/apache/spark/commit/520f008138153b532f93d9144180e4ab9654d2ce).
[GitHub] spark pull request: [SPARK-11261] [Core] Provide a more flexible a...
Github user rmarsch closed the pull request at: https://github.com/apache/spark/pull/9228
[GitHub] spark pull request: [SPARK-11261] [Core] Provide a more flexible a...
Github user rmarsch commented on the pull request: https://github.com/apache/spark/pull/9228#issuecomment-150586870 Alright, that's a bit disappointing but I understand.
[GitHub] spark pull request: [SPARK-11284] [ML] ALS produces float predicti...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9252#issuecomment-150586598 Can one of the admins verify this patch?