date:20151023

[GitHub] spark pull request: [SPARK-7970] Skip closure cleaning for SQL ope...

2015-10-23 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9253#issuecomment-150644300
  
 Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-7970] Skip closure cleaning for SQL ope...

2015-10-23 Thread andrewor14

Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/9253#issuecomment-150643877
  
ok to test


[GitHub] spark pull request: [SPARK-11271][SPARK-11016][Core] Use Spark Bit...

2015-10-23 Thread rxin

Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/9243#issuecomment-150643375
  
@lemire any comment on this thread? Looks like we are having some trouble 
with the roaring bitmap.



[GitHub] spark pull request: [SPARK-11253][SQL] reset all accumulators in p...

2015-10-23 Thread yhuai

Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/9215#discussion_r42892965
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/metric/SQLMetrics.scala ---
@@ -62,7 +67,19 @@ private[sql] trait SQLMetricValue[T] extends Serializable {
 private[sql] class LongSQLMetricValue(private var _value : Long) extends SQLMetricValue[Long] {
 
   def add(incr: Long): LongSQLMetricValue = {
-_value += incr
+// Some LongSQLMetric will use -1 as initial value, so if the accumulator is never updated,
+// we can filter it out later.  However, when `add` is called, the accumulator is valid, we
+// should turn -1 to 0.
+if (_value < 0) {
+  _value = 0
+}
+
+// Some LongSQLMetric will use -1 as initial value, when we merge accumulator updates at driver
+// side, we should ignore these -1 values.
+if (incr > 0) {
+  _value += incr
+}
+
--- End diff --

@cloud-fan Sorry, I just realized that this method is on the critical path
(we call it when calculating numRows). How about we remove this change and
clearly document that those negative initial values will have a small impact
on the sum of memory consumption?


[GitHub] spark pull request: [SPARK-11253][SQL] reset all accumulators in p...

2015-10-23 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9215#issuecomment-150643054
  
**[Test build #44237 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44237/consoleFull)**
 for PR 9215 at commit 
[`4ff8912`](https://github.com/apache/spark/commit/4ff891205979a06abbf229f115bbbf99bda3ba1f).


[GitHub] spark pull request: [SPARK-11253][SQL] reset all accumulators in p...

2015-10-23 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9215#issuecomment-150640729
  
 Merged build triggered.


[GitHub] spark pull request: [SPARK-11253][SQL] reset all accumulators in p...

2015-10-23 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9215#issuecomment-150640759
  
Merged build started.


[GitHub] spark pull request: [SPARK-10947] [SQL] With schema inference from...

2015-10-23 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9249#issuecomment-150640384
  
**[Test build #44236 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44236/consoleFull)**
 for PR 9249 at commit 
[`18d2861`](https://github.com/apache/spark/commit/18d28619264dbaf10f1e27576f5c4275cbc4ef72).


[GitHub] spark pull request: [SPARK-8029][core][wip] first successful shuff...

2015-10-23 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9214#issuecomment-150640491
  
**[Test build #44235 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44235/consoleFull)**
 for PR 9214 at commit 
[`4145651`](https://github.com/apache/spark/commit/4145651724eb99fb440cc8509df154d8f8095b47).


[GitHub] spark pull request: [SPARK-10947] [SQL] With schema inference from...

2015-10-23 Thread yhuai

Github user yhuai commented on the pull request:

https://github.com/apache/spark/pull/9249#issuecomment-150639887
  
Thank you @stephend-realitymine for working on this! Overall, the change to
schema inference looks good. I left a comment on the test.


[GitHub] spark pull request: [SPARK-10947] [SQL] With schema inference from...

2015-10-23 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9249#issuecomment-150639689
  
 Merged build triggered.


[GitHub] spark pull request: [SPARK-10947] [SQL] With schema inference from...

2015-10-23 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9249#issuecomment-150639713
  
Merged build started.


[GitHub] spark pull request: [SPARK-10947] [SQL] With schema inference from...

2015-10-23 Thread yhuai

Github user yhuai commented on the pull request:

https://github.com/apache/spark/pull/9249#issuecomment-150639251
  
ok to test


[GitHub] spark pull request: [SPARK-10947] [SQL] With schema inference from...

2015-10-23 Thread yhuai

Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/9249#discussion_r42890747
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/InferSchema.scala ---
@@ -103,11 +107,14 @@ private[sql] object InferSchema {
 // the type as we pass through all JSON objects.
 var elementType: DataType = NullType
 while (nextUntil(parser, END_ARRAY)) {
-  elementType = compatibleType(elementType, inferField(parser))
+  elementType = compatibleType(elementType, inferField(parser, primitivesAsString))
 }
 
 ArrayType(elementType)
 
+  case (VALUE_NUMBER_INT | VALUE_NUMBER_FLOAT) if primitivesAsString => StringType
+  case (VALUE_TRUE | VALUE_FALSE) if primitivesAsString => StringType
--- End diff --

Add a newline between these two cases?


[GitHub] spark pull request: [SPARK-8029][core][wip] first successful shuff...

2015-10-23 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9214#issuecomment-150638330
  
Merged build started.


[GitHub] spark pull request: [SPARK-8029][core][wip] first successful shuff...

2015-10-23 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9214#issuecomment-150638306
  
 Merged build triggered.


[GitHub] spark pull request: [SPARK-10947] [SQL] With schema inference from...

2015-10-23 Thread yhuai

Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/9249#discussion_r42890562
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala ---
@@ -1262,4 +1299,4 @@ class JsonSuite extends QueryTest with SharedSQLContext with TestJsonData {
   )
 }
   }
-}
+}
--- End diff --

Add a newline at the end of the file.


[GitHub] spark pull request: [SPARK-10947] [SQL] With schema inference from...

2015-10-23 Thread yhuai

Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/9249#discussion_r42890545
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala ---
@@ -632,6 +632,39 @@ class JsonSuite extends QueryTest with SharedSQLContext with TestJsonData {
 )
   }
 
+  test("Loading a JSON dataset primitivesAsString returns schema with primitive types as strings") {
+val dir = Utils.createTempDir()
+dir.delete()
+val path = dir.getCanonicalPath
+primitiveFieldAndType.map(record => record.replaceAll("\n", " ")).saveAsTextFile(path)
+val jsonDF = sqlContext.read.option("primitivesAsString", "true").json(path)
+
+val expectedSchema = StructType(
+  StructField("bigInteger", DecimalType(20, 0), true) ::
+  StructField("boolean", BooleanType, true) ::
+  StructField("double", DoubleType, true) ::
+  StructField("integer", LongType, true) ::
+  StructField("long", LongType, true) ::
+  StructField("null", StringType, true) ::
+  StructField("string", StringType, true) :: Nil)
--- End diff --

Looks like we need to change all of these data types to `StringType`, right?

Also, can you add a test with complex types (`StructType` and `ArrayType`) 
to make sure we preserve the structure?
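A test with complex types should show that `primitivesAsString` rewrites only the leaf primitives. The toy model below is a hedged illustration of that property, not Spark code: `JValue`, `DType`, `ToyInferSchema` and their members are hypothetical stand-ins for Jackson tokens and Spark's `DataType`s. It mirrors the two new cases in the `InferSchema` diff (numbers and booleans become strings when the flag is set) and recurses into arrays and objects so their shape is preserved. Simplifications: there is no decimal handling (the test's `bigInteger` field is `DecimalType(20, 0)`), and a JSON null stays `NullT` here, whereas the test's expected schema shows the `null` field as `StringType`.

```scala
// Hypothetical toy model: JValue/DType are illustrative stand-ins, not Jackson tokens or Spark types.
sealed trait JValue
case object JNull extends JValue
final case class JBool(b: Boolean) extends JValue
final case class JNum(text: String) extends JValue
final case class JStr(s: String) extends JValue
final case class JArr(items: List[JValue]) extends JValue
final case class JObj(fields: List[(String, JValue)]) extends JValue

sealed trait DType
case object NullT extends DType
case object BoolT extends DType
case object LongT extends DType
case object DoubleT extends DType
case object StringT extends DType
final case class ArrayT(element: DType) extends DType
final case class StructT(fields: List[(String, DType)]) extends DType

object ToyInferSchema {
  // Widen two inferred types to a common type; incompatible primitives fall back to StringT.
  def compatible(a: DType, b: DType): DType = (a, b) match {
    case (x, y) if x == y => x
    case (NullT, t) => t
    case (t, NullT) => t
    case (LongT, DoubleT) | (DoubleT, LongT) => DoubleT
    case (ArrayT(x), ArrayT(y)) => ArrayT(compatible(x, y))
    case (StructT(fa), StructT(fb)) =>
      val (ma, mb) = (fa.toMap, fb.toMap)
      val names = (fa.map(_._1) ++ fb.map(_._1)).distinct.sorted
      StructT(names.map(k => k -> compatible(ma.getOrElse(k, NullT), mb.getOrElse(k, NullT))))
    case _ => StringT
  }

  def infer(v: JValue, primitivesAsString: Boolean): DType = v match {
    case JNull => NullT
    case JStr(_) => StringT
    // Mirrors the two new cases in the diff: numbers and booleans become strings.
    case JNum(_) | JBool(_) if primitivesAsString => StringT
    case JBool(_) => BoolT
    case JNum(t) => if (t.exists(c => c == '.' || c == 'e' || c == 'E')) DoubleT else LongT
    // Complex values recurse, so the flag only changes the leaves, never the shape.
    case JArr(items) =>
      ArrayT(items.map(infer(_, primitivesAsString)).foldLeft(NullT: DType)(compatible))
    case JObj(fields) =>
      StructT(fields.map { case (k, fv) => k -> infer(fv, primitivesAsString) }.sortBy(_._1))
  }
}
```

Sorting struct fields by name loosely follows the alphabetical field order in the test's expected schema; it is a choice of this sketch.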


[GitHub] spark pull request: [SPARK-10641][WIP][SQL] Add Skewness and Kurto...

2015-10-23 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9003#issuecomment-150637284
  
**[Test build #44234 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44234/consoleFull)**
 for PR 9003 at commit 
[`fd3f4d6`](https://github.com/apache/spark/commit/fd3f4d6f9ba5124406a7078c9e7991bf91abdad6).


[GitHub] spark pull request: [SPARK-11194] [SQL] [BRANCH-1.5] [WIP] Use Mut...

2015-10-23 Thread yhuai

Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/9171#discussion_r42889962
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/commands.scala ---
@@ -100,7 +99,7 @@ case class AddJar(path: String) extends RunnableCommand {
 // returns the value of a thread local variable and its HiveConf may not be the HiveConf
 // associated with `executionHive.state` (for example, HiveContext is created in one thread
 // and then add jar is called from another thread).
-hiveContext.executionHive.state.getConf.setClassLoader(newClassLoader)
+hiveContext.executionHive.state.getConf.setClassLoader(hiveContext.libraryClassLoader)
--- End diff --

I have changed the class loader to a non-closable mutable URL class loader.
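A rough sketch of what a "non-closable mutable URL class loader" can look like using only the JDK, assuming two goals: letting commands such as ADD JAR append jars after construction, and keeping a loader that is shared across threads usable even if some holder calls `close()`. The class name `NonClosingMutableURLClassLoader` is hypothetical; the class actually used in this PR may differ.

```scala
import java.net.{URL, URLClassLoader}

// Hypothetical class name; a sketch of a "non-closable mutable URL class loader", not Spark's code.
class NonClosingMutableURLClassLoader(urls: Array[URL], parent: ClassLoader)
    extends URLClassLoader(urls, parent) {

  // URLClassLoader.addURL is protected; re-exposing it lets commands such as ADD JAR
  // append jars after the loader has been created.
  override def addURL(url: URL): Unit = super.addURL(url)

  // Deliberately a no-op: a loader shared across threads and sessions should not be closed
  // by whichever caller happens to be done with it.
  override def close(): Unit = ()
}
```

Because `close()` does nothing, releasing any open jar handles becomes the owner's responsibility; that trade-off is an assumption about the intent, not something the comment states.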


[GitHub] spark pull request: [SPARK-10641][WIP][SQL] Add Skewness and Kurto...

2015-10-23 Thread sethah

Github user sethah commented on a diff in the pull request:

https://github.com/apache/spark/pull/9003#discussion_r42889941
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/functions.scala ---
@@ -930,3 +930,327 @@ object HyperLogLogPlusPlus {
   )
   // scalastyle:on
 }
+
+/**
+ * A central moment is the expected value of a specified power of the deviation of a random
+ * variable from the mean. Central moments are often used to characterize the properties of about
+ * the shape of a distribution.
+ *
+ * This class implements online, one-pass algorithms for computing the central moments of a set of
+ * points.
+ *
+ * References:
+ *  - Xiangrui Meng.  "Simpler Online Updates for Arbitrary-Order Central Moments."
+ *  2015. http://arxiv.org/abs/1510.04923
+ *
+ * @see [[https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance
+ * Algorithms for calculating variance (Wikipedia)]]
+ *
+ * @param child to compute central moments of.
+ */
+abstract class CentralMomentAgg(child: Expression) extends ImperativeAggregate with Serializable {
+
+  /**
+   * The maximum central moment order to be computed.
+   */
+  protected def momentOrder: Int
+
+  /**
+   * Array of sufficient moments need to compute the aggregate statistic.
+   */
+  protected def sufficientMoments: Array[Int]
+
+  override def children: Seq[Expression] = Seq(child)
+
+  override def nullable: Boolean = false
+
+  override def dataType: DataType = DoubleType
+
+  // Expected input data type.
+  // TODO: Right now, we replace old aggregate functions (based on AggregateExpression1) to the
+  // new version at planning time (after analysis phase). For now, NullType is added at here
+  // to make it resolved when we have cases like `select avg(null)`.
+  // We can use our analyzer to cast NullType to the default data type of the NumericType once
+  // we remove the old aggregate functions. Then, we will not need NullType at here.
+  override def inputTypes: Seq[AbstractDataType] = Seq(TypeCollection(NumericType, NullType))
+
+  override def aggBufferSchema: StructType = StructType.fromAttributes(aggBufferAttributes)
+
+  /**
+   * The number of central moments to store in the buffer.
+   */
+  private[this] val numMoments = 5
+
+  override val aggBufferAttributes: Seq[AttributeReference] = Seq.tabulate(numMoments) { i =>
+AttributeReference(s"M$i", DoubleType)()
+  }
+
+  // Note: although this simply copies aggBufferAttributes, this common code can not be placed
+  // in the superclass because that will lead to initialization ordering issues.
+  override val inputAggBufferAttributes: Seq[AttributeReference] =
+aggBufferAttributes.map(_.newInstance())
+
+  /**
+   * Initialize all moments to zero.
+   */
+  override def initialize(buffer: MutableRow): Unit = {
+for (aggIndex <- 0 until numMoments) {
+  buffer.setDouble(mutableAggBufferOffset + aggIndex, 0.0)
+}
+  }
+
+  // frequently used values for online updates
+  private[this] var delta = 0.0
+  private[this] var deltaN = 0.0
+  private[this] var delta2 = 0.0
+  private[this] var deltaN2 = 0.0
+
+  /**
+   * Update the central moments buffer.
+   */
+  override def update(buffer: MutableRow, input: InternalRow): Unit = {
+val v = Cast(child, DoubleType).eval(input)
+if (v != null) {
+  val updateValue = v match {
+case d: Double => d
+case _ => 0.0
+  }
+  var n = buffer.getDouble(mutableAggBufferOffset)
+  var mean = buffer.getDouble(mutableAggBufferOffset + 1)
+  var m2 = 0.0
+  var m3 = 0.0
+  var m4 = 0.0
+
+  n += 1.0
+  delta = updateValue - mean
+  deltaN = delta / n
+  mean += deltaN
+  buffer.setDouble(mutableAggBufferOffset, n)
+  buffer.setDouble(mutableAggBufferOffset + 1, mean)
+
+  if (momentOrder >= 2) {
+m2 = buffer.getDouble(mutableAggBufferOffset + 2)
+m2 += delta * (delta - deltaN)
+buffer.setDouble(mutableAggBufferOffset + 2, m2)
+  }
+
+  if (momentOrder >= 3) {
+delta2 = delta * delta
+deltaN2 = deltaN * deltaN
+m3 = buffer.getDouble(mutableAggBufferOffset + 3)
+m3 += -3.0 * deltaN * m2 + delta * (delta2 - deltaN2)
+buffer.setDouble(mutableAggBufferOffset + 3, m3)
+  }
+
+  if (momentOrder >= 4) {
+m4 = buffer.getDouble(mutableAggBufferOffset + 4)
+m4 += -4.0 * deltaN * m3 - 6.0 * deltaN2 * m2 +
+  delta * (delta * d
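
The `update` method in the diff above follows a one-pass recurrence. Writing $n$ for the count after adding a value $x$, $\mu$ for the previous mean, $\delta = x - \mu$ and $\delta_n = \delta / n$, the code applies these updates in order, and each line reuses the moments already updated on the lines before it:

$$
\begin{aligned}
\mu' &= \mu + \delta_n, \\
M_2' &= M_2 + \delta\,(\delta - \delta_n), \\
M_3' &= M_3 - 3\,\delta_n M_2' + \delta\,(\delta^2 - \delta_n^2), \\
M_4' &= M_4 - 4\,\delta_n M_3' - 6\,\delta_n^2 M_2' + \delta\,(\delta^3 - \delta_n^3).
\end{aligned}
$$

The diff's $M_4$ line is cut off, so its final term $\delta(\delta^3 - \delta_n^3)$ is an assumption here. One way to check the ordering is to substitute $M_2'$ and $M_3'$ back in, which gives forms in terms of the previous moments that are equivalent to the higher-order one-pass updates in the Wikipedia article the doc comment cites:

$$
\begin{aligned}
M_2' &= M_2 + \delta^2 \frac{n-1}{n}, \\
M_3' &= M_3 - 3\,\delta_n M_2 + \delta^3 \frac{(n-1)(n-2)}{n^2}, \\
M_4' &= M_4 - 4\,\delta_n M_3 + 6\,\delta_n^2 M_2 + \delta^4 \frac{(n-1)(n^2 - 3n + 3)}{n^3}.
\end{aligned}
$$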

[GitHub] spark pull request: [SPARK-11194] [SQL] [BRANCH-1.5] [WIP] Use Mut...

2015-10-23 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9171#issuecomment-150637021
  
  [Test build #44233 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44233/consoleFull)
 for PR 9171 at commit 
[`7951df1`](https://github.com/apache/spark/commit/7951df1a91271826c8405f19d0ce12873faff21b).


[GitHub] spark pull request: [SPARK-10641][WIP][SQL] Add Skewness and Kurto...

2015-10-23 Thread sethah

Github user sethah commented on a diff in the pull request:

https://github.com/apache/spark/pull/9003#discussion_r42889772
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/functions.scala ---
@@ -930,3 +930,327 @@ object HyperLogLogPlusPlus {
+  override def update(buffer: MutableRow, input: InternalRow): Unit = {
+val v = Cast(child, DoubleType).eval(input)
+if (v != null) {
+  val updateValue = v match {
+case d: Double => d
+case _ => 0.0
+  }
+  var n = buffer.getDouble(mutableAggBufferOffset)
+  var mean = buffer.getDouble(mutableAggBufferOffset + 1)
+  var m2 = 0.0
--- End diff --

Added as `private[this]` vars


[GitHub] spark pull request: [SPARK-10641][WIP][SQL] Add Skewness and Kurto...

2015-10-23 Thread sethah

Github user sethah commented on a diff in the pull request:

https://github.com/apache/spark/pull/9003#discussion_r42889712
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/functions.scala ---
@@ -930,3 +930,327 @@ object HyperLogLogPlusPlus {
+  override def update(buffer: MutableRow, input: InternalRow): Unit = {
+val v = Cast(child, DoubleType).eval(input)
+if (v != null) {
+  val updateValue = v match {
+case d: Double => d
+case _ => 0.0
--- End diff --

Looking at the code, `Cast.eval` should return either a value of the target
type or null, so this extra case statement has been removed.
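With the fallback case gone, the per-row update reduces to arithmetic on a plain `Double`. The standalone sketch below (no Spark buffers, `Cast` or `InternalRow`; the names `OnlineMoments` and `TwoPass` are illustrative) applies the same update order as the first `functions.scala` diff above and checks the result against a straightforward two-pass computation. The diff's `m4` line is truncated, so the last term used here, `delta * (delta^3 - deltaN^3)`, is an assumption that matches the standard one-pass fourth-moment update. The skewness and kurtosis helpers are population (biased) versions; whether the PR's aggregates use the same normalization is not shown in this thread.

```scala
// Standalone sketch (no Spark buffers, Cast or InternalRow); names are illustrative.
final class OnlineMoments {
  private var n = 0.0
  private var mean = 0.0
  private var m2 = 0.0 // sum of squared deviations from the mean
  private var m3 = 0.0 // sum of cubed deviations
  private var m4 = 0.0 // sum of fourth-power deviations

  // Takes a plain Double: per the comment, Cast(child, DoubleType) yields a Double or null,
  // so a null check upstream is all that is needed before calling this.
  def update(x: Double): Unit = {
    n += 1.0
    val delta = x - mean
    val deltaN = delta / n
    mean += deltaN
    val delta2 = delta * delta
    val deltaN2 = deltaN * deltaN
    // Same order as the diff: each moment uses the ones already updated above it.
    m2 += delta * (delta - deltaN)
    m3 += -3.0 * deltaN * m2 + delta * (delta2 - deltaN2)
    // Assumed completion of the truncated M4 line in the diff: delta * (delta^3 - deltaN^3).
    m4 += -4.0 * deltaN * m3 - 6.0 * deltaN2 * m2 + delta * (delta * delta2 - deltaN * deltaN2)
  }

  def moments: (Double, Double, Double) = (m2, m3, m4)
  def variance: Double = m2 / n // population variance
  // Population (biased) skewness and non-excess kurtosis; NaN if all values are equal.
  def skewness: Double = math.sqrt(n) * m3 / math.pow(m2, 1.5)
  def kurtosis: Double = n * m4 / (m2 * m2)
}

object TwoPass {
  // Reference values: sums of powered deviations from the exactly computed mean.
  def centralSums(xs: Seq[Double]): (Double, Double, Double) = {
    val mean = xs.sum / xs.size
    def s(k: Int): Double = xs.map(x => math.pow(x - mean, k)).sum
    (s(2), s(3), s(4))
  }
}
```

For the values 2, 4, 4, 4, 5, 5, 7, 9 the two-pass sums are 32, 42 and 356, and the online version agrees up to floating-point rounding.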


[GitHub] spark pull request: [SPARK-10641][WIP][SQL] Add Skewness and Kurto...

2015-10-23 Thread sethah

Github user sethah commented on a diff in the pull request:

https://github.com/apache/spark/pull/9003#discussion_r42889596
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/functions.scala ---
@@ -930,3 +930,327 @@ object HyperLogLogPlusPlus {
   )
   // scalastyle:on
 }
+
+/**
+ * A central moment is the expected value of a specified power of the deviation of a random
+ * variable from the mean. Central moments are often used to characterize the properties of about
+ * the shape of a distribution.
+ *
+ * This class implements online, one-pass algorithms for computing the central moments of a set of
+ * points.
+ *
+ * References:
+ *  - Xiangrui Meng.  "Simpler Online Updates for Arbitrary-Order Central Moments."
+ *  2015. http://arxiv.org/abs/1510.04923
+ *
+ * @see [[https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance
+ * Algorithms for calculating variance (Wikipedia)]]
+ *
+ * @param child to compute central moments of.
+ */
+abstract class CentralMomentAgg(child: Expression) extends ImperativeAggregate with Serializable {
+
+  /**
+   * The maximum central moment order to be computed.
+   */
+  protected def momentOrder: Int
+
+  /**
+   * Array of sufficient moments need to compute the aggregate statistic.
+   */
+  protected def sufficientMoments: Array[Int]
+
+  override def children: Seq[Expression] = Seq(child)
+
+  override def nullable: Boolean = false
+
+  override def dataType: DataType = DoubleType
+
+  // Expected input data type.
+  // TODO: Right now, we replace old aggregate functions (based on AggregateExpression1) to the
+  // new version at planning time (after analysis phase). For now, NullType is added at here
+  // to make it resolved when we have cases like `select avg(null)`.
+  // We can use our analyzer to cast NullType to the default data type of the NumericType once
+  // we remove the old aggregate functions. Then, we will not need NullType at here.
+  override def inputTypes: Seq[AbstractDataType] = Seq(TypeCollection(NumericType, NullType))
+
+  override def aggBufferSchema: StructType = StructType.fromAttributes(aggBufferAttributes)
+
+  /**
+   * The number of central moments to store in the buffer.
+   */
+  private[this] val numMoments = 5
+
+  override val aggBufferAttributes: Seq[AttributeReference] = Seq.tabulate(numMoments) { i =>
+    AttributeReference(s"M$i", DoubleType)()
+  }
+
+  // Note: although this simply copies aggBufferAttributes, this common code can not be placed
+  // in the superclass because that will lead to initialization ordering issues.
+  override val inputAggBufferAttributes: Seq[AttributeReference] =
+    aggBufferAttributes.map(_.newInstance())
+
+  /**
+   * Initialize all moments to zero.
+   */
+  override def initialize(buffer: MutableRow): Unit = {
+    for (aggIndex <- 0 until numMoments) {
+      buffer.setDouble(mutableAggBufferOffset + aggIndex, 0.0)
+    }
+  }
+
+  // frequently used values for online updates
+  private[this] var delta = 0.0
--- End diff --

done.



[GitHub] spark pull request: [SPARK-10641][WIP][SQL] Add Skewness and Kurto...

2015-10-23 Thread sethah

Github user sethah commented on a diff in the pull request:

https://github.com/apache/spark/pull/9003#discussion_r42889558
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/functions.scala ---
@@ -930,3 +930,327 @@ object HyperLogLogPlusPlus {
   )
   // scalastyle:on
 }
+
+/**
+ * A central moment is the expected value of a specified power of the deviation of a random
+ * variable from the mean. Central moments are often used to characterize the properties of about
+ * the shape of a distribution.
+ *
+ * This class implements online, one-pass algorithms for computing the central moments of a set of
+ * points.
+ *
+ * References:
+ *  - Xiangrui Meng.  "Simpler Online Updates for Arbitrary-Order Central Moments."
+ *  2015. http://arxiv.org/abs/1510.04923
+ *
+ * @see [[https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance
+ * Algorithms for calculating variance (Wikipedia)]]
+ *
+ * @param child to compute central moments of.
+ */
+abstract class CentralMomentAgg(child: Expression) extends ImperativeAggregate with Serializable {
+
+  /**
+   * The maximum central moment order to be computed.
+   */
+  protected def momentOrder: Int
+
+  /**
+   * Array of sufficient moments need to compute the aggregate statistic.
+   */
+  protected def sufficientMoments: Array[Int]
--- End diff --

Removed this def and instead pass all moments up to the maximum moment to 
the `getStatistic` function. 
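Once all moments up to the maximum order are passed in, each statistic can be a small function of a single array. The block below is a minimal sketch under these assumptions:

- Index 0 holds the count, index 1 the mean, and index k >= 2 the sum of k-th powers of deviations.
- It uses the population (biased) definitions of skewness and excess kurtosis. Whether the PR's `getStatistic` uses exactly these normalizations is an assumption.
- `MomentStatistics` is a hypothetical name.

```scala
// Hypothetical sketch: moments(0) = count, moments(1) = mean,
// moments(k) for k >= 2 = sum of k-th powers of deviations from the mean.
object MomentStatistics {
  /** Population skewness: sqrt(n) * M3 / M2^(3/2); NaN when undefined. */
  def skewness(moments: Array[Double]): Double = {
    val n = moments(0)
    val m2 = moments(2)
    val m3 = moments(3)
    if (n == 0.0 || m2 == 0.0) Double.NaN
    else math.sqrt(n) * m3 / math.sqrt(m2 * m2 * m2)
  }

  /** Population excess kurtosis: n * M4 / M2^2 - 3; NaN when undefined. */
  def kurtosis(moments: Array[Double]): Double = {
    val n = moments(0)
    val m2 = moments(2)
    val m4 = moments(4)
    if (n == 0.0 || m2 == 0.0) Double.NaN
    else n * m4 / (m2 * m2) - 3.0
  }
}
```

For the values 1 to 5 the moment array is `Array(5.0, 3.0, 10.0, 0.0, 34.0)`. That gives a skewness of 0 and an excess kurtosis of about -1.3.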



[GitHub] spark pull request: [SPARK-11194] [SQL] [BRANCH-1.5] [WIP] Use Mut...

2015-10-23 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9171#issuecomment-150636241
  
 Merged build triggered.



[GitHub] spark pull request: [SPARK-11194] [SQL] [BRANCH-1.5] [WIP] Use Mut...

2015-10-23 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9171#issuecomment-150636270
  
Merged build started.



[GitHub] spark pull request: [SPARK-10641][WIP][SQL] Add Skewness and Kurto...

2015-10-23 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9003#issuecomment-150636269
  
Merged build started.



[GitHub] spark pull request: [SPARK-10641][WIP][SQL] Add Skewness and Kurto...

2015-10-23 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9003#issuecomment-150636243
  
 Merged build triggered.



[GitHub] spark pull request: [SPARK-11274][SQL] Text data source support fo...

2015-10-23 Thread yhuai

Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/9240#discussion_r42889239
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/text/DefaultSource.scala ---
@@ -0,0 +1,160 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.text
+
+import com.google.common.base.Objects
+import org.apache.hadoop.fs.{Path, FileStatus}
+import org.apache.hadoop.io.{NullWritable, Text, LongWritable}
+import org.apache.hadoop.mapred.{TextInputFormat, JobConf}
+import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat
+import org.apache.hadoop.mapreduce.{RecordWriter, TaskAttemptContext, Job}
+import org.apache.hadoop.mapreduce.lib.input.FileInputFormat
+
+import org.apache.spark.deploy.SparkHadoopUtil
+import org.apache.spark.mapred.SparkHadoopMapRedUtil
+import org.apache.spark.rdd.RDD
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.GenericMutableRow
+import org.apache.spark.sql.{AnalysisException, Row, SQLContext}
+import org.apache.spark.sql.execution.datasources.PartitionSpec
+import org.apache.spark.sql.sources._
+import org.apache.spark.sql.types.{StringType, StructType}
+import org.apache.spark.unsafe.types.UTF8String
+
+/**
+ * A data source for reading text files.
+ */
+class DefaultSource extends HadoopFsRelationProvider with DataSourceRegister {
+
+  override def createRelation(
+      sqlContext: SQLContext,
+      paths: Array[String],
+      dataSchema: Option[StructType],
+      partitionColumns: Option[StructType],
+      parameters: Map[String, String]): HadoopFsRelation = {
+    dataSchema.foreach(verifySchema)
+    new TextRelation(None, partitionColumns, paths)(sqlContext)
+  }
+
+  override def shortName(): String = "text"
+
+  private def verifySchema(schema: StructType): Unit = {
+    if (schema.size != 1) {
+      throw new AnalysisException(
+        s"Text data source supports only a single column, and you have ${schema.size} columns.")
+    }
+    val tpe = schema(0).dataType
+    if (tpe != StringType) {
+      throw new AnalysisException(
+        s"Text data source supports only a string column, but you have ${tpe.simpleString}.")
+    }
+  }
+}
+
+private[sql] class TextRelation(
+    val maybePartitionSpec: Option[PartitionSpec],
+    override val userDefinedPartitionColumns: Option[StructType],
+    override val paths: Array[String] = Array.empty[String])
+    (@transient val sqlContext: SQLContext)
+  extends HadoopFsRelation(maybePartitionSpec) {
+
+  /** Data schema is always a single column, named "text". */
+  override def dataSchema: StructType = new StructType().add("text", StringType)
+
+  /** This is an internal data source that outputs internal row format. */
+  override val needConversion: Boolean = false
+
+  /** Read path. */
+  override def buildScan(inputPaths: Array[FileStatus]): RDD[Row] = {
+    val job = new Job(sqlContext.sparkContext.hadoopConfiguration)
+    val conf = SparkHadoopUtil.get.getConfigurationFromJobContext(job)
+    val paths = inputPaths.map(_.getPath).sorted
--- End diff --

I got a compile error:
```
[error] /Users/yhuai/Projects/Spark/yin-spark-1/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/text/DefaultSource.scala:86: No implicit Ordering defined for org.apache.hadoop.fs.Path.
[error] val paths = inputPaths.map(_.getPath).sorted
[error]   ^
```
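The error says the compiler finds no implicit `Ordering` for Hadoop's `Path`, which `.sorted` requires. One possible workaround is to sort by a key that does have an ordering, such as the path's `java.net.URI`. `URI` implements `Comparable[URI]`, so Scala derives an `Ordering[URI]`.

This is a sketch, not necessarily the fix the PR adopted. To keep it self-contained, `FakePath` is a hypothetical stand-in for `org.apache.hadoop.fs.Path` that exposes only `toUri`. With the real class the call would presumably be `inputPaths.map(_.getPath).sortBy(_.toUri)`.

```scala
import java.net.URI

// Hypothetical stand-in for org.apache.hadoop.fs.Path: it defines no implicit
// scala.math.Ordering, so calling `.sorted` on an Array[FakePath] would not compile.
final class FakePath(s: String) {
  private val uri = new URI(s)
  def toUri: URI = uri
  override def toString: String = s
}

object SortPaths {
  // java.net.URI implements Comparable[URI], so Scala derives an implicit Ordering[URI]
  // and sortBy can order the paths by their URIs.
  def sortedPaths(paths: Array[FakePath]): Array[FakePath] = paths.sortBy(_.toUri)
}
```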



[GitHub] spark pull request: [SPARK-11184] [MLLIB] Declare most of .mllib c...

2015-10-23 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9169#issuecomment-150632553
  
Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-11184] [MLLIB] Declare most of .mllib c...

2015-10-23 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9169#issuecomment-150632406
  
**[Test build #44228 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44228/consoleFull)** for PR 9169 at commit [`22c6277`](https://github.com/apache/spark/commit/22c62774b04a3f845d4253dc0412ade2b8d8c7ce).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-11184] [MLLIB] Declare most of .mllib c...

2015-10-23 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9169#issuecomment-150632554
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44228/
Test PASSed.



[GitHub] spark pull request: [SPARK-11125] [SQL] Uninformative exception wh...

2015-10-23 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9134#issuecomment-150632179
  
**[Test build #44232 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44232/consoleFull)** for PR 9134 at commit [`5b6e651`](https://github.com/apache/spark/commit/5b6e6510dd3825910659cac95784f56c3ae9df51).



[GitHub] spark pull request: [SPARK-11125] [SQL] Uninformative exception wh...

2015-10-23 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9134#issuecomment-150630263
  
 Merged build triggered.



[GitHub] spark pull request: [SPARK-11209][SPARKR] Add window functions int...

2015-10-23 Thread shivaram

Github user shivaram commented on a diff in the pull request:

https://github.com/apache/spark/pull/9193#discussion_r42886920
  
--- Diff: R/pkg/R/functions.R ---
@@ -2008,3 +2008,101 @@ setMethod("ifelse",
 "otherwise", no)
   column(jc)
   })
+
+## Window functions##
+
+#' cumeDist
+#'
+#' Window function: returns the cumulative distribution of values within a window partition,
+#' i.e. the fraction of rows that are below the current row.
+#' 
+#'   N = total number of rows in the partition
+#'   cumeDist(x) = number of values before (and including) x / N
+#'   
+#' This is equivalent to the CUME_DIST function in SQL.
+#'
+#' @rdname cumeDist
+#' @name cumeDist
+#' @family window_funcs
+#' @export
+#' @examples \dontrun{cumeDist()}
+setMethod("cumeDist",
+          signature(x = "missing"),
+          function() {
+            jc <- callJStatic("org.apache.spark.sql.functions", "cumeDist")
+            column(jc)
+          })
+
+#' lag
+#'
+#' Window function: returns the value that is `offset` rows before the current row, and
+#' `defaultValue` if there is less than `offset` rows before the current row. For example,
+#' an `offset` of one will return the previous row at any given point in the window partition.
+#' 
+#' This is equivalent to the LAG function in SQL.
+#'
+#' @rdname lag
+#' @name lag
+#' @family window_funcs
+#' @export
+#' @examples \dontrun{lag(df$c)}
+setMethod("lag",
--- End diff --

There is a method called `lag` in base R that this would conflict with. Could we try to use the same argument names as that function?
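The block below makes the semantics in the quoted doc comments concrete. It shows what `cumeDist` and `lag` compute over one already-ordered window partition, using plain Scala over in-memory sequences rather than the Spark implementation. `WindowSemantics` and its signatures are hypothetical.

The sketch follows the formula in the comment, which counts rows before and including the current value, so ties share a value. That matches how SQL's CUME_DIST is defined, rather than the looser phrase "rows that are below the current row".

```scala
// Hypothetical illustration of window-function semantics over one ordered partition.
object WindowSemantics {
  // cume_dist: (number of rows whose value is <= the current value) / (partition size).
  def cumeDist(sorted: Seq[Int]): Seq[Double] = {
    val n = sorted.size.toDouble
    sorted.map(x => sorted.count(_ <= x) / n)
  }

  // lag: the value `offset` rows before the current row, or `default` if there is none.
  def lag[A](rows: Seq[A], offset: Int, default: A): Seq[A] =
    rows.indices.map(i => if (i - offset >= 0) rows(i - offset) else default)
}
```

For example, `cumeDist(Seq(1, 2, 2, 3))` gives 0.25, 0.75, 0.75, 1.0. `lag(Seq(1, 2, 3), 1, 0)` gives 0, 1, 2.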



[GitHub] spark pull request: [SPARK-11125] [SQL] Uninformative exception wh...

2015-10-23 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9134#issuecomment-150630290
  
Merged build started.



[GitHub] spark pull request: [SPARK-11125] [SQL] Uninformative exception wh...

2015-10-23 Thread yhuai

Github user yhuai commented on the pull request:

https://github.com/apache/spark/pull/9134#issuecomment-150630038
  
test this please



[GitHub] spark pull request: [SPARK-11125] [SQL] Uninformative exception wh...

2015-10-23 Thread yhuai

Github user yhuai commented on the pull request:

https://github.com/apache/spark/pull/9134#issuecomment-150630025
  
ok to test



[GitHub] spark pull request: [SPARK-8029][core][wip] first successful shuff...

2015-10-23 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9214#issuecomment-150626639
  
 Merged build triggered.



[GitHub] spark pull request: SPARK-11286 Make Outbox stopped exception sing...

2015-10-23 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9254#issuecomment-150627414
  
**[Test build #44230 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44230/consoleFull)** for PR 9254 at commit [`79c6e00`](https://github.com/apache/spark/commit/79c6e00dae89de815de73e9ce66f47d9e0cb6bdd).



[GitHub] spark pull request: [SPARK-8029][core][wip] first successful shuff...

2015-10-23 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9214#issuecomment-150627404
  
**[Test build #44231 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44231/consoleFull)** for PR 9214 at commit [`89063dd`](https://github.com/apache/spark/commit/89063dd3654066e076e0e4f13250d25c414e88c3).



[GitHub] spark pull request: [SPARK-8029][core][wip] first successful shuff...

2015-10-23 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9214#issuecomment-150626674
  
Merged build started.



[GitHub] spark pull request: [SPARK-8029][core][wip] first successful shuff...

2015-10-23 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9214#issuecomment-150626295
  
**[Test build #44229 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44229/consoleFull)** for PR 9214 at commit [`4a19702`](https://github.com/apache/spark/commit/4a19702bf1de0af003c2cbc58bf2ec61c79d17b9).



[GitHub] spark pull request: Make Outbox stopped exception singleton

2015-10-23 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9254#issuecomment-150625601
  
Merged build started.



[GitHub] spark pull request: Make Outbox stopped exception singleton

2015-10-23 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9254#issuecomment-150625571
  
 Merged build triggered.



[GitHub] spark pull request: Make Outbox stopped exception singleton

2015-10-23 Thread tedyu

GitHub user tedyu opened a pull request:

https://github.com/apache/spark/pull/9254

Make Outbox stopped exception singleton



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tedyu/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/9254.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #9254


commit 79c6e00dae89de815de73e9ce66f47d9e0cb6bdd
Author: tedyu 
Date:   2015-10-23T16:25:15Z

Make Outbox stopped exception singleton





[GitHub] spark pull request: [SPARK-10562][SQL] support mixed case partitio...

2015-10-23 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9251#issuecomment-150624445
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44226/
Test PASSed.



[GitHub] spark pull request: [SPARK-9319][SPARKR] Add support for setting c...

2015-10-23 Thread shivaram

Github user shivaram commented on a diff in the pull request:

https://github.com/apache/spark/pull/9218#discussion_r42884395
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -276,6 +276,57 @@ setMethod("names<-",
 }
   })
 
+#' @rdname columns
+#' @name colnames
+setMethod("colnames",
+          signature(x = "DataFrame"),
+          function(x) {
+            columns(x)
+          })
+
+#' @rdname columns
+#' @name colnames<-
+setMethod("colnames<-",
+          signature(x = "DataFrame", value = "character"),
+          function(x, value) {
+            sdf <- callJMethod(x@sdf, "toDF", as.list(value))
+            dataFrame(sdf)
+          })
+
+#' coltypes
+#'
+#' Set the column types of a DataFrame.
+#'
+#' @name coltypes
+#' @param x (DataFrame)
+#' @return value (character) A character vector with the target column types for the given DataFrame
+#' @rdname coltypes
+#' @aliases coltypes
+#' @export
+#' @examples
+#'\dontrun{
+#' sc <- sparkR.init()
+#' sqlContext <- sparkRSQL.init(sc)
+#' path <- "path/to/file.json"
+#' df <- jsonFile(sqlContext, path)
+#' coltypes(df) <- c("string", "integer")
+#'}
+setMethod("coltypes<-",
--- End diff --

So this is a little tricky. In #8984 we are converting the SparkSQL types to R types, so for consistency we should take in R types here (i.e. character, numeric, etc.) and convert them to SparkSQL types.
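The suggestion implies a translation step from R type names to Spark SQL type names before calling into the JVM. The block below is a minimal sketch of such a lookup, with these assumptions:

- `RTypeMapping` is a hypothetical helper.
- The mapping is illustrative and not taken from #8984: `character` to `string`, `numeric` to `double`, `integer` to `integer`, `logical` to `boolean`.
- Unsupported names are rejected rather than passed through.

```scala
// Hypothetical lookup from R class names to Spark SQL type names (illustrative only).
object RTypeMapping {
  private val rToSql: Map[String, String] = Map(
    "character" -> "string",
    "numeric"   -> "double",
    "integer"   -> "integer",
    "logical"   -> "boolean")

  // Translate each R type name, failing fast on anything not in the table.
  def toSqlTypes(rTypes: Seq[String]): Seq[String] = rTypes.map { t =>
    rToSql.getOrElse(t, throw new IllegalArgumentException(s"Unsupported R type: $t"))
  }
}
```

Failing fast keeps a typo such as `"charcter"` from silently producing an unexpected schema.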



[GitHub] spark pull request: [SPARK-8029][core][wip] first successful shuff...

2015-10-23 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9214#issuecomment-150624419
  
Merged build started.



[GitHub] spark pull request: [SPARK-10562][SQL] support mixed case partitio...

2015-10-23 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9251#issuecomment-15062
  
Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-8029][core][wip] first successful shuff...

2015-10-23 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9214#issuecomment-150624389
  
 Merged build triggered.



[GitHub] spark pull request: [SPARK-10562][SQL] support mixed case partitio...

2015-10-23 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9251#issuecomment-150624292
  
**[Test build #44226 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44226/consoleFull)**
 for PR 9251 at commit 
[`520f008`](https://github.com/apache/spark/commit/520f008138153b532f93d9144180e4ab9654d2ce).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: SPARK-11258 Converting a Spark DataFrame into ...

2015-10-23 Thread FRosner

Github user FRosner commented on a diff in the pull request:

https://github.com/apache/spark/pull/9222#discussion_r42884105
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/api/r/SQLUtils.scala ---
@@ -130,16 +130,18 @@ private[r] object SQLUtils {
   }
 
   def dfToCols(df: DataFrame): Array[Array[Any]] = {
-// localDF is Array[Row]
-val localDF = df.collect()
+val localDF: Array[Row] = df.collect()
 val numCols = df.columns.length
+val numRows = localDF.length
 
-// result is Array[Array[Any]]
-(0 until numCols).map { colIdx =>
-  localDF.map { row =>
-row(colIdx)
+val colArray = new Array[Array[Any]](numCols)
+for (colNo <- 0 until numCols) {
--- End diff --

Yeah, we might give this a try, but I don't think the loop itself is the problem; it's rather what goes on in `map` and then `.toArray`. I will investigate a bit more over the weekend.



[GitHub] spark pull request: [SPARK-5569] [STREAMING] fix ObjectInputStream...

2015-10-23 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8955#issuecomment-150623296
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44227/
Test PASSed.



[GitHub] spark pull request: [SPARK-5569] [STREAMING] fix ObjectInputStream...

2015-10-23 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8955#issuecomment-150623294
  
Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-5569] [STREAMING] fix ObjectInputStream...

2015-10-23 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8955#issuecomment-150623018
  
**[Test build #44227 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44227/consoleFull)**
 for PR 8955 at commit 
[`d2d0404`](https://github.com/apache/spark/commit/d2d0404f3b68ae3a85d3592b3536feca68e2d22b).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: SPARK-11258 Remove quadratic runtime complexit...

2015-10-23 Thread shivaram

Github user shivaram commented on a diff in the pull request:

https://github.com/apache/spark/pull/9222#discussion_r42883859
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/api/r/SQLUtils.scala ---
@@ -130,16 +130,18 @@ private[r] object SQLUtils {
   }
 
   def dfToCols(df: DataFrame): Array[Array[Any]] = {
-// localDF is Array[Row]
-val localDF = df.collect()
+val localDF: Array[Row] = df.collect()
 val numCols = df.columns.length
+val numRows = localDF.length
 
-// result is Array[Array[Any]]
-(0 until numCols).map { colIdx =>
-  localDF.map { row =>
-row(colIdx)
+val colArray = new Array[Array[Any]](numCols)
+for (colNo <- 0 until numCols) {
--- End diff --

Using a `while` loop here instead of a `for` loop should also help with performance.
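Both reviewers are discussing the same row-to-column conversion. A minimal sketch of it is below, written in Python only to show the shape of the loop. It preallocates one array per column and fills it with index-based `while` loops, instead of building intermediate collections with `map`. Rows are plain Python lists standing in for Spark `Row` objects, and `df_to_cols` is a hypothetical name. Whether this pays off on the JVM is exactly what the thread proposes to measure.

```python
def df_to_cols(rows, num_cols):
    """Convert row-major data into column-major arrays.

    rows: list of rows, each a list of length num_cols (stand-in for Array[Row]).
    Returns num_cols columns, each a list holding len(rows) values.
    """
    num_rows = len(rows)
    # Allocate every column once, at its final size.
    col_array = [[None] * num_rows for _ in range(num_cols)]
    col_no = 0
    while col_no < num_cols:
        col = col_array[col_no]
        row_no = 0
        while row_no < num_rows:
            col[row_no] = rows[row_no][col_no]
            row_no += 1
        col_no += 1
    return col_array


print(df_to_cols([[1, "a"], [2, "b"], [3, "c"]], 2))
```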



[GitHub] spark pull request: Ignore NoClassDefFoundError in obtainTokenForH...

2015-10-23 Thread tedyu

Github user tedyu commented on the pull request:

https://github.com/apache/spark/pull/9213#issuecomment-150622695
  
Should be covered by SPARK-11265



[GitHub] spark pull request: [SPARK-11184] [MLLIB] Declare most of .mllib c...

2015-10-23 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9169#issuecomment-150622726
  
**[Test build #44228 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44228/consoleFull)**
 for PR 9169 at commit 
[`22c6277`](https://github.com/apache/spark/commit/22c62774b04a3f845d4253dc0412ade2b8d8c7ce).



[GitHub] spark pull request: Ignore NoClassDefFoundError in obtainTokenForH...

2015-10-23 Thread tedyu

Github user tedyu closed the pull request at:

https://github.com/apache/spark/pull/9213



[GitHub] spark pull request: [SPARK-10382] Make example code in user guide ...

2015-10-23 Thread yinxusen

Github user yinxusen commented on the pull request:

https://github.com/apache/spark/pull/9109#issuecomment-150622432
  
@mengxr Sure I can do that.



[GitHub] spark pull request: [SPARK-11184] [MLLIB] Declare most of .mllib c...

2015-10-23 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9169#issuecomment-150621772
  
Merged build started.



[GitHub] spark pull request: [SPARK-11184] [MLLIB] Declare most of .mllib c...

2015-10-23 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9169#issuecomment-150621704
  
 Merged build triggered.



[GitHub] spark pull request: [SPARK-8992] [SQL] Add pivot to dataframe api

2015-10-23 Thread aray

Github user aray commented on the pull request:

https://github.com/apache/spark/pull/7841#issuecomment-150620321
  
@rxin here is my summary of other frameworks' APIs.

I'm going to use an example dataset from the pandas docs for all the examples (as `df`).

|A|B|C|D|
|---|---|---|---|
|foo|one|small|1|
|foo|one|large|2|
|foo|one|large|2|
|foo|two|small|3|
|foo|two|small|3|
|bar|one|large|4|
|bar|one|small|5|
|bar|two|small|6|
|bar|two|large|7|

This API
--


```scala
scala> df.groupBy("A", "B").pivot("C", "small", "large").sum("D").show
+---+---+-----+-----+
|  A|  B|small|large|
+---+---+-----+-----+
|foo|two|    6| null|
|bar|two|    6|    7|
|foo|one|    1|    4|
|bar|one|    5|    4|
+---+---+-----+-----+

scala> df.groupBy("A", "B").pivot("C", "small", "large").agg(sum("D"), avg("D")).show
+---+---+------------+------------+------------+------------+
|  A|  B|small sum(D)|small avg(D)|large sum(D)|large avg(D)|
+---+---+------------+------------+------------+------------+
|foo|two|           6|         3.0|        null|        null|
|bar|two|           6|         6.0|           7|         7.0|
|foo|one|           1|         1.0|           4|         2.0|
|bar|one|           5|         5.0|           4|         4.0|
+---+---+------------+------------+------------+------------+

scala> df.pivot(Seq($"A", $"B"), $"C", Seq("small", "large"), sum($"D")).show
+---+---+-----+-----+
|  A|  B|small|large|
+---+---+-----+-----+
|foo|two|    6| null|
|bar|two|    6|    7|
|foo|one|    1|    4|
|bar|one|    5|    4|
+---+---+-----+-----+
```

We require a list of values for the pivot column because the output columns of the operator must be known ahead of time. Pandas and reshape2 do not require this, but the comparable SQL operators do. We also allow multiple aggregations, which not all implementations support.
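The "values known ahead of time" requirement can be made concrete with a small pure-Python sketch. It is not Spark's implementation: it pivots the example table with `sum`, and `pivot_sum` is a hypothetical name. Because the pivot values are passed in explicitly, the output columns are fixed before any row is read. A (group, value) pair with no rows stays `None`, mirroring the `null` in the Spark output above.

```python
# Example dataset from the comparison above, as (A, B, C, D) tuples.
ROWS = [
    ("foo", "one", "small", 1),
    ("foo", "one", "large", 2),
    ("foo", "one", "large", 2),
    ("foo", "two", "small", 3),
    ("foo", "two", "small", 3),
    ("bar", "one", "large", 4),
    ("bar", "one", "small", 5),
    ("bar", "two", "small", 6),
    ("bar", "two", "large", 7),
]


def pivot_sum(rows, pivot_values):
    """Group rows by (A, B), pivot on C and sum D.

    Because pivot_values is given up front, the output schema
    ("A", "B", *pivot_values) is known before any row is read.
    A (group, pivot value) cell with no matching rows stays None.
    """
    header = ("A", "B") + tuple(pivot_values)
    groups = {}  # dicts keep insertion order (Python 3.7+)
    for a, b, c, d in rows:
        cells = groups.setdefault((a, b), {v: None for v in pivot_values})
        if c in cells:  # C values outside pivot_values feed no column
            cells[c] = d if cells[c] is None else cells[c] + d
    table = [header]
    for (a, b), cells in groups.items():
        table.append((a, b) + tuple(cells[v] for v in pivot_values))
    return table


for row in pivot_sum(ROWS, ["small", "large"]):
    print(row)
```

On the example data, this yields the same four groups and sums as the first Spark output above, listed in first-seen order.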

pandas
--

The comparable method for pandas is `pivot_table(data, values=None, index=None, columns=None, aggfunc='mean', fill_value=None, margins=False, dropna=True)`

Example

```python
>>> pivot_table(df, values='D', index=['A', 'B'], columns=['C'], aggfunc=np.sum)
         small  large
foo one      1      4
    two      6    NaN
bar one      5      4
    two      6      7
```

Pandas also allows multiple aggregations:

```python
>>> pivot_table(df, values='D', index=['A', 'B'], columns=['C'], aggfunc=[np.sum, np.average])
         sum           average
C        large  small  large  small
A   B
bar one      4      5      4      5
    two      7      6      7      6
foo one      4      1      2      1
    two    NaN      6    NaN      3
```

References

- http://pandas.pydata.org/pandas-docs/stable/reshaping.html
- http://pandas.pydata.org/pandas-docs/stable/generated/pandas.pivot_table.html

See also: `pivot`, `stack`, `unstack`.

reshape2 (R)
--

The comparable method for reshape2 is `dcast(data, formula, fun.aggregate = 
NULL, ..., margins = NULL, subset = NULL, fill = NULL, drop = TRUE, value.var = 
guess_value(data))`

```r
> dcast(df, A + B ~ C, sum)
Using D as value column: use value.var to override.
    A   B large small
1 bar one     4     5
2 bar two     7     6
3 foo one     4     1
4 foo two     0     6
```

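Note that by default, `dcast` fills missing cells with the result of applying `fun.aggregate` to a zero-length vector. Here that is `sum` of an empty vector, i.e. 0, as in the `foo two large` cell above.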

References

- https://cran.r-project.org/web/packages/reshape2/reshape2.pdf
- http://seananderson.ca/2013/10/19/reshape.html
- http://www.inside-r.org/packages/cran/reshape2/docs/cast

See also: `melt`.

MS SQL Server
--

```sql
SELECT *
FROM df
pivot (sum(D) for C in ([small], [large])) p
```

http://sqlfiddle.com/#!3/cf887/3/0

References

- http://sqlhints.com/2014/03/10/pivot-and-unpivot-in-sql-server/


Oracle 11g
--

```sql
SELECT *
FROM df
pivot (sum(D) for C in ('small', 'large')) p
```
http://sqlfiddle.com/#!4/29bc5/3/0

Oracle also allows multiple aggregations, with output similar to this API:

```sql
SELECT *
FROM df
pivot (sum(D) as sum, avg(D) as avg for C in ('small', 'large')) p
```
http://sqlfiddle.com/#!4/29bc5/5/0

References

- http://www.oracle.com/technetwork/articles/sql/11g-pivot-097235.html
- http://docs.oracle.com/cd/B28359_01/server.111/b28286/statements_10002.htm#CHDCEJJE
- http://www.techonthenet.com/oracle/pivot.php

--

Let me know if I can do anything else to help this along. Als

[GitHub] spark pull request: [SPARK-11264] ./bin/spark-class can't find ass...

2015-10-23 Thread srowen

Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/9231#issuecomment-150619514
  
I was not even aware of `GREP_OPTIONS` until this PR. I presume this is 
pretty safe, since I don't see why we would need to support custom grep 
behavior. LGTM



[GitHub] spark pull request: SPARK-11265 hive tokens

2015-10-23 Thread tgravescs

Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/9232#discussion_r42881320
  
--- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala ---
@@ -322,8 +322,10 @@ private[spark] class Client(
 // multiple times, YARN will fail to launch containers for the app with an internal
 // error.
 val distributedUris = new HashSet[String]
-obtainTokenForHiveMetastore(sparkConf, hadoopConf, credentials)
-obtainTokenForHBase(sparkConf, hadoopConf, credentials)
+if (isClusterMode) {
--- End diff --

Why is this cluster mode only? I can run the Spark shell to access Hive or HBase, and then this won't get tokens for those to ship to executors?



[GitHub] spark pull request: Fix typos

2015-10-23 Thread srowen

Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/9250#issuecomment-150615913
  
If it's a handful of typos, I wouldn't bother with a JIRA. I think the operating theory is JIRA = what and PR = how, and if those are virtually the same, there's no point in duplicating.

I'd focus on the docs, I suppose, since they are read much more. In fact, I don't know if you could reasonably search the Scala source for typos because of all the false positives, but searching the generated Scaladoc might be reasonable. Still, that sounds possibly too noisy to search.



[GitHub] spark pull request: [SPARK-7970] Skip closure cleaning for SQL ope...

2015-10-23 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9253#issuecomment-150614390
  
Can one of the admins verify this patch?



[GitHub] spark pull request: [SPARK-7970] Skip closure cleaning for SQL ope...

2015-10-23 Thread nitin2goyal

GitHub user nitin2goyal opened a pull request:

https://github.com/apache/spark/pull/9253

[SPARK-7970] Skip closure cleaning for SQL operations

Also introduces a new Spark-private API in RDD.scala named 'mapPartitionsInternal', which skips closure cleaning.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/nitin2goyal/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/9253.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #9253


commit 4ee8058447b5e7eff242960aae6fbd56631b
Author: nitin.goyal 
Date:   2015-10-19T06:51:42Z

SPARK-11179: Push filters through aggregate if filters are a subset of the 'group by' attribute set

commit 3b016b73c239ce9cdc85a5edb1a2127c1f67433a
Author: nitin goyal 
Date:   2015-10-20T07:19:53Z

SPARK-11179: Push filters through aggregate if filters are a subset of the 'group by' attribute set

commit 671fbb31d7c908668526bdc146e0168ffb3014a8
Author: nitin goyal 
Date:   2015-10-20T10:17:41Z

SPARK-11179: Push filters through aggregate if filters are a subset of the 'group by' attribute set

commit f422aa81e10ad01762847c71e678c3b2ef85a926
Author: nitin goyal 
Date:   2015-10-20T18:32:47Z

[SPARK-11179] [SQL] Push filters through aggregate

Push conjunctive predicates through Aggregate operators when their references are a subset of the groupingExpressions.

Query plan before optimisation :-
Filter ((c#138L = 2) && (a#0 = 3))
 Aggregate [a#0], [a#0,count(b#1) AS c#138L]
  Project [a#0,b#1]
   LocalRelation [a#0,b#1,c#2]

Query plan after optimisation :-
Filter (c#138L = 2)
 Aggregate [a#0], [a#0,count(b#1) AS c#138L]
  Filter (a#0 = 3)
   Project [a#0,b#1]
    LocalRelation [a#0,b#1,c#2]
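The rule in this commit message amounts to splitting the filter's conjuncts. A conjunct whose referenced attributes are all grouping attributes can move below the `Aggregate`; the rest stay above it. For a deterministic predicate of that kind, every row in a group gives the same answer, so whole groups are kept or dropped and surviving aggregates are unchanged. The Python sketch below models predicates as (text, referenced-attributes) pairs. It illustrates the idea and is not Catalyst's API. `split_conjuncts` is a hypothetical name, and the restriction to deterministic predicates is an assumption of the sketch.

```python
def split_conjuncts(conjuncts, grouping_attrs):
    """Partition filter conjuncts into (pushable, remaining).

    conjuncts: list of (predicate_text, referenced_attribute_set) pairs,
               assumed to be deterministic predicates.
    grouping_attrs: set of attributes in the Aggregate's grouping expressions.
    A conjunct is pushable only if every attribute it references is a
    grouping attribute.
    """
    pushable, remaining = [], []
    for predicate, refs in conjuncts:
        if refs <= grouping_attrs:  # subset test on attribute sets
            pushable.append(predicate)
        else:
            remaining.append(predicate)
    return pushable, remaining


# The plan above: Filter ((c#138L = 2) && (a#0 = 3)) over Aggregate [a#0].
print(split_conjuncts([("c#138L = 2", {"c#138L"}), ("a#0 = 3", {"a#0"})], {"a#0"}))
```

This reproduces the split in the optimised plan: `a#0 = 3` moves below the `Aggregate`, while `c#138L = 2`, which references an aggregate output, stays above it.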

commit 82fc386675ea2bcd5123d3abd83f6565669fcd69
Author: nitin goyal 
Date:   2015-10-21T04:39:56Z

[SPARK-11179] [SQL] Push filters through aggregate

Push conjunctive predicates through Aggregate operators when their references are a subset of the groupingExpressions.

Query plan before optimisation :-
Filter ((c#138L = 2) && (a#0 = 3))
 Aggregate [a#0], [a#0,count(b#1) AS c#138L]
  Project [a#0,b#1]
   LocalRelation [a#0,b#1,c#2]

Query plan after optimisation :-
Filter (c#138L = 2)
 Aggregate [a#0], [a#0,count(b#1) AS c#138L]
  Filter (a#0 = 3)
   Project [a#0,b#1]
    LocalRelation [a#0,b#1,c#2]

commit 20cf7226f80707bfb6c4164effab50edbea4dce2
Author: nitin goyal 
Date:   2015-10-23T15:19:35Z

Merge remote-tracking branch 'upstream/master'

commit ca487cbae6ba4eb2d14d7b007eb54ccc4dd3ee3a
Author: nitin goyal 
Date:   2015-10-23T15:26:33Z

[SPARK-7970] Skip closure cleaning for SQL operations

Also introduces a new Spark-private API in RDD.scala named 'mapPartitionsInternal', which skips closure cleaning.





[GitHub] spark pull request: [SPARK-7970] Skip closure cleaning for SQL ope...

2015-10-23 Thread nitin2goyal

Github user nitin2goyal commented on the pull request:

https://github.com/apache/spark/pull/9253#issuecomment-150613740
  
cc @andrewor14 



[GitHub] spark pull request: [SPARK-6723] [MLLIB] Model import/export for C...

2015-10-23 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/6785



[GitHub] spark pull request: [SPARK-6723] [MLLIB] Model import/export for C...

2015-10-23 Thread mengxr

Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/6785#issuecomment-150612720
  
Merged into master. Thanks!



[GitHub] spark pull request: [SPARK-10277][MLlib][PySpark] Add @since annot...

2015-10-23 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/8684



[GitHub] spark pull request: [SPARK-10277][MLlib][PySpark] Add @since annot...

2015-10-23 Thread mengxr

Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/8684#issuecomment-150612354
  
Merged into master. Thanks!



[GitHub] spark pull request: [SPARK-10277][MLlib][PySpark] Add @since annot...

2015-10-23 Thread mengxr

Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/8684#issuecomment-150612150
  
@noel-smith @yu-iskw Mixing multiple features in a PR generally delays the code review. For example, we are now discussing `reStructuredText` format for comments in a PR titled "Add @since annotation to pyspark.mllib.regression". I'm going to merge this. @yu-iskw Could you make a follow-up PR to fix the syntax in the comments? Also, pay attention to the line width in docstrings.



[GitHub] spark pull request: [SPARK-10641][WIP][SQL] Add Skewness and Kurto...

2015-10-23 Thread mengxr

Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/9003#discussion_r42879337
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/functions.scala ---
@@ -930,3 +930,327 @@ object HyperLogLogPlusPlus {
   )
   // scalastyle:on
 }
+
+/**
+ * A central moment is the expected value of a specified power of the deviation of a random
+ * variable from the mean. Central moments are often used to characterize the properties of about
+ * the shape of a distribution.
+ *
+ * This class implements online, one-pass algorithms for computing the central moments of a set of
+ * points.
+ *
+ * References:
+ *  - Xiangrui Meng.  "Simpler Online Updates for Arbitrary-Order Central Moments."
+ *  2015. http://arxiv.org/abs/1510.04923
+ *
+ * @see [[https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance
+ * Algorithms for calculating variance (Wikipedia)]]
+ *
+ * @param child to compute central moments of.
+ */
+abstract class CentralMomentAgg(child: Expression) extends ImperativeAggregate with Serializable {
+
+  /**
+   * The maximum central moment order to be computed.
+   */
+  protected def momentOrder: Int
+
+  /**
+   * Array of sufficient moments need to compute the aggregate statistic.
+   */
+  protected def sufficientMoments: Array[Int]
+
+  override def children: Seq[Expression] = Seq(child)
+
+  override def nullable: Boolean = false
+
+  override def dataType: DataType = DoubleType
+
+  // Expected input data type.
+  // TODO: Right now, we replace old aggregate functions (based on AggregateExpression1) to the
+  // new version at planning time (after analysis phase). For now, NullType is added at here
+  // to make it resolved when we have cases like `select avg(null)`.
+  // We can use our analyzer to cast NullType to the default data type of the NumericType once
+  // we remove the old aggregate functions. Then, we will not need NullType at here.
+  override def inputTypes: Seq[AbstractDataType] = Seq(TypeCollection(NumericType, NullType))
+
+  override def aggBufferSchema: StructType = StructType.fromAttributes(aggBufferAttributes)
+
+  /**
+   * The number of central moments to store in the buffer.
+   */
+  private[this] val numMoments = 5
+
+  override val aggBufferAttributes: Seq[AttributeReference] = Seq.tabulate(numMoments) { i =>
+    AttributeReference(s"M$i", DoubleType)()
+  }
+
+  // Note: although this simply copies aggBufferAttributes, this common code can not be placed
+  // in the superclass because that will lead to initialization ordering issues.
+  override val inputAggBufferAttributes: Seq[AttributeReference] =
+aggBufferAttributes.map(_.newInstance())
+
+  /**
+   * Initialize all moments to zero.
+   */
+  override def initialize(buffer: MutableRow): Unit = {
+for (aggIndex <- 0 until numMoments) {
+  buffer.setDouble(mutableAggBufferOffset + aggIndex, 0.0)
+}
+  }
+
+  // frequently used values for online updates
+  private[this] var delta = 0.0
+  private[this] var deltaN = 0.0
+  private[this] var delta2 = 0.0
+  private[this] var deltaN2 = 0.0
+
+  /**
+   * Update the central moments buffer.
+   */
+  override def update(buffer: MutableRow, input: InternalRow): Unit = {
+val v = Cast(child, DoubleType).eval(input)
+if (v != null) {
+  val updateValue = v match {
+case d: Double => d
+case _ => 0.0
+  }
+  var n = buffer.getDouble(mutableAggBufferOffset)
+  var mean = buffer.getDouble(mutableAggBufferOffset + 1)
+  var m2 = 0.0
+  var m3 = 0.0
+  var m4 = 0.0
+
+  n += 1.0
+  delta = updateValue - mean
+  deltaN = delta / n
+  mean += deltaN
+  buffer.setDouble(mutableAggBufferOffset, n)
+  buffer.setDouble(mutableAggBufferOffset + 1, mean)
--- End diff --

I don't think we are going to support arbitrary-order moments. Kurtosis 
should be sufficient:)
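The diff's `update` method follows the usual one-pass scheme:

1. Bump the count.
2. Compute `delta` and `deltaN`.
3. Shift the mean.
4. Fold the new point into the higher-order sums.

Below is a minimal Python sketch of that scheme up to fourth order, which is enough for skewness and kurtosis as the comment above suggests. It uses the incremental formulas for M2, M3 and M4 (sums of squared, cubed and fourth-power deviations) from the Wikipedia article the diff cites. It is not Spark's code, and reporting population (biased) skewness and excess kurtosis is an assumption of this sketch.

```python
import math


class CentralMoments:
    """One-pass accumulator for n, mean and M2..M4 (sums of powered deviations)."""

    def __init__(self):
        self.n = 0.0
        self.mean = 0.0
        self.m2 = 0.0
        self.m3 = 0.0
        self.m4 = 0.0

    def update(self, x):
        n1 = self.n
        self.n += 1.0
        n = self.n
        delta = x - self.mean
        delta_n = delta / n
        delta_n2 = delta_n * delta_n
        term1 = delta * delta_n * n1
        self.mean += delta_n
        # Update M4 and M3 before M2: each formula uses the previous lower-order sums.
        self.m4 += (term1 * delta_n2 * (n * n - 3 * n + 3)
                    + 6 * delta_n2 * self.m2 - 4 * delta_n * self.m3)
        self.m3 += term1 * delta_n * (n - 2) - 3 * delta_n * self.m2
        self.m2 += term1

    def skewness(self):
        # Population skewness sqrt(n) * M3 / M2^(3/2); undefined when M2 == 0.
        return math.sqrt(self.n) * self.m3 / self.m2 ** 1.5

    def kurtosis(self):
        # Population excess kurtosis n * M4 / M2^2 - 3; undefined when M2 == 0.
        return self.n * self.m4 / (self.m2 * self.m2) - 3.0


acc = CentralMoments()
for value in [1.0, 2.0, 2.0, 3.0, 3.0, 4.0, 5.0, 6.0, 7.0]:
    acc.update(value)
print(acc.mean, acc.skewness(), acc.kurtosis())
```

The sketch does not guard the all-equal-inputs case, where M2 is zero and both statistics divide by zero.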



[GitHub] spark pull request: [SPARK-10382] Make example code in user guide ...

2015-10-23 Thread mengxr

Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/9109#issuecomment-150609532
  
LGTM. Merged into master. Thanks! This makes the example code much easier 
to check. Could you create a JIRA and submit a PR that replaces some of the example 
code using this? Then we can create more JIRAs and ask the community to help.



[GitHub] spark pull request: [SPARK-10382] Make example code in user guide ...

2015-10-23 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/9109



[GitHub] spark pull request: [SPARK-11284] [ML] ALS produces float predicti...

2015-10-23 Thread mengxr

Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/9252#issuecomment-150607991
  
@dahlem Is there an issue with float precision? The ratings do not have high 
precision anyway. Changing it to double precision increases the shuffle size by 
a lot. If you want to make the `RegressionEvaluator` work with `ALS`, you can 
cast the label type to Double in `RegressionEvaluator`.
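
The float-versus-double trade-off can be made concrete with a small, hypothetical sketch: the error introduced by storing a rating-like value as a `Float`, and the raw byte count of `count` vectors of length `rank` at 4 versus 8 bytes per entry. The object, method names and example sizes are assumptions for illustration; they are not ALS internals, and the size estimate ignores serialization and shuffle overhead.

```scala
// Hypothetical helpers for illustration; not part of Spark's ALS.
object FloatVsDoubleSketch {
  // Absolute error from storing a Double value as a Float and reading it back.
  def floatRoundTripError(x: Double): Double = math.abs(x.toFloat.toDouble - x)

  // Raw payload of `count` vectors of length `rank` at `bytesPerValue` bytes per entry,
  // ignoring serialization and shuffle overhead.
  def payloadBytes(count: Long, rank: Int, bytesPerValue: Int): Long =
    count * rank * bytesPerValue
}
```

For example, a rating of 3.7 comes back from `Float` with an error below 1e-6, while 1,000,000 vectors of rank 10 take 40,000,000 bytes as `Float` and twice that as `Double`.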



[GitHub] spark pull request: [SPARK-5569] [STREAMING] fix ObjectInputStream...

2015-10-23 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8955#issuecomment-150607120
  
**[Test build #44227 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44227/consoleFull)**
 for PR 8955 at commit 
[`d2d0404`](https://github.com/apache/spark/commit/d2d0404f3b68ae3a85d3592b3536feca68e2d22b).



[GitHub] spark pull request: Fix typos

2015-10-23 Thread jaceklaskowski

Github user jaceklaskowski commented on the pull request:

https://github.com/apache/spark/pull/9250#issuecomment-150606925
  
Ok, deal. I can run a spell-checker and see what I can fix within a 
half-hour timeframe. Should I go and create a JIRA task for it? Is there any 
particular module/package to look at during that time?

Thanks @srowen for the help!



[GitHub] spark pull request: [SPARK-5569] [STREAMING] fix ObjectInputStream...

2015-10-23 Thread maxwellzdm

Github user maxwellzdm commented on the pull request:

https://github.com/apache/spark/pull/8955#issuecomment-150605939
  
@tdas I have added a unit test that wouldn't pass without this patch. Please 
review it when you have time.



[GitHub] spark pull request: [SPARK-5569] [STREAMING] fix ObjectInputStream...

2015-10-23 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8955#issuecomment-150605267
  
Merged build started.



[GitHub] spark pull request: [SPARK-5569] [STREAMING] fix ObjectInputStream...

2015-10-23 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8955#issuecomment-150605236
  
 Merged build triggered.



[GitHub] spark pull request: [SPARK-11098][Core]Add Outbox to cache the sen...

2015-10-23 Thread tedyu

Github user tedyu commented on a diff in the pull request:

https://github.com/apache/spark/pull/9197#discussion_r42876278
  
--- Diff: core/src/main/scala/org/apache/spark/rpc/netty/Outbox.scala ---
@@ -0,0 +1,222 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.rpc.netty
+
+import java.util.concurrent.Callable
+import javax.annotation.concurrent.GuardedBy
+
+import scala.util.control.NonFatal
+
+import org.apache.spark.SparkException
+import org.apache.spark.network.client.{RpcResponseCallback, TransportClient}
+import org.apache.spark.rpc.RpcAddress
+
+private[netty] case class OutboxMessage(content: Array[Byte], callback: RpcResponseCallback)
+
+private[netty] class Outbox(nettyEnv: NettyRpcEnv, val address: RpcAddress) {
+
+  outbox => // Give this an alias so we can use it more clearly in closures.
+
+  @GuardedBy("this")
+  private val messages = new java.util.LinkedList[OutboxMessage]
+
+  @GuardedBy("this")
+  private var client: TransportClient = null
+
+  /**
+   * connectFuture points to the connect task. If there is no connect task, connectFuture will be
+   * null.
+   */
+  @GuardedBy("this")
+  private var connectFuture: java.util.concurrent.Future[Unit] = null
+
+  @GuardedBy("this")
+  private var stopped = false
+
+  /**
+   * If there is any thread draining the message queue
+   */
+  @GuardedBy("this")
+  private var draining = false
+
+  /**
+   * Send a message. If there is no active connection, cache it and launch a new connection. If
+   * [[Outbox]] is stopped, the sender will be notified with a [[SparkException]].
+   */
+  def send(message: OutboxMessage): Unit = {
+    val dropped = synchronized {
+      if (stopped) {
+        true
+      } else {
+        messages.add(message)
+        false
+      }
+    }
+    if (dropped) {
+      message.callback.onFailure(new SparkException("Message is dropped because Outbox is stopped"))
+    } else {
+      drainOutbox()
+    }
+  }
+
+  /**
+   * Drain the message queue. If there is other draining thread, just exit. If the connection has
+   * not been established, launch a task in the `nettyEnv.clientConnectionExecutor` to setup the
+   * connection.
+   */
+  private def drainOutbox(): Unit = {
+    var message: OutboxMessage = null
+    synchronized {
+      if (stopped) {
+        return
+      }
+      if (connectFuture != null) {
+        // We are connecting to the remote address, so just exit
+        return
+      }
+      if (client == null) {
+        // There is no connect task but client is null, so we need to launch the connect task.
+        launchConnectTask()
+        return
+      }
+      if (draining) {
+        // There is some thread draining, so just exit
+        return
+      }
+      message = messages.poll()
+      if (message == null) {
+        return
+      }
+      draining = true
+    }
+    while (true) {
+      try {
+        val _client = synchronized { client }
+        if (_client != null) {
+          _client.sendRpc(message.content, message.callback)
+        } else {
+          assert(stopped == true)
+        }
+      } catch {
+        case NonFatal(e) =>
+          handleNetworkFailure(e)
+          return
+      }
+      synchronized {
+        if (stopped) {
+          return
+        }
+        message = messages.poll()
+        if (message == null) {
+          draining = false
+          return
+        }
+      }
+    }
+  }
+
+  private def launchConnectTask(): Unit = {
+    connectFuture = nettyEnv.clientConnectionExecutor.submit(new Callable[Unit] {
+
+      override def call(): Unit = {
+        try {
+          val _client = nettyEnv
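
The draining logic in `drainOutbox` follows a single-drainer pattern: every sender enqueues under the lock, but only the thread that flips `draining` to true actually sends, and it sends outside the lock until the queue is empty. A stripped-down sketch of just that pattern, with no networking, connection setup or failure callbacks; `SingleDrainerQueue` and the `deliver` function are hypothetical stand-ins for `Outbox` and the RPC send, not the code under review.

```scala
import scala.collection.mutable
import scala.util.control.NonFatal

// Hypothetical, simplified single-drainer queue; `deliver` stands in for the RPC send.
final class SingleDrainerQueue[T](deliver: T => Unit) {
  private val pending = mutable.Queue.empty[T] // guarded by `this`
  private var draining = false // guarded by `this`
  private var stopped = false // guarded by `this`

  // Enqueue a message; returns false (message dropped) if the queue has been stopped.
  def send(message: T): Boolean = {
    val accepted = synchronized {
      if (stopped) false
      else { pending.enqueue(message); true }
    }
    if (accepted) drain()
    accepted
  }

  // Stop accepting messages and discard pending ones (senders are not notified here).
  def stop(): Unit = synchronized {
    stopped = true
    pending.clear()
  }

  private def drain(): Unit = {
    // Claim the drainer role under the lock. If another thread is already draining,
    // it will pick up the message we just enqueued, so we can return immediately.
    var next: Option[T] = synchronized {
      if (stopped || draining || pending.isEmpty) None
      else { draining = true; Some(pending.dequeue()) }
    }
    while (next.isDefined) {
      // Deliver outside the lock so other senders are never blocked by a slow send.
      try deliver(next.get)
      catch {
        case NonFatal(e) =>
          synchronized { draining = false }
          throw e
      }
      // Release the drainer role only when the queue is observed empty under the lock.
      next = synchronized {
        if (stopped || pending.isEmpty) { draining = false; None }
        else Some(pending.dequeue())
      }
    }
  }
}
```

Because the drainer only gives up its role after seeing an empty queue while holding the lock, a message enqueued concurrently is either picked up by the current drainer or triggers a new one, so none is left behind.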

[GitHub] spark pull request: Fix typos

2015-10-23 Thread srowen

Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/9250#issuecomment-150594861
  
Disregard the failure; it's unrelated.



[GitHub] spark pull request: Fix typos

2015-10-23 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9250#issuecomment-150594621
  
**[Test build #1945 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/1945/consoleFull)**
 for PR 9250 at commit 
[`7d1b20d`](https://github.com/apache/spark/commit/7d1b20d2346b42dac9268bdba6b1ef8933489a3c).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-8029][core][wip] first successful shuff...

2015-10-23 Thread squito

Github user squito commented on the pull request:

https://github.com/apache/spark/pull/9214#issuecomment-150592921
  
This isn't quite ready yet... still working through test failures. I 
think the remaining changes are to the tests, but I need to work through those 
and then do some cleanup...



[GitHub] spark pull request: [SPARK-11086][SPARKR] Use dropFactors column-w...

2015-10-23 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9099#issuecomment-150589746
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44224/
Test FAILed.



[GitHub] spark pull request: Stevel/patches/spark 11265 hive tokens

2015-10-23 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9232#issuecomment-150589711
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44223/
Test PASSed.



[GitHub] spark pull request: [SPARK-11086][SPARKR] Use dropFactors column-w...

2015-10-23 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9099#issuecomment-150589742
  
Merged build finished. Test FAILed.



[GitHub] spark pull request: [SPARK-11086][SPARKR] Use dropFactors column-w...

2015-10-23 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9099#issuecomment-150589689
  
**[Test build #44224 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44224/consoleFull)**
 for PR 9099 at commit 
[`36ccdc7`](https://github.com/apache/spark/commit/36ccdc702580e6dc92bb65749028396f5ade010d).
 * This patch **fails SparkR unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * `# class of POSIXlt is c("POSIXlt" "POSIXt")`



[GitHub] spark pull request: Stevel/patches/spark 11265 hive tokens

2015-10-23 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9232#issuecomment-150589707
  
Merged build finished. Test PASSed.



[GitHub] spark pull request: Stevel/patches/spark 11265 hive tokens

2015-10-23 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9232#issuecomment-150589550
  
**[Test build #44223 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44223/consoleFull)**
 for PR 9232 at commit 
[`9630a9d`](https://github.com/apache/spark/commit/9630a9d80bc33f738dad6ebc841cb4aea058056d).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-10562][SQL] support mixed case partitio...

2015-10-23 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9251#issuecomment-150587942
  
**[Test build #44226 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44226/consoleFull)**
 for PR 9251 at commit 
[`520f008`](https://github.com/apache/spark/commit/520f008138153b532f93d9144180e4ab9654d2ce).



[GitHub] spark pull request: [SPARK-11261] [Core] Provide a more flexible a...

2015-10-23 Thread rmarsch

Github user rmarsch closed the pull request at:

https://github.com/apache/spark/pull/9228



[GitHub] spark pull request: [SPARK-11261] [Core] Provide a more flexible a...

2015-10-23 Thread rmarsch

Github user rmarsch commented on the pull request:

https://github.com/apache/spark/pull/9228#issuecomment-150586870
  
Alright, that's a bit disappointing, but I understand.



[GitHub] spark pull request: [SPARK-11284] [ML] ALS produces float predicti...

2015-10-23 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9252#issuecomment-150586598
  
Can one of the admins verify this patch?


