[GitHub] spark pull request: [Core][test][minor] replace try finally block ...

2015-04-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5739#issuecomment-96950628
  
[Test build #31125 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31125/consoleFull) for PR 5739 at commit [`55683e5`](https://github.com/apache/spark/commit/55683e57e6c532a7c4f6bbb94f5efb8c57d11670).





[GitHub] spark pull request: [SPARK-7176] [ml] Add validation functionality...

2015-04-27 Thread jkbradley
GitHub user jkbradley opened a pull request:

https://github.com/apache/spark/pull/5740

[SPARK-7176] [ml] Add validation functionality to Param

Main change: Added isValid field to Param.  Modified all usages to use 
isValid when relevant.  Added helper methods in ParamValidate.

Also overrode Params.validate() in:
* CrossValidator + model
* Pipeline + model

This PR is Scala + Java only.  Python will be in a follow-up PR.

CC: @mengxr
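
For readers skimming the digest, a minimal self-contained sketch of the idea (the `isValid` field follows the description above; the `Param` shape and the `ParamValidate.gt` helper are illustrative assumptions, not the PR's exact code):

```scala
// Sketch only: a parameter that carries its own validity check.
class Param[T](val parent: String, val name: String, val doc: String,
    val isValid: T => Boolean) {

  // Reject invalid values at set-time rather than deep inside training.
  def validate(value: T): Unit = {
    if (!isValid(value)) {
      throw new IllegalArgumentException(
        s"$parent parameter $name given invalid value $value.")
    }
  }
}

// Helper in the spirit of the PR's ParamValidate methods (assumed name).
object ParamValidate {
  // Accept any numeric value strictly greater than lowerBound.
  def gt[T](lowerBound: Double): T => Boolean = (value: T) => value match {
    case n: java.lang.Number => n.doubleValue() > lowerBound
    case _ => false
  }
}

// Example: a maxIter parameter that must be positive.
val maxIter = new Param[Int]("myEstimator", "maxIter",
  "maximum number of iterations", ParamValidate.gt[Int](0))
maxIter.validate(10)   // ok
// maxIter.validate(-1)  // would throw IllegalArgumentException
```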

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jkbradley/spark enforce-validate

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/5740.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #5740


commit dc2061f0a51bc80dac030c925024c39b26614b91
Author: Joseph K. Bradley 
Date:   2015-04-27T18:03:53Z

merged with master.  enforcing Params.validate

commit 57b8ad12b4082bc330c82e56d86a6e8669e440d0
Author: Joseph K. Bradley 
Date:   2015-04-27T21:01:05Z

Partly done with adding checks, but blocking on adding checking 
functionality to Param

commit 8e368c712f6e4d5567b1274ba3e770af17a06a59
Author: Joseph K. Bradley 
Date:   2015-04-28T02:58:26Z

Still workin

commit d87278c211ea81f73a2a16f8e0ef282eefec0637
Author: Joseph K. Bradley 
Date:   2015-04-28T04:30:36Z

still workin

commit 26d327c08f4ac53c5e1e6719164e4bf32c413c9d
Author: Joseph K. Bradley 
Date:   2015-04-28T05:15:40Z

Maybe done

commit 39b036b2d622d4aa823db3fa24107a177ac6463a
Author: Joseph K. Bradley 
Date:   2015-04-28T05:40:16Z

many cleanups

commit f02c3c3ac1066e841615d07f2e003357b2f863f3
Author: Joseph K. Bradley 
Date:   2015-04-28T05:54:43Z

small cleanups







[GitHub] spark pull request: [SPARK-5253] [ML] LinearRegression with L1/L2 ...

2015-04-27 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/4259#discussion_r29218110
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
@@ -97,3 +190,146 @@ class LinearRegressionModel private[ml] (
 m
   }
 }
+
+private class LeastSquaresAggregator(
+    weights: Vector,
+    labelStd: Double,
+    labelMean: Double,
+    featuresStd: Array[Double],
+    featuresMean: Array[Double]) extends Serializable {
+
+  private var totalCnt: Long = 0
+  private var lossSum = 0.0
+  private var diffSum = 0.0
+
+  private val (effectiveWeightsArray: Array[Double], offset: Double, dim: Int) = {
+    val weightsArray = weights.toArray.clone()
+    var sum = 0.0
+    var i = 0
+    while (i < weightsArray.length) {
+      if (featuresStd(i) != 0.0) {
+        weightsArray(i) /= featuresStd(i)
+        sum += weightsArray(i) * featuresMean(i)
+      } else {
+        weightsArray(i) = 0.0
+      }
+      i += 1
+    }
+    (weightsArray, -sum + labelMean / labelStd, weightsArray.length)
+  }
+  private val effectiveWeightsVector = Vectors.dense(effectiveWeightsArray)
+
+  private val gradientSumArray: Array[Double] = Array.ofDim[Double](dim)
+
+  /**
+   * Add a new training data to this LeastSquaresAggregator, and update the loss and gradient
+   * of the objective function.
+   *
+   * @param label The label for this data point.
+   * @param data The features for one data point in dense/sparse vector format to be added
+   *             into this aggregator.
+   * @return This LeastSquaresAggregator object.
+   */
+  def add(label: Double, data: Vector): this.type = {
+    require(dim == data.size, s"Dimensions mismatch when adding new sample." +
+      s" Expecting $dim but got ${data.size}.")
+
+    val diff = dot(data, effectiveWeightsVector) - label / labelStd + offset
+
+    if (diff != 0) {
+      val localGradientSumArray = gradientSumArray
+      data.foreachActive { (index, value) =>
+        if (featuresStd(index) != 0.0 && value != 0.0) {
+          localGradientSumArray(index) += diff * value / featuresStd(index)
+        }
+      }
+      lossSum += diff * diff / 2.0
+      diffSum += diff
+    }
+
+    totalCnt += 1
+    this
+  }
+
+  /**
+   * Merge another LeastSquaresAggregator, and update the loss and gradient
+   * of the objective function.
+   * (Note that it's in place merging; as a result, `this` object will be modified.)
+   *
+   * @param other The other LeastSquaresAggregator to be merged.
+   * @return This LeastSquaresAggregator object.
+   */
+  def merge(other: LeastSquaresAggregator): this.type = {
+    require(dim == other.dim, s"Dimensions mismatch when merging with another " +
+      s"LeastSquaresAggregator. Expecting $dim but got ${other.dim}.")
+
+    if (other.totalCnt != 0) {
+      totalCnt += other.totalCnt
+      lossSum += other.lossSum
+      diffSum += other.diffSum
+
+      var i = 0
+      val localThisGradientSumArray = this.gradientSumArray
+      val localOtherGradientSumArray = other.gradientSumArray
+      while (i < dim) {
+        localThisGradientSumArray(i) += localOtherGradientSumArray(i)
+        i += 1
+      }
+    }
+    this
+  }
+
+  def count: Long = totalCnt
+
+  def loss: Double = lossSum / totalCnt
+
+  def gradient: Vector = {
+    val result = Vectors.dense(gradientSumArray.clone())
+
+    val correction = {
+      val temp = effectiveWeightsArray.clone()
+      var i = 0
+      while (i < temp.length) {
+        temp(i) *= featuresMean(i)
+        i += 1
+      }
+      Vectors.dense(temp)
+    }
+
+    axpy(-diffSum, correction, result)
+    scal(1.0 / totalCnt, result)
--- End diff --

Okay, I finally found why the `correction` effect is zero: `diffSum` is zero in our test dataset. `diffSum` is the sum of `diff`, and for a synthetic dataset generated from a linear equation with noise, the average of `diff` will be zero. By contrast, for a real non-linear dataset, `diffSum` will not be zero, so we need a non-linear dataset to test correctness. I'll add the famous prostate cancer dataset used in the linear regression lasso paper to the unit test.
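
The observation is easy to reproduce outside Spark; a tiny self-contained sketch (illustrative numbers, not the PR's test):

```scala
import scala.util.Random

object DiffSumDemo extends App {
  val rng = new Random(42)
  // Synthetic data from y = 3x + 1 plus zero-mean Gaussian noise.
  val data = Seq.fill(100000) {
    val x = rng.nextDouble()
    (x, 3.0 * x + 1.0 + 0.1 * rng.nextGaussian())
  }
  // With the true weights, diff = prediction - label is pure noise,
  // so diffSum is ~0 and the correction term above contributes nothing.
  val diffSum = data.map { case (x, y) => (3.0 * x + 1.0) - y }.sum
  println(f"diffSum = $diffSum%.4f over ${data.size} samples")
}
```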



[GitHub] spark pull request: [Core][test][minor] replace try finally block ...

2015-04-27 Thread liyezhang556520
GitHub user liyezhang556520 opened a pull request:

https://github.com/apache/spark/pull/5739

[Core][test][minor] replace try finally block with tryWithSafeFinally



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/liyezhang556520/spark trySafeFinally

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/5739.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #5739


commit 55683e57e6c532a7c4f6bbb94f5efb8c57d11670
Author: Zhang, Liye 
Date:   2015-04-28T06:17:38Z

replace try finally block with tryWithSafeFinally
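
For context, the helper being adopted lives in `org.apache.spark.util.Utils`; here is a standalone sketch of the pattern (simplified, without Spark's logging):

```scala
// Simplified sketch of the tryWithSafeFinally pattern: if both the body and
// the finally block throw, attach the finally failure as a suppressed
// exception instead of letting it mask the original one.
def tryWithSafeFinally[T](block: => T)(finallyBlock: => Unit): T = {
  var original: Throwable = null
  try {
    block
  } catch {
    case t: Throwable =>
      original = t
      throw t
  } finally {
    try {
      finallyBlock
    } catch {
      // Swallow the finally failure only when a body failure is already in
      // flight; otherwise let it propagate as usual.
      case t: Throwable if original != null =>
        original.addSuppressed(t)
    }
  }
}

// Typical use: a close() failure can no longer hide a write() failure.
// tryWithSafeFinally { out.write(bytes) } { out.close() }
```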







[GitHub] spark pull request: [SPARK-1406] Mllib pmml model export

2015-04-27 Thread selvinsource
Github user selvinsource commented on the pull request:

https://github.com/apache/spark/pull/3062#issuecomment-96950208
  
@mengxr please review, it should work as expected now.





[GitHub] spark pull request: [SPARK-6466][SQL] Remove unnecessary attribute...

2015-04-27 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/5134#discussion_r29218048
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -115,6 +115,25 @@ object UnionPushdown extends Rule[LogicalPlan] {
  */
 object ColumnPruning extends Rule[LogicalPlan] {
   def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+    // Eliminate unneeded attributes from Expand that is used in GroupingSets
+    case a @ Aggregate(groupByExprs, aggregations, e @ Expand(projections, output, child))
+        if (e.outputSet -- a.references).nonEmpty =>
+
+      val substitution = projections.map { groupExpr =>
+        val newExprs = groupExpr.collect {
+          case x: NamedExpression if a.references.contains(x) => x
+          case l: Literal => l
--- End diff --

Because there are some constant null values and bitmasks, we need to keep them.
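
To make that concrete, the Expand projections built for GROUPING SETS interleave the grouping columns with those constants; roughly (an illustrative sketch against the Catalyst API of the time, not the optimizer's exact output):

```scala
import org.apache.spark.sql.catalyst.expressions.{AttributeReference, Literal}
import org.apache.spark.sql.types.IntegerType

val a = AttributeReference("a", IntegerType)()
val b = AttributeReference("b", IntegerType)()

// For GROUP BY a, b GROUPING SETS ((a, b), (a), ()): each projection pads
// grouped-out columns with a null Literal and ends with a bitmask Literal
// identifying the grouping set -- the constants the rule must not prune.
val projections = Seq(
  Seq(a, b, Literal(0)),                                                  // (a, b)
  Seq(a, Literal(null, IntegerType), Literal(1)),                         // (a)
  Seq(Literal(null, IntegerType), Literal(null, IntegerType), Literal(3)) // ()
)
```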





[GitHub] spark pull request: [SPARK-5946][Streaming] Add Python API for dir...

2015-04-27 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/4723





[GitHub] spark pull request: Test

2015-04-27 Thread Arttii
Github user Arttii commented on the pull request:

https://github.com/apache/spark/pull/5738#issuecomment-96948086
  
Sorry, I made a mistake.





[GitHub] spark pull request: Test

2015-04-27 Thread Arttii
Github user Arttii closed the pull request at:

https://github.com/apache/spark/pull/5738





[GitHub] spark pull request: Test

2015-04-27 Thread Arttii
GitHub user Arttii opened a pull request:

https://github.com/apache/spark/pull/5738

Test



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/RiverlandReply/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/5738.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #5738


commit d640d9c58cd4f3caa6eac462b947b3a891dabbda
Author: Yuhao Yang 
Date:   2015-02-06T03:12:49Z

online lda initial checkin

commit 043e7864555aadf49e861385bf815f5930668028
Author: Yuhao Yang 
Date:   2015-02-06T04:39:45Z

Merge remote-tracking branch 'upstream/master' into ldaonline
s
Conflicts:
mllib/src/main/scala/org/apache/spark/mllib/clustering/LDA.scala

commit 26dca1bddd98203e90e3cb36de4f3d16fbfbf6cc
Author: Yuhao Yang 
Date:   2015-02-06T05:09:06Z

style fix and make class private

commit f41c5ca2d2bb11394882d4212fd4138ae9a972a1
Author: Yuhao Yang 
Date:   2015-02-06T06:23:12Z

style fix

commit 45884ab8098ed78570f787440a7f88caa5ea2b31
Author: Yuhao Yang 
Date:   2015-02-08T03:32:25Z

Merge remote-tracking branch 'upstream/master' into ldaonline
s

commit fa408a8aedf0734cf3764a81808841d59a275ea7
Author: Yuhao Yang 
Date:   2015-02-09T01:26:44Z

ssMerge remote-tracking branch 'upstream/master' into ldaonline

commit 0d0f3eef6d4e2754bfa2904f30bf9e21005ae392
Author: Yuhao Yang 
Date:   2015-02-10T04:30:48Z

replace random split with sliding

commit 0dd39479c50fa8211938e7ea8121fdf64e8da97e
Author: Yuhao Yang 
Date:   2015-02-10T05:12:14Z

kMerge remote-tracking branch 'upstream/master' into ldaonline

commit 3a06526df629b8ff1291bfb1b183f5e6af45bcde
Author: Yuhao Yang 
Date:   2015-02-10T05:31:32Z

merge with new example

commit aa365d18e7de32781e852c5acb906c192b0af9c9
Author: Yuhao Yang 
Date:   2015-03-02T02:19:33Z

merge upstream master

commit 20328d1aa4bc1af75bc3aecab270488fe1a4e502
Author: Yuhao Yang 
Date:   2015-03-02T03:00:48Z

Merge remote-tracking branch 'upstream/master' into ldaonline
i

commit 37af91aab43827ff1db5aa13987f898578967843
Author: Yuhao Yang 
Date:   2015-03-02T11:36:28Z

iMerge remote-tracking branch 'upstream/master' into ldaonline

commit 581c623106f38d91497fb8123f47c4e661057071
Author: Yuhao Yang 
Date:   2015-03-02T11:51:44Z

seperate API and adjust batch split

commit e271eb1a0f6c329b05d3611abb3def1aeffc900e
Author: Yuhao Yang 
Date:   2015-03-02T15:16:33Z

remove non ascii

commit 4a3f27e9125bb87e5cd079904db6203693d7beb7
Author: Yuhao Yang 
Date:   2015-03-05T02:19:01Z

Merge remote-tracking branch 'upstream/master' into ldaonline

commit a570c9a5cbdbf0ac7b7a4eae1e3b571e0060e5f0
Author: Yuhao Yang 
Date:   2015-03-11T11:23:13Z

use sample to pick up batch

commit d86cdec374e0cb4e43a8e962af5f8b8cd6c70ee0
Author: Yuhao Yang 
Date:   2015-03-11T11:24:37Z

Merge remote-tracking branch 'upstream/master' into ldaonline

commit f6d47ca3cddf0eae9ba5e8bfc5a5afdd5ed0d820
Author: Yuhao Yang 
Date:   2015-03-11T11:25:57Z

Merge branch 'ldaonline' of https://github.com/hhbyyh/spark into ldaonline

commit 02d037387f32adcddd98858176813f3a66991a38
Author: Yuhao Yang 
Date:   2015-03-12T02:43:03Z

fix style in comment

commit ca61e5193bfbc2b346e53335ae5e86883a3c22a9
Author: Arttii 
Date:   2015-03-15T15:06:46Z

Merge pull request #1 from hhbyyh/ldaonline

Ldaonline







[GitHub] spark pull request: [SPARK-5946][Streaming] Add Python API for dir...

2015-04-27 Thread tdas
Github user tdas commented on the pull request:

https://github.com/apache/spark/pull/4723#issuecomment-96947833
  
Alright! Merging it. Thanks! :)





[GitHub] spark pull request: [SPARK-5946][Streaming] Add Python API for dir...

2015-04-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4723#issuecomment-96947387
  
[Test build #31115 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31115/consoleFull) for PR 4723 at commit [`a1fe97c`](https://github.com/apache/spark/commit/a1fe97c496f1441e0d2a5879e257c7e569f2e541).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * `class OffsetRange(object):`
   * `class TopicAndPartition(object):`
   * `class Broker(object):`
 * This patch does not change any dependencies.





[GitHub] spark pull request: [SPARK-5946][Streaming] Add Python API for dir...

2015-04-27 Thread tdas
Github user tdas commented on the pull request:

https://github.com/apache/spark/pull/4723#issuecomment-96947367
  
LGTM. Will merge when tests pass.





[GitHub] spark pull request: SPARK-6846 [WEBUI] Stage kill URL easy to acci...

2015-04-27 Thread witgo
Github user witgo commented on the pull request:

https://github.com/apache/spark/pull/5528#issuecomment-96947031
  
@srowen This PR seems to have a bug in yarn-client mode:

```
HTTP ERROR 405

Problem accessing /proxy/application_1429108701044_0316/stages/stage/kill/. Reason:

HTTP method POST is not supported by this URL
```




[GitHub] spark pull request: [SPARK-6612] [MLLib] [PySpark] Python KMeans p...

2015-04-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5647#issuecomment-96945915
  
[Test build #31114 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31114/consoleFull) for PR 5647 at commit [`7ecfd00`](https://github.com/apache/spark/commit/7ecfd000af37899a920cae838cc41bcc5ceca053).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
 * This patch does not change any dependencies.





[GitHub] spark pull request: [SPARK-5891][ML] Add Binarizer ML Transformer

2015-04-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5699#issuecomment-96944251
  
[Test build #31124 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31124/consoleFull) for PR 5699 at commit [`cc4f03c`](https://github.com/apache/spark/commit/cc4f03c2fca6e98d540b665b9b249051306a1e24).





[GitHub] spark pull request: [SPARK-7181][CORE] fix infinite loop in External...

2015-04-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5737#issuecomment-96944305
  
[Test build #31123 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31123/consoleFull) for PR 5737 at commit [`2924b93`](https://github.com/apache/spark/commit/2924b93b0d16e98fb045138e2bc0f8d94b1e0bfa).





[GitHub] spark pull request: [SPARK-7156][SQL] add randomSplit to DataFrame...

2015-04-27 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/5711#issuecomment-96943758
  
Thanks for working on this, @kaka1992. Would be great if we can do it in a 
way that doesn't break the existing logical plan for data frames.






[GitHub] spark pull request: [SPARK-7156][SQL] add randomSplit to DataFrame...

2015-04-27 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/5711#discussion_r29217153
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrame.scala ---
@@ -967,6 +969,23 @@ class DataFrame private[sql](
   }
 
   /**
+   * Randomly splits this DataFrame with the provided weights.
+   *
+   * @param weights weights for splits, will be normalized if they don't sum to 1
+   * @param seed random seed
+   *
+   * @return split DataFrames in an array
+   */
+  def randomSplit(weights: Array[Double], seed: Long = Utils.random.nextLong): Array[DataFrame] = {
+    val sum = weights.sum
+    val normalizedCumWeights = weights.map(_ / sum).scanLeft(0.0d)(_ + _)
+    normalizedCumWeights.sliding(2).map { x =>
+      this.sqlContext.createDataFrame(new PartitionwiseSampledRDD[Row, Row](
+        rdd, new BernoulliCellSampler[Row](x(0), x(1)), true, seed), schema)
--- End diff --

this actually breaks the plan -- can we create a logical operator (or generalize the existing Sample operator) so the returned DataFrame correctly preserves the logical plan?
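
Regardless of how the operator ends up being modeled, the intended call site looks like this (a usage sketch assuming an existing DataFrame `df`):

```scala
// Split 70/30; weights are normalized, so Array(7.0, 3.0) behaves the same.
val Array(train, test) = df.randomSplit(Array(0.7, 0.3), seed = 42L)
println(s"train: ${train.count()}, test: ${test.count()}")
```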





[GitHub] spark pull request: [SPARK-7156][SQL] add randomSplit to DataFrame...

2015-04-27 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/5711#discussion_r29217167
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala ---
@@ -17,14 +17,13 @@
 
 package org.apache.spark.sql
 
-import scala.language.postfixOps
--- End diff --

Scala imports should be first.





[GitHub] spark pull request: [SPARK-7181][CORE] fix infinite loop in External...

2015-04-27 Thread chouqin
GitHub user chouqin opened a pull request:

https://github.com/apache/spark/pull/5737

[SPARK-7181][CORE] fix infinite loop in ExternalSorter's mergeWithAggregation

see [SPARK-7181](https://issues.apache.org/jira/browse/SPARK-7181).

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chouqin/spark externalsorter

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/5737.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #5737


commit 2924b93b0d16e98fb045138e2bc0f8d94b1e0bfa
Author: Qiping Li 
Date:   2015-04-28T06:22:54Z

fix infinite loop in ExternalSorter's mergeWithAggregation







[GitHub] spark pull request: [SPARK-7158] [SQL] Fix bug of cached data cann...

2015-04-27 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/5714#discussion_r29216999
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala ---
@@ -71,6 +71,23 @@ class SQLQuerySuite extends QueryTest with BeforeAndAfterAll {
     )
   }
 
+  test("SPARK-7158 collect and take return different results") {
+    import java.util.UUID
+    import org.apache.spark.sql.types._
+    val rdd = sparkContext.parallelize(List(1, 2, 3), 2)
--- End diff --

I think you can make this test case much simpler than it is, e.g.

```scala
val df = Seq(Tuple1(1)).toDF("col")
...
```





[GitHub] spark pull request: [SPARK-7158] [SQL] Fix bug of cached data cann...

2015-04-27 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/5714#discussion_r29216965
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrame.scala ---
@@ -991,20 +991,27 @@ class DataFrame private[sql](
    */
   override def persist(): this.type = {
     sqlContext.cacheManager.cacheQuery(this)
--- End diff --

can this function call the other persist?





[GitHub] spark pull request: [SPARK-7187] SerializationDebugger should not ...

2015-04-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5734#issuecomment-96939472
  
[Test build #31122 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31122/consoleFull) for PR 5734 at commit [`e8aad6c`](https://github.com/apache/spark/commit/e8aad6c5a49b032a0c066109d975ec4ccf4a0079).





[GitHub] spark pull request: Merge pull request #1 from apache/master

2015-04-27 Thread sven0726
Github user sven0726 closed the pull request at:

https://github.com/apache/spark/pull/5735





[GitHub] spark pull request: [SPARK-6829] Added math functions for DataFram...

2015-04-27 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/5616#discussion_r29216750
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathfuncs/unary.scala ---
@@ -0,0 +1,168 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions.mathfuncs
+
+import org.apache.spark.sql.catalyst.expressions.{ExpectsInputTypes, Expression, Row, UnaryExpression}
+import org.apache.spark.sql.types._
+
+/**
+ * A unary expression specifically for math functions. Math Functions expect a specific type of
+ * input format, therefore these functions extend `ExpectsInputTypes`.
+ * @param name The short name of the function
+ */
+abstract class MathematicalExpression(name: String)
+  extends UnaryExpression with Serializable with ExpectsInputTypes {
+  self: Product =>
+  type EvaluatedType = Any
+
+  override def dataType: DataType = DoubleType
+  override def foldable: Boolean = child.foldable
+  override def nullable: Boolean = true
+  override def toString: String = s"$name($child)"
+}
+
+/**
+ * A unary expression specifically for math functions that take a `Double` as input and return
+ * a `Double`.
+ * @param f The math function.
+ * @param name The short name of the function
+ */
+abstract class MathematicalExpressionForDouble(f: Double => Double, name: String)
+  extends MathematicalExpression(name) { self: Product =>
+
+  override def expectedChildTypes: Seq[DataType] = Seq(DoubleType)
+
+  override def eval(input: Row): Any = {
+    val evalE = child.eval(input)
+    if (evalE == null) {
+      null
+    } else {
+      val result = f(evalE.asInstanceOf[Double])
+      if (result.isNaN) null else result
+    }
+  }
+}
+
+/**
+ * A unary expression specifically for math functions that take an `Int` as input and return
+ * an `Int`.
+ * @param f The math function.
+ * @param name The short name of the function
+ */
+abstract class MathematicalExpressionForInt(f: Int => Int, name: String)
+  extends MathematicalExpression(name) { self: Product =>
+
+  override def dataType: DataType = IntegerType
+  override def expectedChildTypes: Seq[DataType] = Seq(IntegerType)
+
+  override def eval(input: Row): Any = {
+    val evalE = child.eval(input)
+    if (evalE == null) null else f(evalE.asInstanceOf[Int])
+  }
+}
+
+/**
+ * A unary expression specifically for math functions that take a `Float` as input and return
+ * a `Float`.
+ * @param f The math function.
+ * @param name The short name of the function
+ */
+abstract class MathematicalExpressionForFloat(f: Float => Float, name: String)
+  extends MathematicalExpression(name) { self: Product =>
+
+  override def dataType: DataType = FloatType
+  override def expectedChildTypes: Seq[DataType] = Seq(FloatType)
+
+  override def eval(input: Row): Any = {
+    val evalE = child.eval(input)
+    if (evalE == null) {
+      null
+    } else {
+      val result = f(evalE.asInstanceOf[Float])
+      if (result.isNaN) null else result
+    }
+  }
+}
+
+/**
+ * A unary expression specifically for math functions that take a `Long` as input and return
+ * a `Long`.
+ * @param f The math function.
+ * @param name The short name of the function
+ */
+abstract class MathematicalExpressionForLong(f: Long => Long, name: String)
+  extends MathematicalExpression(name) { self: Product =>
+
+  override def dataType: DataType = LongType
+  override def expectedChildTypes: Seq[DataType] = Seq(LongType)
+
+  override def eval(input: Row): Any = {
+    val evalE = child.eval(input)
+    if (evalE == null) null else f(evalE.asInstanceOf[Long])
+  }
+}
+
+case class Sin(child: Expression) extends MathematicalExpressionForDouble(math.

[GitHub] spark pull request: [SPARK-6314][CORE] handle JsonParseException f...

2015-04-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5736#issuecomment-96936870
  
[Test build #31121 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31121/consoleFull) for PR 5736 at commit [`b8d2d88`](https://github.com/apache/spark/commit/b8d2d885f6527cbdb3377cc2e8296f612c01d596).





[GitHub] spark pull request: [SPARK-6829] Added math functions for DataFram...

2015-04-27 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/5616#discussion_r29216737
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/mathfunctions.scala ---
@@ -0,0 +1,562 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql
+
+import scala.language.implicitConversions
+
+import org.apache.spark.annotation.Experimental
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.expressions.mathfuncs._
+import org.apache.spark.sql.functions.lit
+
+/**
+ * :: Experimental ::
+ * Mathematical Functions available for [[DataFrame]].
+ *
+ * @groupname double_funcs Functions that require DoubleType as an input
+ * @groupname int_funcs Functions that require IntegerType as an input
+ * @groupname float_funcs Functions that require FloatType as an input
+ * @groupname long_funcs Functions that require LongType as an input
+ */
+@Experimental
+// scalastyle:off
+object mathfunctions {
+// scalastyle:on
+
+  private[this] implicit def toColumn(expr: Expression): Column = Column(expr)
+
+  /**
+   * Computes the sine of the given value.
+   *
+   * @group double_funcs
+   */
+  def sin(e: Column): Column = Sin(e.expr)
--- End diff --

Would be great to sort these functions alphabetically.
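
For reference, the new functions are meant to be used column-wise; a usage sketch (assumes a DataFrame `df` with a numeric column "angle"):

```scala
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.mathfunctions.sin

// Builds the Sin expression shown above and evaluates it per row.
val withSine = df.select(sin(col("angle")))
```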





[GitHub] spark pull request: [SPARK-6829] Added math functions for DataFram...

2015-04-27 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/5616#discussion_r29216714
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathfuncs/unary.scala ---
@@ -0,0 +1,168 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions.mathfuncs
+
+import org.apache.spark.sql.catalyst.expressions.{ExpectsInputTypes, Expression, Row, UnaryExpression}
+import org.apache.spark.sql.types._
+
+/**
+ * A unary expression specifically for math functions. Math Functions expect a specific type of
+ * input format, therefore these functions extend `ExpectsInputTypes`.
+ * @param name The short name of the function
+ */
+abstract class MathematicalExpression(name: String)
+  extends UnaryExpression with Serializable with ExpectsInputTypes {
+  self: Product =>
+  type EvaluatedType = Any
+
+  override def dataType: DataType = DoubleType
+  override def foldable: Boolean = child.foldable
+  override def nullable: Boolean = true
+  override def toString: String = s"$name($child)"
+}
+
+/**
+ * A unary expression specifically for math functions that take a `Double` as input and return
+ * a `Double`.
+ * @param f The math function.
+ * @param name The short name of the function
+ */
+abstract class MathematicalExpressionForDouble(f: Double => Double, name: String)
+  extends MathematicalExpression(name) { self: Product =>
+
+  override def expectedChildTypes: Seq[DataType] = Seq(DoubleType)
+
+  override def eval(input: Row): Any = {
+    val evalE = child.eval(input)
+    if (evalE == null) {
+      null
+    } else {
+      val result = f(evalE.asInstanceOf[Double])
+      if (result.isNaN) null else result
+    }
+  }
+}
+
+/**
+ * A unary expression specifically for math functions that take an `Int` as input and return
+ * an `Int`.
+ * @param f The math function.
+ * @param name The short name of the function
+ */
+abstract class MathematicalExpressionForInt(f: Int => Int, name: String)
--- End diff --

looks like int/float/long are only used for signum. I think we can remove 
the non-double signum, and just let the auto cast you added handle it for all 
the types.

I also checked with @mengxr on this. Supporting only double as the output 
type should be enough.






[GitHub] spark pull request: [SPARK-6829] Added math functions for DataFram...

2015-04-27 Thread brkyvz
Github user brkyvz commented on the pull request:

https://github.com/apache/spark/pull/5616#issuecomment-96935194
  
Sure @rxin, let me know what they are. I'll submit the PR for Python tomorrow.





[GitHub] spark pull request: [SPARK-6829] Added math functions for DataFram...

2015-04-27 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/5616





[GitHub] spark pull request: [SPARK-6829] Added math functions for DataFram...

2015-04-27 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/5616#issuecomment-96932748
  
@brkyvz  I'm going to merge this, but I have a few comments that would be 
great to address in a follow-up PR that also includes Python changes.






[GitHub] spark pull request: [SPARK-6314][CORE] handle JsonParseException f...

2015-04-27 Thread liyezhang556520
GitHub user liyezhang556520 opened a pull request:

https://github.com/apache/spark/pull/5736

[SPARK-6314][CORE] handle JsonParseException for history server

This is handled in the same way as [SPARK-6197](https://issues.apache.org/jira/browse/SPARK-6197). With this PR, the exception shown in the history server log is replaced by a warning, and applications with incomplete history log files will still be listed on the history server web UI.
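
The shape of the fix, per the description above (a simplified sketch, not the PR's exact code; `replay` and `parseAndPost` are assumed names):

```scala
import com.fasterxml.jackson.core.JsonParseException

// Sketch: an application killed mid-write leaves a truncated final line in
// its event log, so a parse failure during replay is downgraded to a
// warning and the (incomplete) application is still listed.
def replay(lines: Iterator[String], logPath: String): Unit = {
  try {
    lines.foreach(parseAndPost)
  } catch {
    case e: JsonParseException =>
      println(s"Got JsonParseException from log file $logPath; " +
        "the file might not have finished writing cleanly.")
  }
}

def parseAndPost(line: String): Unit = () // stand-in for the real replay hook
```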

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/liyezhang556520/spark SPARK-6314

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/5736.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #5736


commit b8d2d885f6527cbdb3377cc2e8296f612c01d596
Author: Zhang, Liye 
Date:   2015-04-28T06:02:07Z

handle JsonParseException for history server







[GitHub] spark pull request: Merge pull request #1 from apache/master

2015-04-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5735#issuecomment-96928541
  
Can one of the admins verify this patch?





[GitHub] spark pull request: Merge pull request #1 from apache/master

2015-04-27 Thread sven0726
GitHub user sven0726 opened a pull request:

https://github.com/apache/spark/pull/5735

Merge pull request #1 from apache/master

2015-04-27 first merge

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sven0726/spark-1 master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/5735.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #5735


commit 5da8c543ff542951c3fefe6e123b891f66edf4b6
Author: sven0726 
Date:   2015-04-27T08:21:55Z

Merge pull request #1 from apache/master

2015-04-27 first merge







[GitHub] spark pull request: [SPARK-6923] [SQL] Hive MetaStore API cannot a...

2015-04-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5733#issuecomment-96925980
  
[Test build #31117 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31117/consoleFull) for PR 5733 at commit [`1eebb46`](https://github.com/apache/spark/commit/1eebb46034f3a26eee6e5145b8d654b38b926257).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
 * This patch does not change any dependencies.





[GitHub] spark pull request: [SPARK-6829] Added math functions for DataFram...

2015-04-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5616#issuecomment-96925746
  
[Test build #3 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/3/consoleFull) for PR 5616 at commit [`fb27153`](https://github.com/apache/spark/commit/fb271536a68cf3f7ff267953098ce305512c65d0).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * `trait ExpectsInputTypes `
   * `abstract class BinaryMathExpression(f: (Double, Double) => Double, name: String) `
   * `case class Pow(left: Expression, right: Expression) extends BinaryMathExpression(math.pow, "POWER")`
   * `case class Hypot(`
   * `case class Atan2(`
   * `abstract class MathematicalExpression(name: String)`
   * `abstract class MathematicalExpressionForDouble(f: Double => Double, name: String)`
   * `abstract class MathematicalExpressionForInt(f: Int => Int, name: String)`
   * `abstract class MathematicalExpressionForFloat(f: Float => Float, name: String)`
   * `abstract class MathematicalExpressionForLong(f: Long => Long, name: String)`
   * `case class Sin(child: Expression) extends MathematicalExpressionForDouble(math.sin, "SIN")`
   * `case class Asin(child: Expression) extends MathematicalExpressionForDouble(math.asin, "ASIN")`
   * `case class Sinh(child: Expression) extends MathematicalExpressionForDouble(math.sinh, "SINH")`
   * `case class Cos(child: Expression) extends MathematicalExpressionForDouble(math.cos, "COS")`
   * `case class Acos(child: Expression) extends MathematicalExpressionForDouble(math.acos, "ACOS")`
   * `case class Cosh(child: Expression) extends MathematicalExpressionForDouble(math.cosh, "COSH")`
   * `case class Tan(child: Expression) extends MathematicalExpressionForDouble(math.tan, "TAN")`
   * `case class Atan(child: Expression) extends MathematicalExpressionForDouble(math.atan, "ATAN")`
   * `case class Tanh(child: Expression) extends MathematicalExpressionForDouble(math.tanh, "TANH")`
   * `case class Ceil(child: Expression) extends MathematicalExpressionForDouble(math.ceil, "CEIL")`
   * `case class Floor(child: Expression) extends MathematicalExpressionForDouble(math.floor, "FLOOR")`
   * `case class Rint(child: Expression) extends MathematicalExpressionForDouble(math.rint, "ROUND")`
   * `case class Cbrt(child: Expression) extends MathematicalExpressionForDouble(math.cbrt, "CBRT")`
   * `case class Signum(child: Expression) extends MathematicalExpressionForDouble(math.signum, "SIGNUM")`
   * `case class ISignum(child: Expression) extends MathematicalExpressionForInt(math.signum, "ISIGNUM")`
   * `case class FSignum(child: Expression) extends MathematicalExpressionForFloat(math.signum, "FSIGNUM")`
   * `case class LSignum(child: Expression) extends MathematicalExpressionForLong(math.signum, "LSIGNUM")`
   * `case class ToDegrees(child: Expression) `
   * `case class ToRadians(child: Expression) `
   * `case class Log(child: Expression) extends MathematicalExpressionForDouble(math.log, "LOG")`
   * `case class Log10(child: Expression) extends MathematicalExpressionForDouble(math.log10, "LOG10")`
   * `case class Log1p(child: Expression) extends MathematicalExpressionForDouble(math.log1p, "LOG1P")`
   * `case class Exp(child: Expression) extends MathematicalExpressionForDouble(math.exp, "EXP")`
   * `case class Expm1(child: Expression) extends MathematicalExpressionForDouble(math.expm1, "EXPM1")`

 * This patch does not change any dependencies.





[GitHub] spark pull request: [SPARK-7187] SerializationDebugger should not ...

2015-04-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5734#issuecomment-96924806
  
[Test build #31120 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31120/consoleFull) for PR 5734 at commit [`57d0ef4`](https://github.com/apache/spark/commit/57d0ef4369a6641a7615d6cc66eeadc6e47d80c4).





[GitHub] spark pull request: [SPARK-6829] Added math functions for DataFram...

2015-04-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5616#issuecomment-96924423
  
  [Test build #31110 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31110/consoleFull)
 for   PR 5616 at commit 
[`836a098`](https://github.com/apache/spark/commit/836a098dbbfc2b671592847dff37a5cefaeaea2a).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `trait ExpectsInputTypes `
  * `abstract class BinaryMathExpression(f: (Double, Double) => Double, 
name: String) `
  * `case class Pow(left: Expression, right: Expression) extends 
BinaryMathExpression(math.pow, "POWER")`
  * `case class Hypot(`
  * `case class Atan2(`
  * `abstract class MathematicalExpression(name: String)`
  * `abstract class MathematicalExpressionForDouble(f: Double => Double, 
name: String)`
  * `abstract class MathematicalExpressionForInt(f: Int => Int, name: 
String)`
  * `abstract class MathematicalExpressionForFloat(f: Float => Float, name: 
String)`
  * `abstract class MathematicalExpressionForLong(f: Long => Long, name: 
String)`
  * `case class Sin(child: Expression) extends 
MathematicalExpressionForDouble(math.sin, "SIN")`
  * `case class Asin(child: Expression) extends 
MathematicalExpressionForDouble(math.asin, "ASIN")`
  * `case class Sinh(child: Expression) extends 
MathematicalExpressionForDouble(math.sinh, "SINH")`
  * `case class Cos(child: Expression) extends 
MathematicalExpressionForDouble(math.cos, "COS")`
  * `case class Acos(child: Expression) extends 
MathematicalExpressionForDouble(math.acos, "ACOS")`
  * `case class Cosh(child: Expression) extends 
MathematicalExpressionForDouble(math.cosh, "COSH")`
  * `case class Tan(child: Expression) extends 
MathematicalExpressionForDouble(math.tan, "TAN")`
  * `case class Atan(child: Expression) extends 
MathematicalExpressionForDouble(math.atan, "ATAN")`
  * `case class Tanh(child: Expression) extends 
MathematicalExpressionForDouble(math.tanh, "TANH")`
  * `case class Ceil(child: Expression) extends 
MathematicalExpressionForDouble(math.ceil, "CEIL")`
  * `case class Floor(child: Expression) extends 
MathematicalExpressionForDouble(math.floor, "FLOOR")`
  * `case class Rint(child: Expression) extends 
MathematicalExpressionForDouble(math.rint, "ROUND")`
  * `case class Cbrt(child: Expression) extends 
MathematicalExpressionForDouble(math.cbrt, "CBRT")`
  * `case class Signum(child: Expression) extends 
MathematicalExpressionForDouble(math.signum, "SIGNUM")`
  * `case class ISignum(child: Expression) extends 
MathematicalExpressionForInt(math.signum, "ISIGNUM")`
  * `case class FSignum(child: Expression) extends 
MathematicalExpressionForFloat(math.signum, "FSIGNUM")`
  * `case class LSignum(child: Expression) extends 
MathematicalExpressionForLong(math.signum, "LSIGNUM")`
  * `case class ToDegrees(child: Expression) `
  * `case class ToRadians(child: Expression) `
  * `case class Log(child: Expression) extends 
MathematicalExpressionForDouble(math.log, "LOG")`
  * `case class Log10(child: Expression) extends 
MathematicalExpressionForDouble(math.log10, "LOG10")`
  * `case class Log1p(child: Expression) extends 
MathematicalExpressionForDouble(math.log1p, "LOG1P")`
  * `case class Exp(child: Expression) extends 
MathematicalExpressionForDouble(math.exp, "EXP")`
  * `case class Expm1(child: Expression) extends 
MathematicalExpressionForDouble(math.expm1, "EXPM1")`

 * This patch does not change any dependencies.
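The listing above follows a single pattern: each unary expression pairs a `scala.math` function with a SQL-facing name through a small per-type abstract base. A minimal, self-contained sketch of that pattern (the `*Sketch` names and the simplified `eval` are illustrative only; the real classes extend Catalyst's `Expression` and evaluate a child expression):

```scala
// Sketch of the pattern behind the listed classes: a (Double => Double)
// function plus a SQL-facing name, with nulls modeled here as None.
abstract class MathematicalExpressionForDouble(f: Double => Double, val name: String) {
  def eval(input: Option[Double]): Option[Double] = input.map(f)
}

case class SinSketch() extends MathematicalExpressionForDouble(math.sin, "SIN")
case class Log1pSketch() extends MathematicalExpressionForDouble(math.log1p, "LOG1P")

object PatternDemo extends App {
  assert(SinSketch().eval(Some(0.0)).contains(0.0))
  assert(Log1pSketch().eval(None).isEmpty) // null inputs propagate
}
```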





[GitHub] spark pull request: [SPARK-7187] SerializationDebugger should not ...

2015-04-27 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/5734#discussion_r29216045
  
--- Diff: 
core/src/main/scala/org/apache/spark/serializer/SerializationDebugger.scala ---
@@ -35,8 +35,15 @@ private[serializer] object SerializationDebugger extends 
Logging {
*/
   def improveException(obj: Any, e: NotSerializableException): 
NotSerializableException = {
 if (enableDebugging && reflect != null) {
-  new NotSerializableException(
-e.getMessage + "\nSerialization stack:\n" + find(obj).map("\t- " + 
_).mkString("\n"))
+  try {
+new NotSerializableException(
+  e.getMessage + "\nSerialization stack:\n" + find(obj).map("\t- " 
+ _).mkString("\n"))
+  } catch {
+case e2: Exception =>
--- End diff --

case NonFatal(e2)
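For readers skimming the thread: `scala.util.control.NonFatal` matches only recoverable throwables, so fatal errors such as `OutOfMemoryError` are not swallowed by the debugger's fallback path. A minimal sketch of the suggested shape, with `find` standing in for SerializationDebugger's traversal (an assumed signature, not the real one):

```scala
import java.io.NotSerializableException

import scala.util.control.NonFatal

object ImproveExceptionSketch {
  // Catch only non-fatal throwables while building the improved message, and
  // fall back to the original exception if anything goes wrong.
  def improveException(
      obj: Any,
      e: NotSerializableException,
      find: Any => List[String]): NotSerializableException = {
    try {
      new NotSerializableException(
        e.getMessage + "\nSerialization stack:\n" +
          find(obj).map("\t- " + _).mkString("\n"))
    } catch {
      case NonFatal(_) => e
    }
  }
}
```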





[GitHub] spark pull request: [SPARK-7076][SPARK-7077][SPARK-7080][SQL] Use ...

2015-04-27 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/5725#discussion_r29216032
  
--- Diff: 
sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeRow.java
 ---
@@ -0,0 +1,428 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions;
+
+import scala.collection.Map;
+import scala.collection.Seq;
+import scala.collection.mutable.ArraySeq;
+
+import javax.annotation.Nullable;
+import java.math.BigDecimal;
+import java.sql.Date;
+import java.util.*;
+
+import org.apache.spark.sql.Row;
+import org.apache.spark.sql.types.DataType;
+import static org.apache.spark.sql.types.DataTypes.*;
+import org.apache.spark.sql.types.StructType;
+import org.apache.spark.sql.types.UTF8String;
+import org.apache.spark.unsafe.PlatformDependent;
+import org.apache.spark.unsafe.bitset.BitSetMethods;
+
+/**
+ * An Unsafe implementation of Row which is backed by raw memory instead 
of Java objects.
+ *
+ * Each tuple has three parts: [null bit set] [values] [variable length 
portion]
+ *
+ * The bit set is used for null tracking and is aligned to 8-byte word 
boundaries.  It stores
+ * one bit per field.
+ *
+ * In the `values` region, we store one 8-byte word per field. For fields 
that hold fixed-length
+ * primitive types, such as long, double, or int, we store the value 
directly in the word. For
+ * fields with non-primitive or variable-length values, we store a 
relative offset (w.r.t. the
+ * base address of the row) that points to the beginning of the 
variable-length field.
+ *
+ * Instances of `UnsafeRow` act as pointers to row data stored in this 
format, similar to how
+ * `Writable` objects work in Hadoop.
+ */
+public final class UnsafeRow implements MutableRow {
+
+  private Object baseObject;
+  private long baseOffset;
+  /** The number of fields in this row, used for calculating the bitset 
width (and in assertions) */
+  private int numFields;
+  /** The width of the null tracking bit set, in bytes */
+  private int bitSetWidthInBytes;
+  /**
+   * This optional schema is required if you want to call generic get() 
and set() methods on
+   * this UnsafeRow, but is optional if callers will only use 
type-specific getTYPE() and setTYPE()
+   * methods.
+   */
+  @Nullable
+  private StructType schema;
--- End diff --

It's used in the generic `get()` call, which is only called internally by 
generated code that accesses UTF8String columns.  If we add an internal 
`getUTF8String()` method and update the code generator to use this, then we can 
completely remove this.  I might take a stab at this, since I think it's a 
pretty minimal change (I'll clearly mark the new method as internal so that it 
gets handled right during the planned InternalRow / Row split).
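One detail of the layout described in the quoted comment is worth spelling out: with one null bit per field and 8-byte word alignment, the bitset width is simply the field count rounded up to whole 64-bit words. A quick sketch of that arithmetic (an illustration, not the UnsafeRow source):

```scala
object BitSetWidthSketch extends App {
  // One null bit per field, rounded up to whole 64-bit (8-byte) words.
  def bitSetWidthInBytes(numFields: Int): Int = ((numFields + 63) / 64) * 8

  assert(bitSetWidthInBytes(1) == 8)   // a single field still takes a full word
  assert(bitSetWidthInBytes(64) == 8)  // 64 fields fit exactly in one word
  assert(bitSetWidthInBytes(65) == 16) // the 65th field spills into a second word
}
```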





[GitHub] spark pull request: [SPARK-7120][SPARK-7121] Closure cleaner nesti...

2015-04-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5685#issuecomment-96924189
  
  [Test build #31109 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31109/consoleFull)
 for   PR 5685 at commit 
[`9419efe`](https://github.com/apache/spark/commit/9419efea98e190766887a9ab5fbac57f45bf007c).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `   *   class SomethingNotSerializable `
  * `  logDebug(s" + cloning the object $obj of class $`

 * This patch does not change any dependencies.





[GitHub] spark pull request: [SPARK-7187] SerializationDebugger should not ...

2015-04-27 Thread andrewor14
GitHub user andrewor14 opened a pull request:

https://github.com/apache/spark/pull/5734

[SPARK-7187] SerializationDebugger should not crash user code

@rxin

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/andrewor14/spark ser-deb

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/5734.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #5734


commit 57d0ef4369a6641a7615d6cc66eeadc6e47d80c4
Author: Andrew Or 
Date:   2015-04-28T05:41:16Z

try catch improveException







[GitHub] spark pull request: [SPARK-7076][SPARK-7077][SPARK-7080][SQL] Use ...

2015-04-27 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/5725#discussion_r29215924
  
--- Diff: 
sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeRow.java
 ---
@@ -0,0 +1,428 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions;
+
+import scala.collection.Map;
+import scala.collection.Seq;
+import scala.collection.mutable.ArraySeq;
+
+import javax.annotation.Nullable;
+import java.math.BigDecimal;
+import java.sql.Date;
+import java.util.*;
+
+import org.apache.spark.sql.Row;
+import org.apache.spark.sql.types.DataType;
+import static org.apache.spark.sql.types.DataTypes.*;
+import org.apache.spark.sql.types.StructType;
+import org.apache.spark.sql.types.UTF8String;
+import org.apache.spark.unsafe.PlatformDependent;
+import org.apache.spark.unsafe.bitset.BitSetMethods;
+
+/**
+ * An Unsafe implementation of Row which is backed by raw memory instead 
of Java objects.
+ *
+ * Each tuple has three parts: [null bit set] [values] [variable length 
portion]
+ *
+ * The bit set is used for null tracking and is aligned to 8-byte word 
boundaries.  It stores
+ * one bit per field.
+ *
+ * In the `values` region, we store one 8-byte word per field. For fields 
that hold fixed-length
+ * primitive types, such as long, double, or int, we store the value 
directly in the word. For
+ * fields with non-primitive or variable-length values, we store a 
relative offset (w.r.t. the
+ * base address of the row) that points to the beginning of the 
variable-length field.
+ *
+ * Instances of `UnsafeRow` act as pointers to row data stored in this 
format, similar to how
+ * `Writable` objects work in Hadoop.
+ */
+public final class UnsafeRow implements MutableRow {
+
+  private Object baseObject;
+  private long baseOffset;
+  /** The number of fields in this row, used for calculating the bitset 
width (and in assertions) */
+  private int numFields;
+  /** The width of the null tracking bit set, in bytes */
+  private int bitSetWidthInBytes;
+  /**
+   * This optional schema is required if you want to call generic get() 
and set() methods on
+   * this UnsafeRow, but is optional if callers will only use 
type-specific getTYPE() and setTYPE()
+   * methods.
+   */
+  @Nullable
+  private StructType schema;
--- End diff --

I think we can remove this -- since internally we never use this.






[GitHub] spark pull request: [SPARK-7076][SPARK-7077][SPARK-7080][SQL] Use ...

2015-04-27 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/5725#discussion_r29215857
  
--- Diff: 
sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeFixedWidthAggregationMap.java
 ---
@@ -0,0 +1,251 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions;
+
+import java.util.Arrays;
+import java.util.Iterator;
+
+import org.apache.spark.sql.Row;
+import org.apache.spark.sql.types.StructField;
+import org.apache.spark.sql.types.StructType;
+import org.apache.spark.unsafe.PlatformDependent;
+import org.apache.spark.unsafe.map.BytesToBytesMap;
+import org.apache.spark.unsafe.memory.MemoryLocation;
+import org.apache.spark.unsafe.memory.MemoryManager;
+
+/**
+ * Unsafe-based HashMap for performing aggregations where the aggregated 
values are fixed-width.
+ *
+ * This map supports a maximum of 2 billion keys.
+ */
+public final class UnsafeFixedWidthAggregationMap {
+
+  /**
+   * An empty aggregation buffer, encoded in UnsafeRow format. When 
inserting a new key into the
+   * map, we copy this buffer and use it as the value.
+   */
+  private final long[] emptyAggregationBuffer;
+
+  private final StructType aggregationBufferSchema;
+
+  private final StructType groupingKeySchema;
+
+  /**
+   * Encodes grouping keys as UnsafeRows.
+   */
+  private final UnsafeRowConverter groupingKeyToUnsafeRowConverter;
+
+  /**
+   * A hashmap which maps from opaque bytearray keys to bytearray values.
+   */
+  private final BytesToBytesMap map;
+
+  /**
+   * Re-used pointer to the current aggregation buffer
+   */
+  private final UnsafeRow currentAggregationBuffer = new UnsafeRow();
+
+  /**
+   * Scratch space that is used when encoding grouping keys into UnsafeRow 
format.
+   *
+   * By default, this is a 1MB array, but it will grow as necessary in 
case larger keys are
+   * encountered.
+   */
+  private long[] groupingKeyConversionScratchSpace = new long[1024 / 8];
+
+  private final boolean enablePerfMetrics;
+
+  /**
+   * @return true if UnsafeFixedWidthAggregationMap supports grouping keys 
with the given schema,
+   * false otherwise.
+   */
+  public static boolean supportsGroupKeySchema(StructType schema) {
+for (StructField field: schema.fields()) {
+  if (!UnsafeRow.readableFieldTypes.contains(field.dataType())) {
+return false;
+  }
+}
+return true;
+  }
+
+  /**
+   * @return true if UnsafeFixedWidthAggregationMap supports aggregation 
buffers with the given
+   * schema, false otherwise.
+   */
+  public static boolean supportsAggregationBufferSchema(StructType schema) 
{
+for (StructField field: schema.fields()) {
+  if (!UnsafeRow.settableFieldTypes.contains(field.dataType())) {
+return false;
+  }
+}
+return true;
+  }
+
+  /**
+   * Create a new UnsafeFixedWidthAggregationMap.
+   *
+   * @param emptyAggregationBuffer the default value for new keys (a 
"zero" of the agg. function)
+   * @param aggregationBufferSchema the schema of the aggregation buffer, 
used for row conversion.
+   * @param groupingKeySchema the schema of the grouping key, used for row 
conversion.
+   * @param groupingKeySchema the memory manager used to allocate our 
Unsafe memory structures.
+   * @param initialCapacity the initial capacity of the map (a sizing hint 
to avoid re-hashing).
+   * @param enablePerfMetrics if true, performance metrics will be 
recorded (has minor perf impact)
+   */
+  public UnsafeFixedWidthAggregationMap(
+  Row emptyAggregationBuffer,
+  StructType aggregationBufferSchema,
+  StructType groupingKeySchema,
+  MemoryManager memoryManager,
+  int initialCapacity,
+  boolean en

[GitHub] spark pull request: [SPARK-7076][SPARK-7077][SPARK-7080][SQL] Use ...

2015-04-27 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/5725#discussion_r29215841
  
--- Diff: 
unsafe/src/main/java/org/apache/spark/unsafe/map/BytesToBytesMap.java ---
@@ -0,0 +1,552 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.unsafe.map;
+
+import java.lang.Override;
+import java.lang.UnsupportedOperationException;
+import java.util.Iterator;
+import java.util.LinkedList;
+import java.util.List;
+
+import org.apache.spark.unsafe.*;
+import org.apache.spark.unsafe.array.ByteArrayMethods;
+import org.apache.spark.unsafe.array.LongArray;
+import org.apache.spark.unsafe.bitset.BitSet;
+import org.apache.spark.unsafe.hash.Murmur3_x86_32;
+import org.apache.spark.unsafe.memory.*;
+
+/**
+ * An append-only hash map where keys and values are contiguous regions of 
bytes.
+ * 
+ * This is backed by a power-of-2-sized hash table, using quadratic 
probing with triangular numbers,
+ * which is guaranteed to exhaust the space.
+ * 
+ * Note that even though we use long for indexing, the map can support up 
to 2^31 keys because
+ * we use 32 bit MurmurHash. In either case, if the key cardinality is so 
high, you should probably
+ * be using sorting instead of hashing for better cache locality.
+ * 
+ * This class is not thread safe.
+ */
+public final class BytesToBytesMap {
+
+  private static final Murmur3_x86_32 HASHER = new Murmur3_x86_32(0);
+
+  private static final HashMapGrowthStrategy growthStrategy = 
HashMapGrowthStrategy.DOUBLING;
+
+  private final MemoryManager memoryManager;
+
+  /**
+   * A linked list for tracking all allocated data pages so that we can 
free all of our memory.
+   */
+  private final List dataPages = new 
LinkedList();
+
+  /**
+   * The data page that will be used to store keys and values for new 
hashtable entries. When this
+   * page becomes full, a new page will be allocated and this pointer will 
change to point to that
+   * new page.
+   */
+  private MemoryBlock currentDataPage = null;
+
+  /**
+   * Offset into `currentDataPage` that points to the location where new 
data can be inserted into
+   * the page.
+   */
+  private long pageCursor = 0;
+
+  /**
+   * The size of the data pages that hold key and value data. Map entries 
cannot span multiple
+   * pages, so this limits the maximum entry size.
+   */
+  private static final long PAGE_SIZE_BYTES = 1L << 26; // 64 megabytes
+
+  // This choice of page table size and page size means that we can 
address up to 500 gigabytes
+  // of memory.
+
+  /**
+   * A single array to store the key and value.
+   *
+   * Position {@code 2 * i} in the array is used to track a pointer to the 
key at index {@code i},
+   * while position {@code 2 * i + 1} in the array holds key's full 32-bit 
hashcode.
+   */
+  private LongArray longArray;
+  // TODO: we're wasting 32 bits of space here; we can probably store 
fewer bits of the hashcode
+  // and exploit word-alignment to use fewer bits to hold the address.  
This might let us store
+  // only one long per map entry, increasing the chance that this array 
will fit in cache at the
+  // expense of maybe performing more lookups if we have hash collisions.  
Say that we stored only
+  // 27 bits of the hashcode and 37 bits of the address.  37 bits is 
enough to address 1 terabyte
+  // of RAM given word-alignment.  If we use 13 bits of this for our page 
table, that gives us a
+  // maximum page size of 2^24 * 8 = ~134 megabytes per page. This change 
will require us to store
+  // full base addresses in the page table for off-heap mode so that we 
can reconstruct the full
+  // absolute memory addresses.
+
+  /**
+   * A {@link BitSet} used to track location of the map where the key is 
set.
+   * Size of the bitset should be half of the size of the long ar

[GitHub] spark pull request: [SPARK-6352] [SQL] Custom parquet output commi...

2015-04-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5525#issuecomment-96922530
  
  [Test build #31119 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31119/consoleFull)
 for   PR 5525 at commit 
[`54c6b15`](https://github.com/apache/spark/commit/54c6b157547ea16cc5482e9dfd396179022d5948).





[GitHub] spark pull request: [SPARK-6352] [SQL] Custom parquet output commi...

2015-04-27 Thread liancheng
Github user liancheng commented on the pull request:

https://github.com/apache/spark/pull/5525#issuecomment-96922401
  
add to whitelist





[GitHub] spark pull request: [SPARK-6352] [SQL] Custom parquet output commi...

2015-04-27 Thread liancheng
Github user liancheng commented on the pull request:

https://github.com/apache/spark/pull/5525#issuecomment-96922347
  
ok to test





[GitHub] spark pull request: [SPARK-7076][SPARK-7077][SPARK-7080][SQL] Use ...

2015-04-27 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/5725#discussion_r29215736
  
--- Diff: 
unsafe/src/main/java/org/apache/spark/unsafe/map/BytesToBytesMap.java ---
@@ -0,0 +1,552 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.unsafe.map;
+
+import java.lang.Override;
+import java.lang.UnsupportedOperationException;
+import java.util.Iterator;
+import java.util.LinkedList;
+import java.util.List;
+
+import org.apache.spark.unsafe.*;
+import org.apache.spark.unsafe.array.ByteArrayMethods;
+import org.apache.spark.unsafe.array.LongArray;
+import org.apache.spark.unsafe.bitset.BitSet;
+import org.apache.spark.unsafe.hash.Murmur3_x86_32;
+import org.apache.spark.unsafe.memory.*;
+
+/**
+ * An append-only hash map where keys and values are contiguous regions of 
bytes.
+ * 
+ * This is backed by a power-of-2-sized hash table, using quadratic 
probing with triangular numbers,
+ * which is guaranteed to exhaust the space.
+ * 
+ * Note that even though we use long for indexing, the map can support up 
to 2^31 keys because
--- End diff --

This is no longer true -- we use a 32-bit int for the index now.
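For context on the doc comment under review: probing with triangular-number offsets (1, 3, 6, 10, ...) visits every slot of a power-of-2-sized table exactly once per cycle, which is what lets the class claim the sequence "is guaranteed to exhaust the space". A small sketch of the probe sequence (an illustration, not the BytesToBytesMap code):

```scala
object TriangularProbeSketch extends App {
  // Successive steps of 1, 2, 3, ... yield offsets that are triangular
  // numbers; masked into a power-of-2 table they hit every slot once.
  def probe(hash: Int, capacity: Int): Iterator[Int] = {
    require((capacity & (capacity - 1)) == 0, "capacity must be a power of 2")
    val mask = capacity - 1
    Iterator.iterate((hash & mask, 1)) { case (pos, step) =>
      ((pos + step) & mask, step + 1)
    }.map(_._1)
  }

  // All 16 slots of a 16-entry table are visited before any repeats.
  assert(probe(hash = 42, capacity = 16).take(16).toSet.size == 16)
}
```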





[GitHub] spark pull request: [SPARK-7076][SPARK-7077][SPARK-7080][SQL] Use ...

2015-04-27 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/5725#discussion_r29215645
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala ---
@@ -140,7 +140,9 @@ private[sql] abstract class SparkStrategies extends 
QueryPlanner[SparkPlan] {
   partial = true,
   groupingExpressions,
   partialComputation,
-  planLater(child))) :: Nil
+  planLater(child),
+  unsafeEnabled),
+  unsafeEnabled) :: Nil
--- End diff --

indent off here





[GitHub] spark pull request: [SPARK-6923] [SQL] Hive MetaStore API cannot a...

2015-04-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5733#issuecomment-96914060
  
  [Test build #31117 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31117/consoleFull)
 for   PR 5733 at commit 
[`1eebb46`](https://github.com/apache/spark/commit/1eebb46034f3a26eee6e5145b8d654b38b926257).





[GitHub] spark pull request: [Build] Enable MiMa checks for SQL

2015-04-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5727#issuecomment-96913846
  
  [Test build #31118 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31118/consoleFull)
 for   PR 5727 at commit 
[`0c48e4d`](https://github.com/apache/spark/commit/0c48e4d55eb2727059ccf4d439c01e9bd56a4cdd).





[GitHub] spark pull request: [SPARK-6923] [SQL] Hive MetaStore API cannot a...

2015-04-27 Thread chenghao-intel
GitHub user chenghao-intel opened a pull request:

https://github.com/apache/spark/pull/5733

[SPARK-6923] [SQL] Hive MetaStore API cannot access Data Sourced table 
schema correctly



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chenghao-intel/spark SPARK-6923

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/5733.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #5733


commit 1eebb46034f3a26eee6e5145b8d654b38b926257
Author: Cheng Hao 
Date:   2015-04-28T05:13:18Z

Put the FieldSchema into HiveMetastore for DataSourced table







[GitHub] spark pull request: [SPARK-4449][Core] Specify port range in spark

2015-04-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5722#issuecomment-96913772
  
  [Test build #31108 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31108/consoleFull)
 for   PR 5722 at commit 
[`db0d3ee`](https://github.com/apache/spark/commit/db0d3eeb2d9c0ba9ee45bffe8ef063c3d7a8aaf9).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class MasterWebUI(val master: Master, requestedPort: String)`

 * This patch does not change any dependencies.





[GitHub] spark pull request: [SPARK-5100][SQL]add webui for thriftserver

2015-04-27 Thread liancheng
Github user liancheng commented on the pull request:

https://github.com/apache/spark/pull/5730#issuecomment-96913392
  
This generally looks good; I left some comments, most of which are about styling issues.

@JoshRosen Would you mind double-checking the web UI changes? This should be generally the same as #3946. Thanks! @tianyi will add Selenium-based test cases later.





[GitHub] spark pull request: [SPARK-7135][SQL] DataFrame expression for mon...

2015-04-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5709#issuecomment-96913243
  
  [Test build #31116 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31116/consoleFull)
 for   PR 5709 at commit 
[`7853611`](https://github.com/apache/spark/commit/78536117934223e2d0f7554897f319ef4b680650).





[GitHub] spark pull request: [SPARK-5945] Spark should not retry a stage in...

2015-04-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5636#issuecomment-96911624
  
  [Test build #31107 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31107/consoleFull)
 for   PR 5636 at commit 
[`914b2cb`](https://github.com/apache/spark/commit/914b2cb4626b59bb4ce79d9ce303ffaec0a2d159).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
 * This patch does not change any dependencies.





[GitHub] spark pull request: [SPARK-5100][SQL]add webui for thriftserver

2015-04-27 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/5730#discussion_r29214768
  
--- Diff: 
sql/hive-thriftserver/v0.13.1/src/main/scala/org/apache/spark/sql/hive/thriftserver/Shim13.scala
 ---
@@ -227,11 +236,14 @@ private[hive] class 
SparkSQLSessionManager(hiveContext: HiveContext)
   withImpersonation: Boolean,
   delegationToken: String): SessionHandle = {
 hiveContext.openSession()
-
-super.openSession(protocol, username, passwd, sessionConf, 
withImpersonation, delegationToken)
+val sessionHandle = super.openSession(protocol, username, passwd, 
sessionConf, withImpersonation, delegationToken)
--- End diff --

100 columns exceeded.





[GitHub] spark pull request: [SPARK-5100][SQL]add webui for thriftserver

2015-04-27 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/5730#discussion_r29214771
  
--- Diff: 
sql/hive-thriftserver/v0.13.1/src/main/scala/org/apache/spark/sql/hive/thriftserver/Shim13.scala
 ---
@@ -227,11 +236,14 @@ private[hive] class 
SparkSQLSessionManager(hiveContext: HiveContext)
   withImpersonation: Boolean,
   delegationToken: String): SessionHandle = {
 hiveContext.openSession()
-
-super.openSession(protocol, username, passwd, sessionConf, 
withImpersonation, delegationToken)
+val sessionHandle = super.openSession(protocol, username, passwd, 
sessionConf, withImpersonation, delegationToken)
+val session = super.getSession(sessionHandle)
+HiveThriftServer2.listener.onSessionCreated(session.getIpAddress, 
sessionHandle.getSessionId.toString, session.getUsername)
--- End diff --

100 columns exceeded.
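For readers outside the project: Spark's style rules cap lines at 100 characters. A common way to satisfy the check, sketched here on a toy method rather than the actual Shim13 code, is to break after the opening parenthesis and indent the arguments:

```scala
object WrapCallSketch extends App {
  // Toy stand-in for a long openSession-style call; only the wrapping
  // convention is the point here.
  def openSession(
      protocol: String,
      username: String,
      password: String,
      withImpersonation: Boolean,
      delegationToken: String): String = s"$username@$protocol"

  val sessionHandle = openSession(
    "HIVE_CLI_SERVICE_PROTOCOL_V6", "alice", "secret",
    withImpersonation = false, delegationToken = "")
  assert(sessionHandle == "alice@HIVE_CLI_SERVICE_PROTOCOL_V6")
}
```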





[GitHub] spark pull request: [SPARK-5100][SQL]add webui for thriftserver

2015-04-27 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/5730#discussion_r29214742
  
--- Diff: 
sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/ui/ThriftServerSessionPage.scala
 ---
@@ -0,0 +1,197 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive.thriftserver.ui
+
+import java.util.Calendar
+import javax.servlet.http.HttpServletRequest
+
+import org.apache.commons.lang3.StringEscapeUtils
+import org.apache.spark.Logging
+import 
org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.{ExecutionInfo, 
ExecutionState}
+import org.apache.spark.ui.UIUtils._
+import org.apache.spark.ui._
+
+import scala.xml.Node
--- End diff --

Import order is wrong here.
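For reference, the convention being applied here groups imports as java/javax, then scala, then third-party libraries, then org.apache.spark, with a blank line between groups; the quoted file places `scala.xml.Node` after the Spark imports. A plausible regrouping of those same imports (one reading of the convention, not the committed fix):

```scala
import java.util.Calendar
import javax.servlet.http.HttpServletRequest

import scala.xml.Node

import org.apache.commons.lang3.StringEscapeUtils

import org.apache.spark.Logging
import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.{ExecutionInfo, ExecutionState}
import org.apache.spark.ui._
import org.apache.spark.ui.UIUtils._
```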





[GitHub] spark pull request: [SPARK-6612] [MLLib] [PySpark] Python KMeans p...

2015-04-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5647#issuecomment-96910373
  
  [Test build #31114 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31114/consoleFull)
 for   PR 5647 at commit 
[`7ecfd00`](https://github.com/apache/spark/commit/7ecfd000af37899a920cae838cc41bcc5ceca053).





[GitHub] spark pull request: [SPARK-7076][SPARK-7077][SPARK-7080][SQL] Use ...

2015-04-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5725#issuecomment-96910423
  
  [Test build #31113 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31113/consoleFull)
 for   PR 5725 at commit 
[`162caf7`](https://github.com/apache/spark/commit/162caf74c15952f6ae0482c1fa74529a0289b039).





[GitHub] spark pull request: [SPARK-5946][Streaming] Add Python API for dir...

2015-04-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4723#issuecomment-96910058
  
  [Test build #31115 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31115/consoleFull)
 for   PR 4723 at commit 
[`a1fe97c`](https://github.com/apache/spark/commit/a1fe97c496f1441e0d2a5879e257c7e569f2e541).





[GitHub] spark pull request: [SPARK-5100][SQL]add webui for thriftserver

2015-04-27 Thread tianyi
Github user tianyi commented on the pull request:

https://github.com/apache/spark/pull/5730#issuecomment-96908252
  
@WangTaoTheTonic, the Thrift Server's driver UI you mentioned is a universal UI for all kinds of work on Spark, and mainly focuses on jobs, stages, and tasks.
In this PR, we provide a new tab that focuses on users, sessions, and SQL.
So the driver UI won't be replaced, since the two are entirely different.





[GitHub] spark pull request: [SPARK-7139][Streaming] Allow received block m...

2015-04-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5732#issuecomment-96907713
  
  [Test build #31106 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31106/consoleFull)
 for   PR 5732 at commit 
[`d06fa21`](https://github.com/apache/spark/commit/d06fa21f80eeda04c11e454a461902717a2e5c7d).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
 * This patch does not change any dependencies.





[GitHub] spark pull request: [SPARK-7135][SQL] DataFrame expression for mon...

2015-04-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5709#issuecomment-96907479
  
  [Test build #31112 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31112/consoleFull)
 for   PR 5709 at commit 
[`a9fda0d`](https://github.com/apache/spark/commit/a9fda0d5c65b884a4e115bba2401bd89ce4436f6).
 * This patch **fails MiMa tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class MonotonicallyIncreasingID() extends LeafExpression `

 * This patch does not change any dependencies.
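The `MonotonicallyIncreasingID` expression flagged above is generally described as packing the partition ID into the upper bits of a 64-bit value and a per-partition record counter into the lower 33 bits, so IDs are unique and increasing but not consecutive. A hedged sketch of that scheme (the bit split reflects the documented design, not code from this PR):

```scala
object MonotonicIdSketch extends App {
  // Partition ID in the upper 31 bits, per-partition record number in the
  // lower 33 bits: unique and monotonically increasing, but with gaps.
  def monotonicallyIncreasingId(partitionId: Int, recordNumber: Long): Long =
    (partitionId.toLong << 33) + recordNumber

  assert(monotonicallyIncreasingId(0, 0) == 0L)
  assert(monotonicallyIncreasingId(0, 1) == 1L)
  assert(monotonicallyIncreasingId(1, 0) == (1L << 33)) // jumps at partition boundaries
}
```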





[GitHub] spark pull request: [Doc][Minor]Remove unused libs from docs direc...

2015-04-27 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/5721#issuecomment-96907412
  
I agree with @srowen. I think it's best to leave these alone unless they are causing problems. Do you mind closing this PR?





[GitHub] spark pull request: [SPARK-7174][Core] Move calling `TaskScheduler...

2015-04-27 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/5723





[GitHub] spark pull request: [SPARK-7174][Core] Move calling `TaskScheduler...

2015-04-27 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/5723#issuecomment-96907193
  
I've merged this. Thanks.





[GitHub] spark pull request: [SPARK-7135][SQL] DataFrame expression for mon...

2015-04-27 Thread yhuai
Github user yhuai commented on the pull request:

https://github.com/apache/spark/pull/5709#issuecomment-96906797
  
LGTM





[GitHub] spark pull request: [SPARK-6829] Added math functions for DataFram...

2015-04-27 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/5616#discussion_r29213825
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathfuncs/binary.scala
 ---
@@ -0,0 +1,93 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions.mathfuncs
+
+import org.apache.spark.sql.catalyst.analysis.UnresolvedException
+import org.apache.spark.sql.catalyst.expressions.{ExpectsInputTypes, 
BinaryExpression, Expression, Row}
+import org.apache.spark.sql.types._
+
+/**
+ * A binary expression specifically for math functions that take two 
`Double`s as input and returns
+ * a `Double`.
+ * @param f The math function.
+ * @param name The short name of the function
+ */
+abstract class BinaryMathExpression(f: (Double, Double) => Double, name: 
String) 
+  extends BinaryExpression with Serializable with ExpectsInputTypes { 
self: Product =>
+  type EvaluatedType = Any
+  override def symbol: String = null
+  override def expectedChildTypes: Seq[DataType] = Seq(DoubleType, 
DoubleType)
+
+  override def nullable: Boolean = left.nullable || right.nullable
+  override def toString: String = s"$name($left, $right)"
+
+  override lazy val resolved =
+left.resolved && right.resolved &&
+  left.dataType == right.dataType &&
+  !DecimalType.isFixed(left.dataType)
+
+  override def dataType: DataType = {
+if (!resolved) {
+  throw new UnresolvedException(this,
+s"datatype. Can not resolve due to differing types 
${left.dataType}, ${right.dataType}")
--- End diff --

I think this technically becomes:

Invalid call to datatype. Can not resolve due to differing types ${left.dataType}, ${right.dataType} on unresolved object
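To spell out why the concatenation reads oddly: `UnresolvedException` in this era of Catalyst formats its argument as "Invalid call to $function on unresolved object", so passing a whole sentence as the function name yields exactly the string quoted above. A simplified stand-in (the real class also carries the unresolved tree node):

```scala
object UnresolvedMessageSketch extends App {
  // Mirrors only the message formatting of Catalyst's UnresolvedException.
  class UnresolvedExceptionSketch(function: String)
    extends RuntimeException(s"Invalid call to $function on unresolved object")

  val e = new UnresolvedExceptionSketch(
    "datatype. Can not resolve due to differing types DoubleType, IntegerType")
  assert(e.getMessage ==
    "Invalid call to datatype. Can not resolve due to differing types " +
      "DoubleType, IntegerType on unresolved object")
}
```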





[GitHub] spark pull request: [SPARK-7135][SQL] DataFrame expression for mon...

2015-04-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5709#issuecomment-96906610
  
  [Test build #31112 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31112/consoleFull)
 for   PR 5709 at commit 
[`a9fda0d`](https://github.com/apache/spark/commit/a9fda0d5c65b884a4e115bba2401bd89ce4436f6).





[GitHub] spark pull request: [SPARK-7165] [SQL] use sort merge join for out...

2015-04-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5717#issuecomment-96906310
  
  [Test build #31103 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31103/consoleFull)
 for   PR 5717 at commit 
[`ae68ee7`](https://github.com/apache/spark/commit/ae68ee751d081049700063a0add34cac75175122).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
 * This patch does not change any dependencies.





[GitHub] spark pull request: [SPARK-7135][SQL] DataFrame expression for mon...

2015-04-27 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/5709#issuecomment-96906283
  
Jenkins, retest this please.





[GitHub] spark pull request: [SPARK-5895][ML] add vector slicer

2015-04-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5731#issuecomment-96906037
  
  [Test build #31104 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31104/consoleFull)
 for   PR 5731 at commit 
[`fd154d7`](https://github.com/apache/spark/commit/fd154d78188d5de87e12fec8e4fb454055378385).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `final class VectorSlicer extends Transformer with HasInputCol with 
HasOutputCol `

 * This patch does not change any dependencies.





[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...

2015-04-27 Thread pwendell
Github user pwendell commented on a diff in the pull request:

https://github.com/apache/spark/pull/5645#discussion_r29213656
  
--- Diff: 
streaming/src/main/java/org/apache/spark/streaming/util/WriteAheadLogSegment.java
 ---
@@ -0,0 +1,26 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.streaming.util;
+
+/**
+ * This is an interface that represent the information required by any 
implementation of
+ * a WriteAheadLog to read a written record.
+ */
+@org.apache.spark.annotation.DeveloperApi
+public interface WriteAheadLogSegment extends java.io.Serializable {
--- End diff --

But in the current approach, they can't, for instance, use Kryo or protobuf
to serialize, unless they do something really crazy like use an `Externalizable`
hook to then call into Kryo. I guess I'm just thinking ahead to how this will
evolve. However, if we want this in the future, we can always add an
alternative version that is additive, so I don't feel strongly at all.
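
For illustration, the "externalizable hook" workaround mentioned above could
look roughly like the sketch below; `MySegment`, its fields, and the Kryo
wiring are all hypothetical, and this assumes the Kryo 2.x `Output`/`Input` API:

```scala
import java.io.{Externalizable, ObjectInput, ObjectOutput}

import com.esotericsoftware.kryo.Kryo
import com.esotericsoftware.kryo.io.{Input, Output}

// Satisfies a java.io.Serializable bound while delegating the actual
// encoding to Kryo. Externalizable requires a public no-arg constructor.
class MySegment(var path: String, var offset: Long) extends Externalizable {
  def this() = this(null, 0L)

  override def writeExternal(out: ObjectOutput): Unit = {
    val kryo = new Kryo()
    val buf = new Output(256, -1)  // growable buffer, no size limit
    kryo.writeObject(buf, path)    // String uses Kryo's built-in serializer
    buf.writeLong(offset)
    val bytes = buf.toBytes
    out.writeInt(bytes.length)     // length-prefix the Kryo payload
    out.write(bytes)
  }

  override def readExternal(in: ObjectInput): Unit = {
    val bytes = new Array[Byte](in.readInt())
    in.readFully(bytes)
    val kryo = new Kryo()
    val input = new Input(bytes)
    path = kryo.readObject(input, classOf[String])
    offset = input.readLong()
  }
}
```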





[GitHub] spark pull request: [SPARK-5100][SQL]add webui for thriftserver

2015-04-27 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/5730#discussion_r29213643
  
--- Diff: 
sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/ui/ThriftServerPage.scala
 ---
@@ -0,0 +1,189 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive.thriftserver.ui
+
+import java.util.Calendar
+import javax.servlet.http.HttpServletRequest
+
+import org.apache.commons.lang3.StringEscapeUtils
+import org.apache.spark.Logging
+import 
org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.{SessionInfo, 
ExecutionState, ExecutionInfo}
+import org.apache.spark.ui.UIUtils._
+import org.apache.spark.ui._
+
+import scala.xml.Node
+
+/** Page for Spark Web UI that shows statistics of a streaming job */
+private[ui] class ThriftServerPage(parent: ThriftServerTab) extends 
WebUIPage("") with Logging {
+
+  private val listener = parent.listener
+  private val startTime = Calendar.getInstance().getTime()
+  private val emptyCell = "-"
+
+  /** Render the page */
+  def render(request: HttpServletRequest): Seq[Node] = {
+val content =
+  generateBasicStats() ++
+   ++
+  
+Total {listener.sessionList.size} session online,
+Total {listener.totalRunning} sql running
--- End diff --

```scala
{listener.sessionList.size} session(s) are online,
running {listener.totalRunning} SQL statement(s)
```





[GitHub] spark pull request: [SPARK-5100][SQL]add webui for thriftserver

2015-04-27 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/5730#discussion_r29213610
  
--- Diff: 
sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/ui/ThriftServerPage.scala
 ---
@@ -0,0 +1,189 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive.thriftserver.ui
+
+import java.util.Calendar
+import javax.servlet.http.HttpServletRequest
+
+import org.apache.commons.lang3.StringEscapeUtils
+import org.apache.spark.Logging
+import 
org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.{SessionInfo, 
ExecutionState, ExecutionInfo}
+import org.apache.spark.ui.UIUtils._
+import org.apache.spark.ui._
+
+import scala.xml.Node
+
+/** Page for Spark Web UI that shows statistics of a streaming job */
+private[ui] class ThriftServerPage(parent: ThriftServerTab) extends 
WebUIPage("") with Logging {
+
+  private val listener = parent.listener
+  private val startTime = Calendar.getInstance().getTime()
+  private val emptyCell = "-"
+
+  /** Render the page */
+  def render(request: HttpServletRequest): Seq[Node] = {
+val content =
+  generateBasicStats() ++
+   ++
+  
+Total {listener.sessionList.size} session online,
--- End diff --

"session online" => "session(s) online"





[GitHub] spark pull request: [SPARK-7120][SPARK-7121] Closure cleaner nesti...

2015-04-27 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/5685#issuecomment-96905575
  
I found the issue. It turns out `SerializationDebugger` currently cannot 
serialize `ClosureCleanerSuite2` due to 
[SPARK-7180](https://issues.apache.org/jira/browse/SPARK-7180). In a nutshell, 
the debugger currently doesn't handle the case where an object both (1) 
inherits a serializable parent, and (2) has a non-serializable field. The 
reason it passed for me locally before is that the debugger was somehow not 
[enabled](https://github.com/apache/spark/blob/4d9e560b5470029143926827b1cb9d72a0bfbeff/core/src/main/scala/org/apache/spark/serializer/SerializationDebugger.scala#L62),
 and I suspect this has something to do with the fact that I'm running Java 6. 
I was able to reproduce the exception locally by always enabling the debugger.

Anyway, I have disabled the debugger in the test for now, and hopefully it 
will pass.
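
For reference, a minimal sketch of the object shape that trips the debugger 
(class names are illustrative, not taken from the suite):

```scala
class SerializableParent extends Serializable

class Child extends SerializableParent {
  val lock = new Object  // non-serializable field
}

// Java serialization of `new Child` fails with NotSerializableException;
// asking SerializationDebugger to trace that failure hits SPARK-7180.
```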





[GitHub] spark pull request: [SPARK-7174][Core] Move calling `TaskScheduler...

2015-04-27 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/5723#discussion_r29213486
  
--- Diff: core/src/main/scala/org/apache/spark/HeartbeatReceiver.scala ---
@@ -76,13 +76,15 @@ private[spark] class HeartbeatReceiver(sc: SparkContext)
   
   private var timeoutCheckingTask: ScheduledFuture[_] = null
 
-  private val timeoutCheckingThread =
-
ThreadUtils.newDaemonSingleThreadScheduledExecutor("heartbeat-timeout-checking-thread")
+  // "eventLoopThread" is used to run some pretty fast actions. The 
actions running in it should not
+  // block the thread for a long time.
+  private val eventLoopThread =
+
ThreadUtils.newDaemonSingleThreadScheduledExecutor("heartbeat-receiver-event-loop-thread")
 
   private val killExecutorThread = 
ThreadUtils.newDaemonSingleThreadExecutor("kill-executor-thread")
 
   override def onStart(): Unit = {
-timeoutCheckingTask = timeoutCheckingThread.scheduleAtFixedRate(new 
Runnable {
+timeoutCheckingTask = eventLoopThread.scheduleAtFixedRate(new Runnable 
{
   override def run(): Unit = Utils.tryLogNonFatalError {
 Option(self).foreach(_.send(ExpireDeadHosts))
--- End diff --

ah ok that makes sense.





[GitHub] spark pull request: [SPARK-6829] Added math functions for DataFram...

2015-04-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5616#issuecomment-96905467
  
  [Test build #3 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/3/consoleFull)
 for   PR 5616 at commit 
[`fb27153`](https://github.com/apache/spark/commit/fb271536a68cf3f7ff267953098ce305512c65d0).





[GitHub] spark pull request: [SPARK-7174][Core] Move calling `TaskScheduler...

2015-04-27 Thread zsxwing
Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/5723#discussion_r29213445
  
--- Diff: core/src/main/scala/org/apache/spark/HeartbeatReceiver.scala ---
@@ -76,13 +76,15 @@ private[spark] class HeartbeatReceiver(sc: SparkContext)
   
   private var timeoutCheckingTask: ScheduledFuture[_] = null
 
-  private val timeoutCheckingThread =
-
ThreadUtils.newDaemonSingleThreadScheduledExecutor("heartbeat-timeout-checking-thread")
+  // "eventLoopThread" is used to run some pretty fast actions. The 
actions running in it should not
+  // block the thread for a long time.
+  private val eventLoopThread =
+
ThreadUtils.newDaemonSingleThreadScheduledExecutor("heartbeat-receiver-event-loop-thread")
 
   private val killExecutorThread = 
ThreadUtils.newDaemonSingleThreadExecutor("kill-executor-thread")
 
   override def onStart(): Unit = {
-timeoutCheckingTask = timeoutCheckingThread.scheduleAtFixedRate(new 
Runnable {
+timeoutCheckingTask = eventLoopThread.scheduleAtFixedRate(new Runnable 
{
   override def run(): Unit = Utils.tryLogNonFatalError {
 Option(self).foreach(_.send(ExpireDeadHosts))
--- End diff --

I wrote this line because I was trying to write the following 3 lines of code:

```Scala
val _self = self  // capture the reference once
if (_self != null) {
  _self.send(ExpireDeadHosts)
}
```
`self` is actually a method. In your code, the second call to `self` may return 
`null` if the endpoint is stopping.
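
A sketch of the race being avoided (the safe form is equivalent to the 
`Option(self).foreach(...)` line in the diff above):

```scala
// Racy: `self` is evaluated twice; the second call may return null if the
// endpoint is stopped between the check and the send.
if (self != null) {
  self.send(ExpireDeadHosts)  // potential NullPointerException
}

// Safe: capture the reference once, then test the captured value.
val ref = self
if (ref != null) {
  ref.send(ExpireDeadHosts)
}
```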





[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...

2015-04-27 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/5645#discussion_r29213422
  
--- Diff: 
streaming/src/main/java/org/apache/spark/streaming/util/WriteAheadLog.java ---
@@ -0,0 +1,59 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.streaming.util;
+
+import java.nio.ByteBuffer;
+import java.util.Iterator;
+
+/**
+ * Interface representing a write ahead log (aka journal) that is used by 
Spark Streaming to
+ * save the received data (by receivers) and associated metadata to a 
reliable storage, so that
+ * they can be recovered after driver failures. See the Spark docs for 
more information on how
+ * to plug in your own custom implementation of a write ahead log.
+ */
+@org.apache.spark.annotation.DeveloperApi
+public interface WriteAheadLog {
--- End diff --

Yes. It's meant for users to create arbitrary implementations, and we want to
stay backward compatible (Scala traits have pretty nasty corner cases).
On Apr 26, 2015 9:48 PM, "Hari Shreedharan" 
wrote:

> In
> streaming/src/main/java/org/apache/spark/streaming/util/WriteAheadLog.java
> :
>
> > + * limitations under the License.
> > + */
> > +
> > +package org.apache.spark.streaming.util;
> > +
> > +import java.nio.ByteBuffer;
> > +import java.util.Iterator;
> > +
> > +/**
> > + * Interface representing a write ahead log (aka journal) that is used 
by Spark Streaming to
> > + * save the received data (by receivers) and associated metadata to a 
reliable storage, so that
> > + * they can be recovered after driver failures. See the Spark docs for 
more information on how
> > + * to plug in your own custom implementation of a write ahead log.
> > + */
> > +@org.apache.spark.annotation.DeveloperApi
> > +public interface WriteAheadLog {
>
> Is the idea that this would be useful for Java implementations to keep
> this a Java interface?
>
> —
> Reply to this email directly or view it on GitHub
> .
>
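
To make the pluggability concrete, here is a self-contained sketch (in Scala 
for brevity) of what a user-supplied implementation could look like; the 
`SimpleWriteAheadLog` trait below is a simplified stand-in, not the PR's 
actual Java interface or its signatures:

```scala
import java.nio.ByteBuffer

import scala.collection.mutable

// Simplified stand-in for a pluggable write ahead log.
trait SimpleWriteAheadLog {
  def write(record: ByteBuffer, time: Long): Long  // returns a record handle
  def read(handle: Long): ByteBuffer
  def clean(threshTime: Long): Unit                // drop records older than threshTime
}

// A toy in-memory implementation, e.g. for unit tests.
class InMemoryWAL extends SimpleWriteAheadLog {
  private val records = mutable.LinkedHashMap.empty[Long, (Long, ByteBuffer)]
  private var nextHandle = 0L

  override def write(record: ByteBuffer, time: Long): Long = synchronized {
    val handle = nextHandle
    nextHandle += 1
    records(handle) = (time, record)
    handle
  }

  override def read(handle: Long): ByteBuffer = synchronized {
    records(handle)._2
  }

  override def clean(threshTime: Long): Unit = synchronized {
    records.retain { case (_, (time, _)) => time >= threshTime }
  }
}
```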






[GitHub] spark pull request: [SPARK-6829] Added math functions for DataFram...

2015-04-27 Thread brkyvz
Github user brkyvz commented on a diff in the pull request:

https://github.com/apache/spark/pull/5616#discussion_r29213390
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/HiveTypeCoercion.scala
 ---
@@ -643,4 +644,26 @@ trait HiveTypeCoercion {
 }
   }
 
+  /**
+   * Casts types according to the expected input types for Expressions 
that have the trait
+   * `ExpectsInputTypes`.
+   */
+  object ExpectedInputConversion extends Rule[LogicalPlan] {
+
+def apply(plan: LogicalPlan): LogicalPlan = plan 
transformAllExpressions {
+  // Skip nodes who's children have not been resolved yet.
+  case e if !e.childrenResolved => e
+
+  case e: ExpectsInputTypes if e.children.map(_.dataType) != 
e.expectedChildTypes =>
+val newC = (e.children, e.children.map(_.dataType), 
e.expectedChildTypes).zipped.map {
+  case (child, actual, expected) =>
+if (actual == expected) {
--- End diff --

Java tests failed because I removed the test data from `TestData`. Got 
this in as well :)





[GitHub] spark pull request: [SPARK-6829] Added math functions for DataFram...

2015-04-27 Thread brkyvz
Github user brkyvz commented on a diff in the pull request:

https://github.com/apache/spark/pull/5616#discussion_r29213354
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala
 ---
@@ -83,8 +83,8 @@ abstract class BinaryArithmetic extends BinaryExpression {
 
   def dataType: DataType = {
 if (!resolved) {
-  throw new UnresolvedException(this,
-s"datatype. Can not resolve due to differing types 
${left.dataType}, ${right.dataType}")
+  throw new UnresolvedException(this, "Unresolved datatype. Can not 
resolve due to " +
--- End diff --

whoa... reverted the messages.





[GitHub] spark pull request: [SPARK-5100][SQL]add webui for thriftserver

2015-04-27 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/5730#discussion_r29213358
  
--- Diff: 
sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/ui/ThriftServerPage.scala
 ---
@@ -0,0 +1,189 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive.thriftserver.ui
+
+import java.util.Calendar
+import javax.servlet.http.HttpServletRequest
+
+import org.apache.commons.lang3.StringEscapeUtils
+import org.apache.spark.Logging
+import 
org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.{SessionInfo, 
ExecutionState, ExecutionInfo}
+import org.apache.spark.ui.UIUtils._
+import org.apache.spark.ui._
+
+import scala.xml.Node
--- End diff --

Import order should be `java`, `javax`, `scala`, and then others.
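
Applied to the imports quoted above, that ordering would give:

```scala
import java.util.Calendar
import javax.servlet.http.HttpServletRequest

import scala.xml.Node

import org.apache.commons.lang3.StringEscapeUtils

import org.apache.spark.Logging
import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.{ExecutionInfo, ExecutionState, SessionInfo}
import org.apache.spark.ui._
import org.apache.spark.ui.UIUtils._
```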





[GitHub] spark pull request: [SPARK-6829] Added math functions for DataFram...

2015-04-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5616#issuecomment-96905165
  
  [Test build #31110 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31110/consoleFull)
 for   PR 5616 at commit 
[`836a098`](https://github.com/apache/spark/commit/836a098dbbfc2b671592847dff37a5cefaeaea2a).





[GitHub] spark pull request: [SPARK-7120][SPARK-7121] Closure cleaner nesti...

2015-04-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5685#issuecomment-96905160
  
  [Test build #31109 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31109/consoleFull)
 for   PR 5685 at commit 
[`9419efe`](https://github.com/apache/spark/commit/9419efea98e190766887a9ab5fbac57f45bf007c).





[GitHub] spark pull request: [SPARK-7158] [SQL] Fix bug of cached data cann...

2015-04-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5714#issuecomment-96904977
  
  [Test build #31099 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31099/consoleFull)
 for   PR 5714 at commit 
[`c0dc28d`](https://github.com/apache/spark/commit/c0dc28de2c43be79b2e026082a1a82ee532fb2eb).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
 * This patch does not change any dependencies.





[GitHub] spark pull request: [SPARK-7067][SQL] fix bug when use complex nes...

2015-04-27 Thread cloud-fan
Github user cloud-fan commented on the pull request:

https://github.com/apache/spark/pull/5659#issuecomment-96904931
  
ping @marmbrus 





[GitHub] spark pull request: [SPARK-6829] Added math functions for DataFram...

2015-04-27 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/5616#discussion_r29213040
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala
 ---
@@ -83,8 +83,8 @@ abstract class BinaryArithmetic extends BinaryExpression {
 
   def dataType: DataType = {
 if (!resolved) {
-  throw new UnresolvedException(this,
-s"datatype. Can not resolve due to differing types 
${left.dataType}, ${right.dataType}")
+  throw new UnresolvedException(this, "Unresolved datatype. Can not 
resolve due to " +
--- End diff --

I just realized UnresolvedException has a pretty weird definition.


https://github.com/apache/spark/blob/4754e16f4746ebd882b2ce7f1efc6e4d4408922c/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala#L31

...





[GitHub] spark pull request: [SPARK-6829] Added math functions for DataFram...

2015-04-27 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/5616#discussion_r29212988
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/HiveTypeCoercion.scala
 ---
@@ -643,4 +644,26 @@ trait HiveTypeCoercion {
 }
   }
 
+  /**
+   * Casts types according to the expected input types for Expressions 
that have the trait
+   * `ExpectsInputTypes`.
+   */
+  object ExpectedInputConversion extends Rule[LogicalPlan] {
+
+def apply(plan: LogicalPlan): LogicalPlan = plan 
transformAllExpressions {
+  // Skip nodes who's children have not been resolved yet.
+  case e if !e.childrenResolved => e
+
+  case e: ExpectsInputTypes if e.children.map(_.dataType) != 
e.expectedChildTypes =>
+val newC = (e.children, e.children.map(_.dataType), 
e.expectedChildTypes).zipped.map {
+  case (child, actual, expected) =>
+if (actual == expected) {
--- End diff --

If you need to update the PR again, maybe collapse this into a single line, 
i.e.
```scala
if (actual == expected) child else Cast(child, expected)
```

If you don't need to update it, then don't do it just for this.





[GitHub] spark pull request: [SPARK-5100][SQL]add webui for thriftserver

2015-04-27 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/5730#discussion_r29212930
  
--- Diff: 
sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2.scala
 ---
@@ -73,15 +94,146 @@ object HiveThriftServer2 extends Logging {
 }
   }
 
+  private[thriftserver] class SessionInfo(
+  val sessionId: String,
+  val startTimestamp: Long,
+  val ip: String,
+  val userName: String) {
+var finishTimestamp: Long = 0L
+var totalExecute: Int = 0
+def totalTime: Long = {
+  if (finishTimestamp == 0L) {
+System.currentTimeMillis() - startTimestamp
+  } else {
+finishTimestamp - startTimestamp
+  }
+}
+  }
+
+  private[thriftserver] object ExecutionState extends Enumeration {
+val STARTED, COMPILED, FAILED, FINISHED = Value
+type ExecutionState = Value
+  }
+
+  private[thriftserver] class ExecutionInfo(
+  val statement: String,
+  val sessionId: String,
+  val startTimestamp: Long,
+  val userName: String) {
+var finishTimestamp: Long = 0L
+var executePlan: String = ""
+var detail: String = ""
+var state: ExecutionState.Value = ExecutionState.STARTED
+val jobId: ArrayBuffer[String] = ArrayBuffer[String]()
+var groupId: String = ""
+def totalTime: Long = {
+  if (finishTimestamp == 0L) {
+System.currentTimeMillis - startTimestamp
+  } else {
+finishTimestamp - startTimestamp
+  }
+}
+  }
+
+
   /**
* A inner sparkListener called in sc.stop to clean up the 
HiveThriftServer2
*/
-  class HiveThriftServer2Listener(val server: HiveServer2) extends 
SparkListener {
+  class HiveThriftServer2Listener(
+  val server: HiveServer2,
+  val conf: SparkConf) extends SparkListener {
+
 override def onApplicationEnd(applicationEnd: 
SparkListenerApplicationEnd): Unit = {
   server.stop()
 }
-  }
 
+val sessionList = new mutable.HashMap[String, SessionInfo]
+val executeList = new mutable.HashMap[String, ExecutionInfo]
+val retainedStatements =
+  conf.getInt("spark.thriftserver.ui.retainedStatements", 200)
+val retainedSessions =
+  conf.getInt("spark.thriftserver.ui.retainedSessions", 200)
+var totalRunning = 0
+
+override def onJobStart(jobStart: SparkListenerJobStart): Unit = {
+  val jobGroup = for (
+props <- Option(jobStart.properties);
+statement <- 
Option(props.getProperty(SparkContext.SPARK_JOB_GROUP_ID))
+  ) yield statement
+
+  jobGroup.map( groupId => {
+val ret = executeList.find( _ match {
+  case (id: String, info: ExecutionInfo) => info.groupId == groupId
+})
+if (ret.isDefined) {
+  ret.get._2.jobId += jobStart.jobId.toString
+  ret.get._2.groupId = groupId
+}
+  })
+}
+
+def onSessionCreated(ip: String, sessionId: String, userName: String = 
"UNKNOWN"): Unit = {
+  val info = new SessionInfo(sessionId, System.currentTimeMillis, ip, 
userName)
+  sessionList(sessionId) = info
+  trimSessionIfNecessary
+}
+
+def onSessionClosed(sessionId: String): Unit = {
+  sessionList(sessionId).finishTimestamp = System.currentTimeMillis
+}
+
+def onStatementStart(
+id: String,
+sessionId: String,
+statement: String,
+groupId: String,
+userName: String = "UNKNOWN"): Unit = {
+  val info = new ExecutionInfo(statement, sessionId, 
System.currentTimeMillis, userName)
+  info.state = ExecutionState.STARTED
+  executeList(id) = info
+  trimExecutionIfNecessary
+  sessionList(sessionId).totalExecute += 1
+  executeList(id).groupId = groupId
+  totalRunning += 1
+}
+
+def onStatementParse(id: String, executePlan: String): Unit = {
--- End diff --

Maybe `onStatementCompiled`? Also, `executePlan` should be `executionPlan`.
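
That is, a sketch of the renamed callback (assuming the 
`ExecutionInfo.executePlan` field is renamed to `executionPlan` as well):

```scala
def onStatementCompiled(id: String, executionPlan: String): Unit = {
  executeList(id).executionPlan = executionPlan
  executeList(id).state = ExecutionState.COMPILED
}
```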





[GitHub] spark pull request: Spark 5659 Flaky test: o.a.s.streaming.Receive...

2015-04-27 Thread tdas
Github user tdas commented on the pull request:

https://github.com/apache/spark/pull/4957#issuecomment-96902804
  
This is ok to test.

On Mon, Apr 27, 2015 at 11:21 AM, UCB AMPLab 
wrote:

> Can one of the admins verify this patch?
>
> —
> Reply to this email directly or view it on GitHub
> .
>






[GitHub] spark pull request: [SPARK-5100][SQL]add webui for thriftserver

2015-04-27 Thread WangTaoTheTonic
Github user WangTaoTheTonic commented on the pull request:

https://github.com/apache/spark/pull/5730#issuecomment-96902704
  
Sorry for not following the previous comments, but could you tell me the 
difference between this tab and the Thrift Server's driver UI?
Once this is added, will the Thrift Server's driver UI still be preserved?





[GitHub] spark pull request: [SPARK-6829] Added math functions for DataFram...

2015-04-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5616#issuecomment-96902685
  
  [Test build #31105 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31105/consoleFull)
 for   PR 5616 at commit 
[`e5f0d13`](https://github.com/apache/spark/commit/e5f0d139dba063cc07f8cee92156cdbb719978bb).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `trait ExpectsInputTypes `
  * `abstract class BinaryMathExpression(f: (Double, Double) => Double, 
name: String) `
  * `case class Pow(left: Expression, right: Expression) extends 
BinaryMathExpression(math.pow, "POWER")`
  * `case class Hypot(`
  * `case class Atan2(`
  * `abstract class MathematicalExpression(name: String)`
  * `abstract class MathematicalExpressionForDouble(f: Double => Double, 
name: String)`
  * `abstract class MathematicalExpressionForInt(f: Int => Int, name: 
String)`
  * `abstract class MathematicalExpressionForFloat(f: Float => Float, name: 
String)`
  * `abstract class MathematicalExpressionForLong(f: Long => Long, name: 
String)`
  * `case class Sin(child: Expression) extends 
MathematicalExpressionForDouble(math.sin, "SIN")`
  * `case class Asin(child: Expression) extends 
MathematicalExpressionForDouble(math.asin, "ASIN")`
  * `case class Sinh(child: Expression) extends 
MathematicalExpressionForDouble(math.sinh, "SINH")`
  * `case class Cos(child: Expression) extends 
MathematicalExpressionForDouble(math.cos, "COS")`
  * `case class Acos(child: Expression) extends 
MathematicalExpressionForDouble(math.acos, "ACOS")`
  * `case class Cosh(child: Expression) extends 
MathematicalExpressionForDouble(math.cosh, "COSH")`
  * `case class Tan(child: Expression) extends 
MathematicalExpressionForDouble(math.tan, "TAN")`
  * `case class Atan(child: Expression) extends 
MathematicalExpressionForDouble(math.atan, "ATAN")`
  * `case class Tanh(child: Expression) extends 
MathematicalExpressionForDouble(math.tanh, "TANH")`
  * `case class Ceil(child: Expression) extends 
MathematicalExpressionForDouble(math.ceil, "CEIL")`
  * `case class Floor(child: Expression) extends 
MathematicalExpressionForDouble(math.floor, "FLOOR")`
  * `case class Rint(child: Expression) extends 
MathematicalExpressionForDouble(math.rint, "ROUND")`
  * `case class Cbrt(child: Expression) extends 
MathematicalExpressionForDouble(math.cbrt, "CBRT")`
  * `case class Signum(child: Expression) extends 
MathematicalExpressionForDouble(math.signum, "SIGNUM")`
  * `case class ISignum(child: Expression) extends 
MathematicalExpressionForInt(math.signum, "ISIGNUM")`
  * `case class FSignum(child: Expression) extends 
MathematicalExpressionForFloat(math.signum, "FSIGNUM")`
  * `case class LSignum(child: Expression) extends 
MathematicalExpressionForLong(math.signum, "LSIGNUM")`
  * `case class ToDegrees(child: Expression) `
  * `case class ToRadians(child: Expression) `
  * `case class Log(child: Expression) extends 
MathematicalExpressionForDouble(math.log, "LOG")`
  * `case class Log10(child: Expression) extends 
MathematicalExpressionForDouble(math.log10, "LOG10")`
  * `case class Log1p(child: Expression) extends 
MathematicalExpressionForDouble(math.log1p, "LOG1P")`
  * `case class Exp(child: Expression) extends 
MathematicalExpressionForDouble(math.exp, "EXP")`
  * `case class Expm1(child: Expression) extends 
MathematicalExpressionForDouble(math.expm1, "EXPM1")`

 * This patch does not change any dependencies.





[GitHub] spark pull request: [SPARK-6612] [MLLib] [PySpark] Python KMeans p...

2015-04-27 Thread FlytxtRnD
Github user FlytxtRnD commented on the pull request:

https://github.com/apache/spark/pull/5647#issuecomment-96901873
  
@jkbradley, we are facing some issues with Python 3 support. We are working 
on it and will fix it ASAP.





[GitHub] spark pull request: [SPARK-5100][SQL]add webui for thriftserver

2015-04-27 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/5730#discussion_r29212594
  
--- Diff: 
sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2.scala
 ---
@@ -73,15 +94,146 @@ object HiveThriftServer2 extends Logging {
 }
   }
 
+  private[thriftserver] class SessionInfo(
+  val sessionId: String,
+  val startTimestamp: Long,
+  val ip: String,
+  val userName: String) {
+var finishTimestamp: Long = 0L
+var totalExecute: Int = 0
+def totalTime: Long = {
+  if (finishTimestamp == 0L) {
+System.currentTimeMillis() - startTimestamp
+  } else {
+finishTimestamp - startTimestamp
+  }
+}
+  }
+
+  private[thriftserver] object ExecutionState extends Enumeration {
+val STARTED, COMPILED, FAILED, FINISHED = Value
+type ExecutionState = Value
+  }
+
+  private[thriftserver] class ExecutionInfo(
+  val statement: String,
+  val sessionId: String,
+  val startTimestamp: Long,
+  val userName: String) {
+var finishTimestamp: Long = 0L
+var executePlan: String = ""
+var detail: String = ""
+var state: ExecutionState.Value = ExecutionState.STARTED
+val jobId: ArrayBuffer[String] = ArrayBuffer[String]()
+var groupId: String = ""
+def totalTime: Long = {
+  if (finishTimestamp == 0L) {
+System.currentTimeMillis - startTimestamp
+  } else {
+finishTimestamp - startTimestamp
+  }
+}
+  }
+
+
   /**
* A inner sparkListener called in sc.stop to clean up the 
HiveThriftServer2
*/
-  class HiveThriftServer2Listener(val server: HiveServer2) extends 
SparkListener {
+  class HiveThriftServer2Listener(
+  val server: HiveServer2,
+  val conf: SparkConf) extends SparkListener {
+
 override def onApplicationEnd(applicationEnd: 
SparkListenerApplicationEnd): Unit = {
   server.stop()
 }
-  }
 
+val sessionList = new mutable.HashMap[String, SessionInfo]
+val executeList = new mutable.HashMap[String, ExecutionInfo]
+val retainedStatements =
+  conf.getInt("spark.thriftserver.ui.retainedStatements", 200)
+val retainedSessions =
+  conf.getInt("spark.thriftserver.ui.retainedSessions", 200)
+var totalRunning = 0
+
+override def onJobStart(jobStart: SparkListenerJobStart): Unit = {
+  val jobGroup = for (
+props <- Option(jobStart.properties);
+statement <- 
Option(props.getProperty(SparkContext.SPARK_JOB_GROUP_ID))
+  ) yield statement
+
+  jobGroup.map( groupId => {
+val ret = executeList.find( _ match {
+  case (id: String, info: ExecutionInfo) => info.groupId == groupId
+})
+if (ret.isDefined) {
+  ret.get._2.jobId += jobStart.jobId.toString
+  ret.get._2.groupId = groupId
+}
+  })
+}
+
+def onSessionCreated(ip: String, sessionId: String, userName: String = 
"UNKNOWN"): Unit = {
+  val info = new SessionInfo(sessionId, System.currentTimeMillis, ip, 
userName)
+  sessionList(sessionId) = info
+  trimSessionIfNecessary
+}
+
+def onSessionClosed(sessionId: String): Unit = {
+  sessionList(sessionId).finishTimestamp = System.currentTimeMillis
+}
+
+def onStatementStart(
+id: String,
+sessionId: String,
+statement: String,
+groupId: String,
+userName: String = "UNKNOWN"): Unit = {
+  val info = new ExecutionInfo(statement, sessionId, 
System.currentTimeMillis, userName)
+  info.state = ExecutionState.STARTED
+  executeList(id) = info
+  trimExecutionIfNecessary
+  sessionList(sessionId).totalExecute += 1
+  executeList(id).groupId = groupId
+  totalRunning += 1
+}
+
+def onStatementParse(id: String, executePlan: String): Unit = {
+  executeList(id).executePlan = executePlan
+  executeList(id).state = ExecutionState.COMPILED
+}
+
+def onStatementError(id: String, errorMessage: String, errorTrace: 
String): Unit = {
+  executeList(id).finishTimestamp = System.currentTimeMillis
+  executeList(id).detail = errorMessage
+  executeList(id).state = ExecutionState.FAILED
+  totalRunning -= 1
+}
+
+def onStatementFinish(id: String): Unit = {
+  executeList(id).finishTimestamp = System.currentTimeMillis
+  executeList(id).state = ExecutionState.FINISHED
+  totalRunning -= 1
+}
+
+private d

[GitHub] spark pull request: [SPARK-5100][SQL]add webui for thriftserver

2015-04-27 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/5730#discussion_r29212588
  
--- Diff: 
sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2.scala
 ---
@@ -73,15 +94,146 @@ object HiveThriftServer2 extends Logging {
 }
   }
 
+  private[thriftserver] class SessionInfo(
+  val sessionId: String,
+  val startTimestamp: Long,
+  val ip: String,
+  val userName: String) {
+var finishTimestamp: Long = 0L
+var totalExecute: Int = 0
+def totalTime: Long = {
+  if (finishTimestamp == 0L) {
+System.currentTimeMillis() - startTimestamp
+  } else {
+finishTimestamp - startTimestamp
+  }
+}
+  }
+
+  private[thriftserver] object ExecutionState extends Enumeration {
+val STARTED, COMPILED, FAILED, FINISHED = Value
+type ExecutionState = Value
+  }
+
+  private[thriftserver] class ExecutionInfo(
+  val statement: String,
+  val sessionId: String,
+  val startTimestamp: Long,
+  val userName: String) {
+var finishTimestamp: Long = 0L
+var executePlan: String = ""
+var detail: String = ""
+var state: ExecutionState.Value = ExecutionState.STARTED
+val jobId: ArrayBuffer[String] = ArrayBuffer[String]()
+var groupId: String = ""
+def totalTime: Long = {
+  if (finishTimestamp == 0L) {
+System.currentTimeMillis - startTimestamp
+  } else {
+finishTimestamp - startTimestamp
+  }
+}
+  }
+
+
   /**
* A inner sparkListener called in sc.stop to clean up the 
HiveThriftServer2
*/
-  class HiveThriftServer2Listener(val server: HiveServer2) extends 
SparkListener {
+  class HiveThriftServer2Listener(
+  val server: HiveServer2,
+  val conf: SparkConf) extends SparkListener {
+
 override def onApplicationEnd(applicationEnd: 
SparkListenerApplicationEnd): Unit = {
   server.stop()
 }
-  }
 
+val sessionList = new mutable.HashMap[String, SessionInfo]
+val executeList = new mutable.HashMap[String, ExecutionInfo]
+val retainedStatements =
+  conf.getInt("spark.thriftserver.ui.retainedStatements", 200)
+val retainedSessions =
+  conf.getInt("spark.thriftserver.ui.retainedSessions", 200)
+var totalRunning = 0
+
+override def onJobStart(jobStart: SparkListenerJobStart): Unit = {
+  val jobGroup = for (
+props <- Option(jobStart.properties);
+statement <- 
Option(props.getProperty(SparkContext.SPARK_JOB_GROUP_ID))
+  ) yield statement
+
+  jobGroup.map( groupId => {
+val ret = executeList.find( _ match {
+  case (id: String, info: ExecutionInfo) => info.groupId == groupId
+})
+if (ret.isDefined) {
+  ret.get._2.jobId += jobStart.jobId.toString
+  ret.get._2.groupId = groupId
+}
+  })
+}
+
+def onSessionCreated(ip: String, sessionId: String, userName: String = 
"UNKNOWN"): Unit = {
+  val info = new SessionInfo(sessionId, System.currentTimeMillis, ip, 
userName)
+  sessionList(sessionId) = info
+  trimSessionIfNecessary
+}
+
+def onSessionClosed(sessionId: String): Unit = {
+  sessionList(sessionId).finishTimestamp = System.currentTimeMillis
+}
+
+def onStatementStart(
+id: String,
+sessionId: String,
+statement: String,
+groupId: String,
+userName: String = "UNKNOWN"): Unit = {
+  val info = new ExecutionInfo(statement, sessionId, 
System.currentTimeMillis, userName)
+  info.state = ExecutionState.STARTED
+  executeList(id) = info
+  trimExecutionIfNecessary
+  sessionList(sessionId).totalExecute += 1
+  executeList(id).groupId = groupId
+  totalRunning += 1
+}
+
+def onStatementParse(id: String, executePlan: String): Unit = {
+  executeList(id).executePlan = executePlan
+  executeList(id).state = ExecutionState.COMPILED
+}
+
+def onStatementError(id: String, errorMessage: String, errorTrace: 
String): Unit = {
+  executeList(id).finishTimestamp = System.currentTimeMillis
+  executeList(id).detail = errorMessage
+  executeList(id).state = ExecutionState.FAILED
+  totalRunning -= 1
+}
+
+def onStatementFinish(id: String): Unit = {
+  executeList(id).finishTimestamp = System.currentTimeMillis
+  executeList(id).state = ExecutionState.FINISHED
+  totalRunning -= 1
+}
+
+private d
