date:20141201

[GitHub] spark pull request: [SPARK-4570][SQL]add BroadcastLeftSemiJoinHash

2014-12-01 Thread wangxiaojing

Github user wangxiaojing commented on the pull request:

https://github.com/apache/spark/pull/3442#issuecomment-65030850
  
@liancheng 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [MLLIB][SPARK-3278] Monotone (Isotonic) regres...

2014-12-01 Thread mengxr

Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/3519#issuecomment-65030925
  
ok to test



[GitHub] spark pull request: [MLLIB][SPARK-3278] Monotone (Isotonic) regres...

2014-12-01 Thread mengxr

Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/3519#issuecomment-65030922
  
add to whitelist



[GitHub] spark pull request: [SPARK-4397][Core] Cleanup 'import SparkContex...

2014-12-01 Thread JoshRosen

Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/3530#issuecomment-65030889
  
This is a really important fix, actually, since we ran into problems with 
IntelliJ's automatic import cleanup removing these: if we perform this import 
cleanup incrementally as part of other patches, then those patches will 
introduce build-breaks if they're cherry-picked into pre-1.2 versions of Spark. 
 As a result, it's much better to do all of this cleanup in one pass, as you've 
done here.

+1.



[GitHub] spark pull request: [MLLIB][SPARK-3278] Monotone (Isotonic) regres...

2014-12-01 Thread mengxr

Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/3519#issuecomment-65030931
  
test this please



[GitHub] spark pull request: [MLLIB][SPARK-3278] Monotone (Isotonic) regres...

2014-12-01 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3519#issuecomment-65031245
  
  [Test build #23979 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23979/consoleFull)
 for   PR 3519 at commit 
[`8f5daf9`](https://github.com/apache/spark/commit/8f5daf9072f23ef46102fe4419da5cf79212bc2f).
 * This patch merges cleanly.



[GitHub] spark pull request: [SPARK-4673][SQL] Optimizing limit using coale...

2014-12-01 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3531#issuecomment-65031244
  
  [Test build #23978 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23978/consoleFull)
 for   PR 3531 at commit 
[`681243a`](https://github.com/apache/spark/commit/681243aa2ae1ae804a033a5aded0bc8127f30e80).
 * This patch merges cleanly.



[GitHub] spark pull request: [MLLIB][SPARK-3278] Monotone (Isotonic) regres...

2014-12-01 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3519#issuecomment-65031323
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23979/
Test FAILed.



[GitHub] spark pull request: [MLLIB][SPARK-3278] Monotone (Isotonic) regres...

2014-12-01 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3519#issuecomment-65031322
  
  [Test build #23979 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23979/consoleFull)
 for   PR 3519 at commit 
[`8f5daf9`](https://github.com/apache/spark/commit/8f5daf9072f23ef46102fe4419da5cf79212bc2f).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `sealed trait MonotonicityConstraint `
  * `class IsotonicRegressionModel(`
  * `case class WeightedLabeledPoint(label: Double, features: Vector, weight: Double = 1)`




[GitHub] spark pull request: SPARK-2192 [BUILD] Examples Data Not in Binary...

2014-12-01 Thread mengxr

Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/3480#issuecomment-65031620
  
@pwendell The example data do not need to be on the classpath. They are 
sample data files used by the mllib examples, e.g., BinaryClassification and 
MovieLensALS. The example code is usually the starting point for users. 
@srowen's change makes it easy to run the examples:

1. download and unzip the distribution zip
2. run `bin/run-example mllib.DatasetExample`, which will read a file under 
`data/` by default.

The change looks good to me.



[GitHub] spark pull request: [SPARK-4611][MLlib] Implement the efficient ve...

2014-12-01 Thread mengxr

Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/3462#discussion_r21075634
  
--- Diff: mllib/src/test/scala/org/apache/spark/mllib/linalg/VectorsSuite.scala ---
@@ -197,4 +198,27 @@ class VectorsSuite extends FunSuite {
     assert(svMap.get(2) === Some(3.1))
     assert(svMap.get(3) === Some(0.0))
   }
+
+  test("vector p-norm") {
+    val dv = Vectors.dense(0.0, -1.2, 3.1, 0.0, -4.5, 1.9)
+    val sv = Vectors.sparse(6, Seq((1, -1.2), (2, 3.1), (3, 0.0), (4, -4.5), (5, 1.9)))
+
+    assert(Vectors.norm(dv, 1.0) ~== dv.toArray.foldLeft(0.0)((a, v) =>
+      a + math.abs(v)) relTol 1E-8)
+    assert(Vectors.norm(sv, 1.0) ~== sv.toArray.foldLeft(0.0)((a, v) =>
+      a + math.abs(v)) relTol 1E-8)
+
+    assert(Vectors.norm(dv, 2.0) ~== math.sqrt(dv.toArray.foldLeft(0.0)((a, v) =>
+      a + v * v)) relTol 1E-8)
+    assert(Vectors.norm(sv, 2.0) ~== math.sqrt(sv.toArray.foldLeft(0.0)((a, v) =>
+      a + v * v)) relTol 1E-8)
+
+    assert(Vectors.norm(dv, Double.PositiveInfinity) ~== dv.toArray.map(math.abs).max relTol 1E-8)
+    assert(Vectors.norm(sv, Double.PositiveInfinity) ~== sv.toArray.map(math.abs).max relTol 1E-8)
+
+    assert(Vectors.norm(dv, 3.7) ~== math.pow(dv.toArray.foldLeft(0.0)((a, v) =>
+      a + math.pow(math.abs(v), 3.7)), 1.0 / 3.7) relTol 1E-8)
+    assert(Vectors.norm(sv, 3.7) ~== math.pow(dv.toArray.foldLeft(0.0)((a, v) =>
--- End diff --

`dv` -> `sv`



[GitHub] spark pull request: [SPARK-4674] Refactor getCallSite

2014-12-01 Thread viirya

GitHub user viirya opened a pull request:

https://github.com/apache/spark/pull/3532

[SPARK-4674] Refactor getCallSite

The current version of `getCallSite` visits the collection of 
`StackTraceElement`s twice. This is unnecessary, since the same work can be 
done in a single pass. We also do not need to keep the filtered 
`StackTraceElement`s.
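
As a rough illustration of the single-pass idea (not the code in this pull request), the sketch below scans a stack trace once and stops at the first frame outside the library. The names `CallSiteSketch`, `CallSite`, `find`, and `libraryPrefix` are hypothetical, and the sketch assumes the trace passed in already starts inside library code, with the innermost frame first.

```scala
object CallSiteSketch {
  /** Hypothetical result type: the outermost library method seen, plus the first user frame. */
  final case class CallSite(lastLibraryMethod: String, firstUserFrame: Option[StackTraceElement])

  /** Scans the trace once, without building a filtered copy of it. */
  def find(trace: Seq[StackTraceElement], libraryPrefix: String): CallSite = {
    var lastLibraryMethod = "<unknown>"
    var firstUserFrame: Option[StackTraceElement] = None
    val it = trace.iterator
    while (firstUserFrame.isEmpty && it.hasNext) {
      val el = it.next()
      if (el.getClassName.startsWith(libraryPrefix)) {
        // Still inside the library: remember the outermost library method so far.
        lastLibraryMethod = el.getMethodName
      } else {
        // First frame outside the library: record it and stop scanning.
        firstUserFrame = Some(el)
      }
    }
    CallSite(lastLibraryMethod, firstUserFrame)
  }
}
```

Only two variables are carried through the loop, so no intermediate collection of frames is kept.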

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/viirya/spark-1 refactor_getCallSite

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/3532.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3532


commit e7410177cf55b8e5f99fea844f8c3ed8035004e6
Author: Liang-Chi Hsieh vii...@gmail.com
Date:   2014-12-01T08:18:28Z

Refactor getCallSite.





[GitHub] spark pull request: [SPARK-4674] Refactor getCallSite

2014-12-01 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3532#issuecomment-65032940
  
Can one of the admins verify this patch?



[GitHub] spark pull request: [SPARK-4611][MLlib] Implement the efficient ve...

2014-12-01 Thread mengxr

Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/3462#discussion_r21075718
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Vectors.scala ---
@@ -261,6 +261,57 @@ object Vectors {
         sys.error("Unsupported Breeze vector type: " + v.getClass.getName)
     }
   }
+
+  /**
+   * Returns the p-norm of this vector.
+   * @param vector input vector.
+   * @param p norm.
+   * @return norm in L^p^ space.
+   */
+  private[spark] def norm(vector: Vector, p: Double): Double = {
+    require(p >= 1.0)
+    val values = vector match {
+      case dv: DenseVector => dv.values
+      case sv: SparseVector => sv.values
+      case v => throw new IllegalArgumentException("Do not support vector type " + v.getClass)
+    }
+    val size = values.size
+
+    if (p == 1) {
--- End diff --

It is an interesting discussion ~ :) But maybe more people are familiar 
with the `if ... else if ... else` statement. And this is not on the critical 
path.
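
For readers following the thread, here is a minimal standalone sketch of a p-norm written as an `if ... else if ... else` chain. The diff above is cut off at `if (p == 1) {`, so the branches below are an assumption about the general shape, not the code under review. `PNormSketch` and `pNorm` are hypothetical names, and the function works on a plain `Array[Double]` rather than on MLlib vectors.

```scala
object PNormSketch {
  /** p-norm of the given values; requires p >= 1.
    * Zeros contribute nothing, so passing only the stored values of a
    * sparse vector gives the same result as passing the full dense array. */
  def pNorm(values: Array[Double], p: Double): Double = {
    require(p >= 1.0, s"p must be >= 1, got $p")
    if (p == 1.0) {
      // L1: sum of absolute values.
      values.foldLeft(0.0)((acc, v) => acc + math.abs(v))
    } else if (p == 2.0) {
      // L2: square root of the sum of squares.
      math.sqrt(values.foldLeft(0.0)((acc, v) => acc + v * v))
    } else if (p == Double.PositiveInfinity) {
      // L-infinity: largest absolute value (0.0 for an empty array).
      values.foldLeft(0.0)((acc, v) => math.max(acc, math.abs(v)))
    } else {
      // General case: (sum |v|^p)^(1/p).
      math.pow(values.foldLeft(0.0)((acc, v) => acc + math.pow(math.abs(v), p)), 1.0 / p)
    }
  }
}
```

The special cases for 1, 2, and infinity avoid `math.pow` on the common paths; the general branch handles every other `p >= 1`.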



[GitHub] spark pull request: Fix wrong file name pattern in .gitignore

2014-12-01 Thread rxin

Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/3529#issuecomment-65033146
  
Thanks. I've merged this in master.



[GitHub] spark pull request: Fix wrong file name pattern in .gitignore

2014-12-01 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/3529



[GitHub] spark pull request: SPARK-2192 [BUILD] Examples Data Not in Binary...

2014-12-01 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/3480



[GitHub] spark pull request: [SPARK-4664][Core] Throw an exception when spa...

2014-12-01 Thread rxin

Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/3527#issuecomment-65033393
  
Yea as @aarondav pointed out, I don't think akka framesize is going to be a 
problem anymore in 1.2+, regardless of the number of partitions. Still good to 
have this check to be defensive. 



[GitHub] spark pull request: [SPARK-4664][Core] Throw an exception when spa...

2014-12-01 Thread rxin

Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/3527#issuecomment-65033421
  
Merging in master.



[GitHub] spark pull request: [SPARK-4664][Core] Throw an exception when spa...

2014-12-01 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/3527



[GitHub] spark pull request: [SPARK-4662] [SQL] Whitelist more unittest

2014-12-01 Thread rxin

Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/3522#issuecomment-65033588
  
Jenkins, retest this please.



[GitHub] spark pull request: [SPARK-4661][Core] Minor code and docs cleanup

2014-12-01 Thread rxin

Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/3521#issuecomment-65033664
  
Merging in master & branch-1.2. Thanks.



[GitHub] spark pull request: [SPARK-4661][Core] Minor code and docs cleanup

2014-12-01 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/3521



[GitHub] spark pull request: [SPARK-4663][sql]add finally to avoid resource...

2014-12-01 Thread rxin

Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/3526#discussion_r21076063
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetTableOperations.scala ---
@@ -298,11 +298,15 @@ case class InsertIntoParquetTable(
       val committer = format.getOutputCommitter(hadoopContext)
       committer.setupTask(hadoopContext)
       val writer = format.getRecordWriter(hadoopContext)
-      while (iter.hasNext) {
-        val row = iter.next()
-        writer.write(null, row)
+      try {
+        while (iter.hasNext) {
+          val row = iter.next()
+          writer.write(null, row)
+        }
+      }
+      finally {
--- End diff --

Can you put the `finally` on the same line as the previous `}`? Thanks.
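
A small standalone sketch of the requested layout, with `} finally {` on one line. The diff does not show what the `finally` block contains, so closing the writer here is an assumption. `WriteAllSketch` and `writeAll` are hypothetical names, and `java.io.Writer` stands in for the Parquet record writer.

```scala
import java.io.Writer

object WriteAllSketch {
  /** Writes every row, then closes the writer even if a write throws. */
  def writeAll(rows: Iterator[String], writer: Writer): Unit = {
    try {
      while (rows.hasNext) {
        writer.write(rows.next())
      }
    } finally {
      // Runs on both the normal and the exceptional path.
      writer.close()
    }
  }
}
```

Any exception from `write` still propagates to the caller; the `finally` block only guarantees that the writer is released first.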



[GitHub] spark pull request: [SPARK-4662] [SQL] Whitelist more unittest

2014-12-01 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3522#issuecomment-65034129
  
  [Test build #23980 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23980/consoleFull)
 for   PR 3522 at commit 
[`16fee22`](https://github.com/apache/spark/commit/16fee22d5294445e6ef46acc676780c18470c5fc).
 * This patch merges cleanly.



[GitHub] spark pull request: [SQL] Minor fix for doc and comment

2014-12-01 Thread scwf

GitHub user scwf opened a pull request:

https://github.com/apache/spark/pull/3533

[SQL] Minor fix for doc and comment



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/scwf/spark sql-doc1

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/3533.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3533


commit 962910bbce3fed010985dca6d7fd6f538a5adff3
Author: wangfei wangf...@huawei.com
Date:   2014-12-01T08:41:59Z

doc and comment fix





[GitHub] spark pull request: 'Do not replicate streaming block when WAL is ...

2014-12-01 Thread jerryshao

GitHub user jerryshao opened a pull request:

https://github.com/apache/spark/pull/3534

'Do not replicate streaming block when WAL is enabled



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jerryshao/apache-spark SPARK-4671

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/3534.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3534


commit 500b45689d2cd6db2ec0a7e32949863dc973870a
Author: jerryshao saisai.s...@intel.com
Date:   2014-12-01T07:58:32Z

Do not replicate streaming block when WAL is enabled





[GitHub] spark pull request: [SPARK-4570][SQL]add BroadcastLeftSemiJoinHash

2014-12-01 Thread liancheng

Github user liancheng commented on the pull request:

https://github.com/apache/spark/pull/3442#issuecomment-65034719
  
retest this please



[GitHub] spark pull request: [SPARK-4671][Streaming]Do not replicate stream...

2014-12-01 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3534#issuecomment-65035054
  
  [Test build #23981 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23981/consoleFull)
 for   PR 3534 at commit 
[`500b456`](https://github.com/apache/spark/commit/500b45689d2cd6db2ec0a7e32949863dc973870a).
 * This patch merges cleanly.



[GitHub] spark pull request: [SPARK-4570][SQL]add BroadcastLeftSemiJoinHash

2014-12-01 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3442#issuecomment-65035043
  
  [Test build #23983 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23983/consoleFull)
 for   PR 3442 at commit 
[`3a63ecb`](https://github.com/apache/spark/commit/3a63ecb81aa02a02dc53d014ed3358927a95a376).
 * This patch merges cleanly.



[GitHub] spark pull request: [SPARK-4611][MLlib] Implement the efficient ve...

2014-12-01 Thread dbtsai

Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/3462#discussion_r21076434
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Vectors.scala ---
@@ -261,6 +261,57 @@ object Vectors {
         sys.error("Unsupported Breeze vector type: " + v.getClass.getName)
     }
   }
+
+  /**
+   * Returns the p-norm of this vector.
+   * @param vector input vector.
+   * @param p norm.
+   * @return norm in L^p^ space.
+   */
+  private[spark] def norm(vector: Vector, p: Double): Double = {
+    require(p >= 1.0)
+    val values = vector match {
+      case dv: DenseVector => dv.values
+      case sv: SparseVector => sv.values
+      case v => throw new IllegalArgumentException("Do not support vector type " + v.getClass)
+    }
+    val size = values.size
+
+    if (p == 1) {
--- End diff --

yeah. but this will not work here unless p has type of `Int`.



[GitHub] spark pull request: [SQL] Minor fix for doc and comment

2014-12-01 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3533#issuecomment-65035032
  
  [Test build #23982 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23982/consoleFull)
 for   PR 3533 at commit 
[`962910b`](https://github.com/apache/spark/commit/962910bbce3fed010985dca6d7fd6f538a5adff3).
 * This patch merges cleanly.



[GitHub] spark pull request: Fix wrong file name pattern in .gitignore

2014-12-01 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3529#issuecomment-65035216
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23976/
Test PASSed.



[GitHub] spark pull request: Fix wrong file name pattern in .gitignore

2014-12-01 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3529#issuecomment-65035212
  
  [Test build #23976 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23976/consoleFull)
 for   PR 3529 at commit 
[`de3c70a`](https://github.com/apache/spark/commit/de3c70acf34b8aa000f189a3f7731fe844377de7).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [Branch-1.2] [DOC] Date type in SQL programmin...

2014-12-01 Thread adrian-wang

GitHub user adrian-wang opened a pull request:

https://github.com/apache/spark/pull/3535

[Branch-1.2] [DOC] Date type in SQL programming guide



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/adrian-wang/spark datedoc

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/3535.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3535


commit 18ff1eddc145cf23d197da0c0b5c55d6ea2e7bd1
Author: Daoyuan Wang daoyuan.w...@intel.com
Date:   2014-12-01T08:48:18Z

[DOC] Date type





[GitHub] spark pull request: [SPARK-4611][MLlib] Implement the efficient ve...

2014-12-01 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3462#issuecomment-65035512
  
  [Test build #23984 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23984/consoleFull)
 for   PR 3462 at commit 
[`63c7165`](https://github.com/apache/spark/commit/63c71659ab7aa3bbea1a505f872dceeca5d3ab2f).
 * This patch merges cleanly.



[GitHub] spark pull request: [Branch-1.2] [DOC] Date type in SQL programmin...

2014-12-01 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3535#issuecomment-65035797
  
  [Test build #23985 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23985/consoleFull)
 for   PR 3535 at commit 
[`18ff1ed`](https://github.com/apache/spark/commit/18ff1eddc145cf23d197da0c0b5c55d6ea2e7bd1).
 * This patch merges cleanly.



[GitHub] spark pull request: [SPARK-4673][SQL] Optimizing limit using coale...

2014-12-01 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3531#issuecomment-65036523
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23978/
Test PASSed.



[GitHub] spark pull request: [SPARK-4673][SQL] Optimizing limit using coale...

2014-12-01 Thread chenghao-intel

Github user chenghao-intel commented on a diff in the pull request:

https://github.com/apache/spark/pull/3531#discussion_r21077028
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/basicOperators.scala ---
@@ -148,20 +148,15 @@ case class Limit(limit: Int, child: SparkPlan)
   }
 
   override def execute() = {
-    val rdd: RDD[_ <: Product2[Boolean, Row]] = if (sortBasedShuffleOn) {
-      child.execute().mapPartitions { iter =>
-        iter.take(limit).map(row => (false, row.copy()))
+    if (sortBasedShuffleOn) {
+      child.execute().map(_.copy).coalesce(1).mapPartitions { iter =>
+        iter.take(limit)
--- End diff --

Can we move the `map(_.copy)` after `take(limit)`?



[GitHub] spark pull request: [SPARK-4397][Core] Cleanup 'import SparkContex...

2014-12-01 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3530#issuecomment-65036726
  
  [Test build #23977 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23977/consoleFull)
 for   PR 3530 at commit 
[`04e2273`](https://github.com/apache/spark/commit/04e227382a6925f443da8794210faba0828f6f0d).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-4397][Core] Cleanup 'import SparkContex...

2014-12-01 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3530#issuecomment-65036730
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23977/
Test PASSed.



[GitHub] spark pull request: [SPARK-4673][SQL] Optimizing limit using coale...

2014-12-01 Thread chenghao-intel

Github user chenghao-intel commented on a diff in the pull request:

https://github.com/apache/spark/pull/3531#discussion_r21077129
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/basicOperators.scala ---
@@ -148,20 +148,15 @@ case class Limit(limit: Int, child: SparkPlan)
   }
 
   override def execute() = {
-    val rdd: RDD[_ <: Product2[Boolean, Row]] = if (sortBasedShuffleOn) {
-      child.execute().mapPartitions { iter =>
-        iter.take(limit).map(row => (false, row.copy()))
+    if (sortBasedShuffleOn) {
--- End diff --

We can probably drop the `sortBasedShuffleOn` conditional check. What
do you think?



[GitHub] spark pull request: [SPARK-4673][SQL] Optimizing limit using coale...

2014-12-01 Thread scwf

Github user scwf commented on a diff in the pull request:

https://github.com/apache/spark/pull/3531#discussion_r21077317
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/basicOperators.scala ---
@@ -148,20 +148,15 @@ case class Limit(limit: Int, child: SparkPlan)
   }
 
   override def execute() = {
-    val rdd: RDD[_ <: Product2[Boolean, Row]] = if (sortBasedShuffleOn) {
-      child.execute().mapPartitions { iter =>
-        iter.take(limit).map(row => (false, row.copy()))
+    if (sortBasedShuffleOn) {
--- End diff --

Ignoring this check seems to cause some problems; see
https://github.com/scwf/spark/commit/e2614038e78f4693fafedeee15b6fdf0ea1be473



[GitHub] spark pull request: [SPARK-4662] [SQL] Whitelist more unittest

2014-12-01 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3522#issuecomment-65037572
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23980/
Test FAILed.



[GitHub] spark pull request: [SPARK-4662] [SQL] Whitelist more unittest

2014-12-01 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3522#issuecomment-65037569
  
  [Test build #23980 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23980/consoleFull)
 for   PR 3522 at commit 
[`16fee22`](https://github.com/apache/spark/commit/16fee22d5294445e6ef46acc676780c18470c5fc).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-4673][SQL] Optimizing limit using coale...

2014-12-01 Thread scwf

Github user scwf commented on a diff in the pull request:

https://github.com/apache/spark/pull/3531#discussion_r21077554
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/basicOperators.scala ---
@@ -148,20 +148,15 @@ case class Limit(limit: Int, child: SparkPlan)
   }
 
   override def execute() = {
-    val rdd: RDD[_ <: Product2[Boolean, Row]] = if (sortBasedShuffleOn) {
-      child.execute().mapPartitions { iter =>
-        iter.take(limit).map(row => (false, row.copy()))
+    if (sortBasedShuffleOn) {
+      child.execute().map(_.copy).coalesce(1).mapPartitions { iter =>
+        iter.take(limit)
--- End diff --

Hmm, I will try this. Actually, I am not sure why we need `copy` here;
@rxin added it to fix a bug. Hi @rxin, can you explain this?
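
One plausible reason for the `copy` can be sketched in plain Scala. The thread does not confirm it. The sketch assumes that the child operator reuses a single mutable row object for every element, in which case anything that buffers rows without copying ends up holding several references to the same object. `MutableRow` and `RowReuseSketch` are hypothetical names for illustration, not Spark classes.

```scala
// Hypothetical stand-in for a mutable row; this is not Spark's Row class.
final class MutableRow(var value: Int) {
  def copy(): MutableRow = new MutableRow(value)
}

object RowReuseSketch {
  // Yields the same MutableRow instance for every element, overwriting its
  // field each time, the way an allocation-avoiding operator might.
  def reusingIterator(values: Seq[Int]): Iterator[MutableRow] = {
    val shared = new MutableRow(0)
    values.iterator.map { v => shared.value = v; shared }
  }

  // Buffers the first `limit` rows of 1..5 and returns their values.
  def firstValues(copyRows: Boolean, limit: Int): Array[Int] = {
    val taken = reusingIterator(1 to 5).take(limit)
    val rows = if (copyRows) taken.map(_.copy()) else taken
    rows.toArray.map(_.value)
  }

  def main(args: Array[String]): Unit = {
    println(firstValues(copyRows = false, limit = 3).mkString(", ")) // 3, 3, 3
    println(firstValues(copyRows = true, limit = 3).mkString(", "))  // 1, 2, 3
  }
}
```

In this sketch the copy is applied after `take(limit)` and still protects the buffered rows. Whether the same ordering is safe in the real operator depends on what sits between the two calls.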



[GitHub] spark pull request: [SPARK-4661][Core] Minor code and docs cleanup

2014-12-01 Thread srowen

Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/3521#issuecomment-65039393
  
@zsxwing I know it's too late, but the cast should also have a
`@SuppressWarnings("unchecked")`, ideally, to avoid another warning. I have
some things like this taken care of in another open PR:
https://www.github.com/apache/spark/pull/3157



[GitHub] spark pull request: [SPARK-4668] Fix some documentation typos.

2014-12-01 Thread srowen

Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/3523#issuecomment-65040128
  
Not sure if it's easy, but most of the diff is inadvertent changes to 
whitespace at the end of lines. This makes it a little hard to see the changes 
you're making since they're not otherwise enumerated here or in the JIRA.



[GitHub] spark pull request: [SPARK-4661][Core] Minor code and docs cleanup

2014-12-01 Thread zsxwing

Github user zsxwing commented on the pull request:

https://github.com/apache/spark/pull/3521#issuecomment-65040116
  
> but the cast should also have a `@SuppressWarnings("unchecked")`, ideally,
> to avoid another warning. I have some things like this taken care of in
> another open PR:

@srowen, yes. Then it's better to add it in your PR :)



[GitHub] spark pull request: [WIP][SPARK-4251][SPARK-2352][MLLIB]Add RBM, A...

2014-12-01 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3222#issuecomment-65041271
  
  [Test build #23986 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23986/consoleFull)
 for   PR 3222 at commit 
[`f5ab79e`](https://github.com/apache/spark/commit/f5ab79ebf8515cace31622dab32b1c4d33a35471).
 * This patch merges cleanly.



[GitHub] spark pull request: [MLLIB][SPARK-4675] Find similar products and ...

2014-12-01 Thread sbourke

GitHub user sbourke opened a pull request:

https://github.com/apache/spark/pull/3536

[MLLIB][SPARK-4675] Find similar products and similar users in 
MatrixFactorizationModel

Using the latent feature space that is learnt in MatrixFactorizationModel, 
I have added 2 new functions to find similar products and similar users. A user 
of the API can for example pass a product ID, and get the closest products 
based on the feature space.
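
As a rough illustration of the lookup described above, here is a minimal sketch in plain Scala. It assumes the features are available as an in-memory map from ID to vector and that closeness is scored by dot product. `SimilarItemsSketch` and `mostSimilar` are hypothetical names, not the functions added in this pull request.

```scala
object SimilarItemsSketch {
  // Dot product of two feature vectors of equal length.
  def dot(a: Array[Double], b: Array[Double]): Double = {
    require(a.length == b.length, "feature vectors must have the same length")
    a.zip(b).map { case (x, y) => x * y }.sum
  }

  // Returns up to `num` (id, score) pairs, highest score first, excluding the
  // queried id itself. Hypothetical helper for illustration only.
  def mostSimilar(features: Map[Int, Array[Double]], id: Int, num: Int): Seq[(Int, Double)] = {
    val target = features(id)
    features.toSeq
      .filter { case (otherId, _) => otherId != id }
      .map { case (otherId, vec) => (otherId, dot(target, vec)) }
      .sortBy { case (_, score) => -score }
      .take(num)
  }

  def main(args: Array[String]): Unit = {
    val features = Map(
      1 -> Array(1.0, 0.0),
      2 -> Array(0.9, 0.1),
      3 -> Array(0.0, 1.0)
    )
    // Product 2 ranks ahead of product 3 for a query on product 1.
    println(mostSimilar(features, id = 1, num = 2).map(_._1).mkString(", "))
  }
}
```

Filtering out the queried ID mirrors the intent of the unit test mentioned in the second commit, which checks that the returned ID is not the same as the input.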


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sbourke/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/3536.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3536


commit 956ca1b86aacb22fabd52740ce0c6fef5524bae8
Author: Senior Stefano El Bour-que steven.bou...@schibsted.es
Date:   2014-11-28T08:40:40Z

added functionality to find similar users and similar products

commit 12e6b6b3a2cbfa1baa29449396e7e85bed1dec56
Author: Steven Bourke steve@stevens-imac.local
Date:   2014-11-30T23:22:46Z

added unit test to make sure id isnt teh same





[GitHub] spark pull request: [MLLIB][SPARK-4675] Find similar products and ...

2014-12-01 Thread srowen

Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/3536#discussion_r21078944
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/recommendation/MatrixFactorizationModel.scala
 ---
@@ -95,6 +95,35 @@ class MatrixFactorizationModel(
   }
 
   /**
+   * Recommends similar products
+   *
+   * @param user the user to find similar users for
+   * @param num how many products to return. The number returned may be less than this.
+   * @return [[Rating]] objects, each of which contains the given user ID, a user ID, and a
+   *  score in the rating field. Each represents one recommended user, and they are sorted
+   *  by score, decreasing. The first returned is the one predicted to be most similar
+   *  user to the specified user ID. The score is an opaque value that indicates how strongly
+   *  recommended the user is.
+   */
+  def recommendSimilariUsers(user: Int, num: Int): Array[Rating] =
--- End diff --

Typo: `Similari`, also below.



[GitHub] spark pull request: [SPARK-4570][SQL]add BroadcastLeftSemiJoinHash

2014-12-01 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3442#issuecomment-65041697
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23983/
Test PASSed.



[GitHub] spark pull request: [SPARK-4570][SQL]add BroadcastLeftSemiJoinHash

2014-12-01 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3442#issuecomment-65041688
  
  [Test build #23983 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23983/consoleFull)
 for   PR 3442 at commit 
[`3a63ecb`](https://github.com/apache/spark/commit/3a63ecb81aa02a02dc53d014ed3358927a95a376).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class BroadcastLeftSemiJoinHash(`




[GitHub] spark pull request: [MLLIB][SPARK-4675] Find similar products and ...

2014-12-01 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3536#issuecomment-65042076
  
Can one of the admins verify this patch?



[GitHub] spark pull request: [MLLIB][SPARK-4675] Find similar products and ...

2014-12-01 Thread srowen

Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/3536#issuecomment-65042436
  
I think it's essential to explain (even in internal comments, or this PR) 
what the similarity metric is. It's just ranking by dot product, which makes it 
something like cosine similarity. The differences are that it isn't in [-1,1], 
and the result doesn't normalize away the length of the feature vectors. This 
tends to favor popular items, or mean that somewhat less similar items may rank 
higher because they're popular. I had traditionally viewed that as a negative, 
and preferred the more standard cosine similarity, but it's certainly up for 
debate.
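
The difference between the two rankings can be seen in a small sketch in plain Scala. The vectors are made up for illustration, and treating a large vector norm as a proxy for popularity is an assumption taken from the comment above, not something measured here. `SimilaritySketch` is a hypothetical name.

```scala
object SimilaritySketch {
  // Dot product of two vectors of equal length.
  def dot(a: Array[Double], b: Array[Double]): Double = {
    require(a.length == b.length, "vectors must have the same length")
    a.zip(b).map { case (x, y) => x * y }.sum
  }

  // Euclidean length of a vector.
  def norm(a: Array[Double]): Double = math.sqrt(dot(a, a))

  // Cosine similarity: the dot product with both vector lengths normalized away.
  def cosine(a: Array[Double], b: Array[Double]): Double =
    dot(a, b) / (norm(a) * norm(b))

  def main(args: Array[String]): Unit = {
    val query    = Array(1.0, 0.0)
    val aligned  = Array(1.0, 0.0) // same direction as the query, small norm
    val largeOne = Array(3.0, 4.0) // different direction, norm 5

    // By dot product the large vector wins (3.0 against 1.0).
    println(s"dot:    aligned=${dot(query, aligned)} large=${dot(query, largeOne)}")
    // By cosine similarity the aligned vector wins (1.0 against roughly 0.6).
    println(s"cosine: aligned=${cosine(query, aligned)} large=${cosine(query, largeOne)}")
  }
}
```

The dot-product score for the large vector also falls outside [-1,1], which is the other difference noted above.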



[GitHub] spark pull request: [SPARK-4671][Streaming]Do not replicate stream...

2014-12-01 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3534#issuecomment-65043093
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23981/
Test PASSed.



[GitHub] spark pull request: [SPARK-4671][Streaming]Do not replicate stream...

2014-12-01 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3534#issuecomment-65043086
  
  [Test build #23981 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23981/consoleFull)
 for   PR 3534 at commit 
[`500b456`](https://github.com/apache/spark/commit/500b45689d2cd6db2ec0a7e32949863dc973870a).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-4611][MLlib] Implement the efficient ve...

2014-12-01 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3462#issuecomment-65043512
  
  [Test build #23984 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23984/consoleFull)
 for   PR 3462 at commit 
[`63c7165`](https://github.com/apache/spark/commit/63c71659ab7aa3bbea1a505f872dceeca5d3ab2f).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-4611][MLlib] Implement the efficient ve...

2014-12-01 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3462#issuecomment-65043520
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23984/
Test PASSed.



[GitHub] spark pull request: [Branch-1.2] [DOC] Date type in SQL programmin...

2014-12-01 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3535#issuecomment-65044316
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23985/
Test PASSed.



[GitHub] spark pull request: [Branch-1.2] [DOC] Date type in SQL programmin...

2014-12-01 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3535#issuecomment-65044310
  
  [Test build #23985 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23985/consoleFull)
 for   PR 3535 at commit 
[`18ff1ed`](https://github.com/apache/spark/commit/18ff1eddc145cf23d197da0c0b5c55d6ea2e7bd1).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SQL] Minor fix for doc and comment

2014-12-01 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3533#issuecomment-65045819
  
  [Test build #23982 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23982/consoleFull)
 for   PR 3533 at commit 
[`962910b`](https://github.com/apache/spark/commit/962910bbce3fed010985dca6d7fd6f538a5adff3).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SQL] Minor fix for doc and comment

2014-12-01 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3533#issuecomment-65045823
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23982/
Test PASSed.



[GitHub] spark pull request: [SPARK-3575][SQL][WIP] Removes the Metastore P...

2014-12-01 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3441#issuecomment-65047207
  
  [Test build #23987 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23987/consoleFull)
 for   PR 3441 at commit 
[`630330a`](https://github.com/apache/spark/commit/630330afaae2dd1d10436cb4acb41b6da217f82b).
 * This patch merges cleanly.



[GitHub] spark pull request: [WIP][SPARK-4251][SPARK-2352][MLLIB]Add RBM, A...

2014-12-01 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3222#issuecomment-65050301
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23986/
Test PASSed.



[GitHub] spark pull request: [WIP][SPARK-4251][SPARK-2352][MLLIB]Add RBM, A...

2014-12-01 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3222#issuecomment-65050294
  
  [Test build #23986 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23986/consoleFull)
 for   PR 3222 at commit 
[`f5ab79e`](https://github.com/apache/spark/commit/f5ab79ebf8515cace31622dab32b1c4d33a35471).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class DBN(val stackedRBM: StackedRBM, val nn: NN)`
  * `class NN(val innerLayers: Array[NNLayer])`
  * `class RBM(`
  * `class StackedRBM(val innerRBMs: Array[RBM])`
  * `case class MinstItem(label: Int, data: Array[Int]) `
  * `class MinstDatasetReader(labelsFile: String, imagesFile: String)`




[GitHub] spark pull request: [SPARK-4663][sql]add finally to avoid resource...

2014-12-01 Thread baishuo

Github user baishuo commented on the pull request:

https://github.com/apache/spark/pull/3526#issuecomment-65053699
  
@rxin no problem. I have modified it :)



[GitHub] spark pull request: [SPARK-3575][SQL][WIP] Removes the Metastore P...

2014-12-01 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3441#issuecomment-65053872
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23987/
Test PASSed.



[GitHub] spark pull request: [SPARK-3575][SQL][WIP] Removes the Metastore P...

2014-12-01 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3441#issuecomment-65053865
  
  [Test build #23987 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23987/consoleFull)
 for   PR 3441 at commit 
[`630330a`](https://github.com/apache/spark/commit/630330afaae2dd1d10436cb4acb41b6da217f82b).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-4663][sql]add finally to avoid resource...

2014-12-01 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3526#issuecomment-65054273
  
  [Test build #23988 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23988/consoleFull)
 for   PR 3526 at commit 
[`b36bf96`](https://github.com/apache/spark/commit/b36bf96ed12d937a511d2292e424da10de8720c8).
 * This patch merges cleanly.



[GitHub] spark pull request: [SPARK-2208] fix zero shuffle wait time in fas...

2014-12-01 Thread XuefengWu

Github user XuefengWu commented on the pull request:

https://github.com/apache/spark/pull/3380#issuecomment-6502
  
@aarondav any more suggestions?



[GitHub] spark pull request: [SPARK-4672][GraphX]Non-transient PartitionsRD...

2014-12-01 Thread JerryLead

GitHub user JerryLead opened a pull request:

https://github.com/apache/spark/pull/3537

[SPARK-4672][GraphX]Non-transient PartitionsRDDs lead to StackOverflow error

The related JIRA is https://issues.apache.org/jira/browse/SPARK-4672

In a nutshell, if `val partitionsRDD` of VertexRDD and EdgeRDD is
non-transient, the task's serialization chain grows very long in
iterative algorithms and eventually causes a StackOverflowError. More details
and explanation can be found in the JIRA.
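
To illustrate the general mechanism, here is a minimal sketch in plain Scala, not GraphX code. `Node`, `TransientNode` and `SerializationChain` are hypothetical names, and the chain of parent references only stands in for an RDD lineage. Java serialization follows every non-transient reference, so the serialized form grows with the length of the chain, while a `@transient` reference is skipped.

```scala
import java.io.{ByteArrayOutputStream, ObjectOutputStream}

// Hypothetical stand-in for an object that keeps a reference to its parent.
class Node(val id: Int, val parent: Node) extends Serializable

// Same shape, but the parent reference is excluded from Java serialization.
class TransientNode(val id: Int, @transient val parent: TransientNode) extends Serializable

object SerializationChain {
  // Number of bytes produced by Java serialization for the given object.
  def serializedSize(obj: AnyRef): Int = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(obj)
    out.close()
    bytes.size()
  }

  // Builds a chain of n nodes, each pointing at the previous one.
  def chain(n: Int): Node =
    (1 to n).foldLeft(null: Node)((parent, i) => new Node(i, parent))

  def transientChain(n: Int): TransientNode =
    (1 to n).foldLeft(null: TransientNode)((parent, i) => new TransientNode(i, parent))
}
```

With the non-transient field, `serializedSize(chain(100))` is larger than `serializedSize(chain(1))`; with the transient field the size does not depend on the chain length.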



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/JerryLead/spark my_change

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/3537.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3537


commit 52799e3ea2b22f4bcaec3d9cd4c8891e212be09e
Author: Lijie Xu csxuli...@gmail.com
Date:   2014-12-01T08:54:37Z

Merge pull request #1 from apache/master

update

commit 5207961636f41187109c2d71617f8aba7d277e07
Author: JerryLead jerryl...@163.com
Date:   2014-12-01T11:45:31Z

set VertexRDD.partitionsRDD and EdgeRDD.partitionsRDD to transient variables





[GitHub] spark pull request: [SPARK-4672][GraphX]Non-transient PartitionsRD...

2014-12-01 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3537#issuecomment-65056312
  
Can one of the admins verify this patch?



[GitHub] spark pull request: [SPARK-3891][SQL] Add array support to percent...

2014-12-01 Thread gvramana

Github user gvramana commented on a diff in the pull request:

https://github.com/apache/spark/pull/2802#discussion_r21086151
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUdfs.scala ---
@@ -172,6 +177,8 @@ private[hive] case class HiveGenericUdf(functionClassName: String, children: Seq
 
   override def eval(input: Row): Any = {
 returnInspector // Make sure initialized.
+if(foldable) return constantReturnValue
--- End diff --

In HiveQuerySuite, the constant array test case was failing:
SELECT sort_array(
 sort_array(
   array("hadoop distributed file system",
 "enterprise databases", "hadoop map-reduce")))
   FROM src LIMIT 1

[info] - constant array *** FAILED *** (596 milliseconds)
[info]   Failed to execute query using catalyst:
[info]   Error: java.lang.String cannot be cast to org.apache.hadoop.io.Text



[GitHub] spark pull request: [SPARK-3891][SQL] Add array support to percent...

2014-12-01 Thread gvramana

Github user gvramana commented on the pull request:

https://github.com/apache/spark/pull/2802#issuecomment-65059567
  
@marmbrus, @chenghao-intel any other comments? If not, could you merge
this? Thanks.



[GitHub] spark pull request: [SPARK-4672][GraphX]Non-transient PartitionsRD...

2014-12-01 Thread JerryLead

Github user JerryLead closed the pull request at:

https://github.com/apache/spark/pull/3537



[GitHub] spark pull request: [SPARK-4663][sql]add finally to avoid resource...

2014-12-01 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3526#issuecomment-65060582
  
  [Test build #23988 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23988/consoleFull)
 for   PR 3526 at commit 
[`b36bf96`](https://github.com/apache/spark/commit/b36bf96ed12d937a511d2292e424da10de8720c8).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-4663][sql]add finally to avoid resource...

2014-12-01 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3526#issuecomment-65060591
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23988/
Test PASSed.



[GitHub] spark pull request: [SPARK-4676] [SQL] JavaSchemaRDD.schema may th...

2014-12-01 Thread YanTangZhai

GitHub user YanTangZhai opened a pull request:

https://github.com/apache/spark/pull/3538

[SPARK-4676] [SQL] JavaSchemaRDD.schema may throw NullType MatchError if 
sql has null

val jsc = new org.apache.spark.api.java.JavaSparkContext(sc)
val jhc = new org.apache.spark.sql.hive.api.java.JavaHiveContext(jsc)
val nrdd = jhc.hql("select null from spark_test.for_test")
println(nrdd.schema)
Then the error is thrown as follows:
scala.MatchError: NullType (of class org.apache.spark.sql.catalyst.types.NullType$)
at org.apache.spark.sql.types.util.DataTypeConversions$.asJavaDataType(DataTypeConversions.scala:43)
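
For readers unfamiliar with this failure mode, a minimal sketch follows. The type hierarchy and the `Conversions` object are hypothetical and much simpler than Spark's real classes; the sketch only shows how a pattern match that lacks a case for one value throws `scala.MatchError` at runtime, and how adding the missing case avoids it.

```scala
// Hypothetical, simplified type hierarchy; these are not Spark's actual classes.
abstract class DataType
case object StringType extends DataType
case object IntegerType extends DataType
case object NullType extends DataType

object Conversions {
  // No case for NullType: calling this with NullType throws scala.MatchError.
  def describeIncomplete(t: DataType): String = t match {
    case StringType  => "string"
    case IntegerType => "int"
  }

  // Every value used above is covered, so NullType is handled.
  def describe(t: DataType): String = t match {
    case StringType  => "string"
    case IntegerType => "int"
    case NullType    => "null"
  }
}
```

Calling `Conversions.describeIncomplete(NullType)` fails with a `MatchError`, while `Conversions.describe(NullType)` returns `"null"`.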


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/YanTangZhai/spark MatchNullType

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/3538.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3538


commit cdef539abc5d2d42d4661373939bdd52ca8ee8e6
Author: YanTangZhai hakeemz...@tencent.com
Date:   2014-08-06T13:07:08Z

Merge pull request #1 from apache/master

update

commit cbcba66ad77b96720e58f9d893e87ae5f13b2a95
Author: YanTangZhai hakeemz...@tencent.com
Date:   2014-08-20T13:14:08Z

Merge pull request #3 from apache/master

Update

commit 8a0010691b669495b4c327cf83124cabb7da1405
Author: YanTangZhai hakeemz...@tencent.com
Date:   2014-09-12T06:54:58Z

Merge pull request #6 from apache/master

Update

commit 03b62b043ab7fd39300677df61c3d93bb9beb9e3
Author: YanTangZhai hakeemz...@tencent.com
Date:   2014-09-16T12:03:22Z

Merge pull request #7 from apache/master

Update

commit 76d40277d51f709247df1d3734093bf2c047737d
Author: YanTangZhai hakeemz...@tencent.com
Date:   2014-10-20T12:52:22Z

Merge pull request #8 from apache/master

update

commit d26d98248a1a4d0eb15336726b6f44e05dd7a05a
Author: YanTangZhai hakeemz...@tencent.com
Date:   2014-11-04T09:00:31Z

Merge pull request #9 from apache/master

Update

commit e249846d9b7967ae52ec3df0fb09e42ffd911a8a
Author: YanTangZhai hakeemz...@tencent.com
Date:   2014-11-11T03:18:24Z

Merge pull request #10 from apache/master

Update

commit 6e643f81555d75ec8ef3eb57bf5ecb6520485588
Author: YanTangZhai hakeemz...@tencent.com
Date:   2014-12-01T11:23:56Z

Merge pull request #11 from apache/master

Update

commit 896c7b73f0ba1b2d3dccf6fed6410bf077eb3d54
Author: yantangzhai tyz0...@163.com
Date:   2014-12-01T13:08:41Z

fix NullType MatchError in JavaSchemaRDD when sql has null





[GitHub] spark pull request: [SPARK-4676] [SQL] JavaSchemaRDD.schema may th...

2014-12-01 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3538#issuecomment-65064263
  
  [Test build #23989 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23989/consoleFull)
 for   PR 3538 at commit 
[`896c7b7`](https://github.com/apache/spark/commit/896c7b73f0ba1b2d3dccf6fed6410bf077eb3d54).
 * This patch merges cleanly.



[GitHub] spark pull request: Documentation: add description for repartition...

2014-12-01 Thread msiddalingaiah

Github user msiddalingaiah commented on the pull request:

https://github.com/apache/spark/pull/3390#issuecomment-65067064
  
OK, done. Please review.



[GitHub] spark pull request: [MLLIB][SPARK-3278] Monotone (Isotonic) regres...

2014-12-01 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3519#issuecomment-65069007
  
  [Test build #23990 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23990/consoleFull)
 for   PR 3519 at commit 
[`6046550`](https://github.com/apache/spark/commit/6046550e79af307e582ffaae559e56d46c884967).
 * This patch merges cleanly.



[GitHub] spark pull request: [SPARK-4677] [WEB] Add hadoop input time in ta...

2014-12-01 Thread YanTangZhai

GitHub user YanTangZhai opened a pull request:

https://github.com/apache/spark/pull/3539

[SPARK-4677] [WEB] Add hadoop input time in task webui

Add Hadoop input time to the task web UI, similar to GC Time, to show
explicitly how much time each task spends reading input data.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/YanTangZhai/spark WebuiInputTime

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/3539.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3539


commit cdef539abc5d2d42d4661373939bdd52ca8ee8e6
Author: YanTangZhai hakeemz...@tencent.com
Date:   2014-08-06T13:07:08Z

Merge pull request #1 from apache/master

update

commit cbcba66ad77b96720e58f9d893e87ae5f13b2a95
Author: YanTangZhai hakeemz...@tencent.com
Date:   2014-08-20T13:14:08Z

Merge pull request #3 from apache/master

Update

commit 8a0010691b669495b4c327cf83124cabb7da1405
Author: YanTangZhai hakeemz...@tencent.com
Date:   2014-09-12T06:54:58Z

Merge pull request #6 from apache/master

Update

commit 03b62b043ab7fd39300677df61c3d93bb9beb9e3
Author: YanTangZhai hakeemz...@tencent.com
Date:   2014-09-16T12:03:22Z

Merge pull request #7 from apache/master

Update

commit 76d40277d51f709247df1d3734093bf2c047737d
Author: YanTangZhai hakeemz...@tencent.com
Date:   2014-10-20T12:52:22Z

Merge pull request #8 from apache/master

update

commit d26d98248a1a4d0eb15336726b6f44e05dd7a05a
Author: YanTangZhai hakeemz...@tencent.com
Date:   2014-11-04T09:00:31Z

Merge pull request #9 from apache/master

Update

commit e249846d9b7967ae52ec3df0fb09e42ffd911a8a
Author: YanTangZhai hakeemz...@tencent.com
Date:   2014-11-11T03:18:24Z

Merge pull request #10 from apache/master

Update

commit 6e643f81555d75ec8ef3eb57bf5ecb6520485588
Author: YanTangZhai hakeemz...@tencent.com
Date:   2014-12-01T11:23:56Z

Merge pull request #11 from apache/master

Update

commit 3816f8540b947809cb821bcb3af36d7be0210d9c
Author: yantangzhai tyz0...@163.com
Date:   2014-12-01T14:09:24Z

add hadoop input read time in webui





[GitHub] spark pull request: [SPARK-4642] Documents about running-on-YARN n...

2014-12-01 Thread tgravescs

Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/3500#issuecomment-65069892
  
@sryza is correct. Most of those were intentionally left undocumented. If
you have a reason they need to be changed, then we can revisit them sooner to
make sure they are what we want and get them documented. Note there is a
separate PR open to fix spark.yarn.user.classpath.first
(https://github.com/apache/spark/pull/3233)



[GitHub] spark pull request: [SPARK-4677] [WEB] Add hadoop input time in ta...

2014-12-01 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3539#issuecomment-65069992
  
  [Test build #23991 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23991/consoleFull)
 for   PR 3539 at commit 
[`3816f85`](https://github.com/apache/spark/commit/3816f8540b947809cb821bcb3af36d7be0210d9c).
 * This patch merges cleanly.



[GitHub] spark pull request: [SPARK-4677] [WEB] Add hadoop input time in ta...

2014-12-01 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3539#issuecomment-65070129
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23991/
Test FAILed.



[GitHub] spark pull request: [SPARK-4677] [WEB] Add hadoop input time in ta...

2014-12-01 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3539#issuecomment-65070127
  
  [Test build #23991 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23991/consoleFull)
 for   PR 3539 at commit 
[`3816f85`](https://github.com/apache/spark/commit/3816f8540b947809cb821bcb3af36d7be0210d9c).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-4677] [WEB] Add hadoop input time in ta...

2014-12-01 Thread srowen

Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/3539#discussion_r21090555
  
--- Diff: core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala ---
@@ -238,10 +238,13 @@ class HadoopRDD[K, V](
   val value: V = reader.createValue()
 
   var recordsSinceMetricsUpdate = 0
+  var startTime : Long = 0L
 
   override def getNext() = {
 try {
+  startTime = System.nanoTime
   finished = !reader.next(key, value)
+  inputMetrics.readTime += (System.nanoTime - startTime)
--- End diff --

Hm, is this going to be expensive, making 2 system calls for every read?
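
One possible way to reduce the per-record cost, sketched here only as an assumption and not as what the PR does, is to time a sample of the reads and scale the result. `SampledTimer` and `Sampled` are hypothetical names, and the total it reports is an estimate rather than an exact measurement.

```scala
object SampledTimer {
  // Times only every `sampleEvery`-th call and extrapolates to all calls.
  final class Sampled(sampleEvery: Int) {
    require(sampleEvery > 0, "sampleEvery must be positive")

    private var calls = 0L        // all calls seen so far
    private var samples = 0L      // calls that were actually timed
    private var sampledNanos = 0L // time accumulated over the timed calls

    def time[A](body: => A): A = {
      calls += 1
      if (calls % sampleEvery == 0) {
        val start = System.nanoTime()
        val result = body
        sampledNanos += System.nanoTime() - start
        samples += 1
        result
      } else {
        body // untimed path: no clock reads at all
      }
    }

    def totalCalls: Long = calls
    def timedCalls: Long = samples

    // Average time of the sampled calls multiplied by the total call count.
    def estimatedTotalNanos: Long =
      if (samples == 0) 0L else sampledNanos / samples * calls
  }
}
```

With `new SampledTimer.Sampled(10)`, only one read in ten pays for the two `System.nanoTime` calls. Whether that saving matters depends on how expensive `nanoTime` is on the platform, which would need measuring.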



[GitHub] spark pull request: [SPARK-4665] Improve YarnAllocator's parsing o...

2014-12-01 Thread tgravescs

Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/3525#issuecomment-65071045
  
Perhaps I've missed it, but I haven't heard many arguments for either way.
Do you have examples or use cases? I'd be open to changing it but want more
reasoning behind it. I've found putting in the value rather than a % easier
in some cases that weren't small/straightforward jobs.



[GitHub] spark pull request: [YARN][SPARK-3293]Fix yarn's web show SUCCEED...

2014-12-01 Thread tgravescs

Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/3508#discussion_r21090918
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/YarnSchedulerBackend.scala ---
@@ -92,6 +104,57 @@ private[spark] abstract class YarnSchedulerBackend(
   }
 
   /**
+   * This system security manager applies to the entire process.
+   * It's main purpose is to handle the case if the user code does a System.exit.
+   * This allows us to catch that and properly set the YARN application status and
+   * cleanup if needed.
+   */
+  private def setupSystemSecurityManager(amActor: ActorRef): Unit = {
--- End diff --

The securityManager in the AM was causing a performance impact and we just 
removed it.  I expect the same issue to happen here. 
https://github.com/apache/spark/pull/3484



[GitHub] spark pull request: [SPARK-4584] [yarn] Remove security manager fr...

2014-12-01 Thread tgravescs

Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/3484#issuecomment-65071338
  
@vanzin I would still be curious whether you have more details on the exact
performance impact.



[GitHub] spark pull request: [SPARK-4676] [SQL] JavaSchemaRDD.schema may th...

2014-12-01 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3538#issuecomment-65072171
  
  [Test build #23989 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23989/consoleFull)
 for   PR 3538 at commit 
[`896c7b7`](https://github.com/apache/spark/commit/896c7b73f0ba1b2d3dccf6fed6410bf077eb3d54).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-4676] [SQL] JavaSchemaRDD.schema may th...

2014-12-01 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3538#issuecomment-65072175
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23989/
Test PASSed.



[GitHub] spark pull request: [MLLIB][SPARK-3278] Monotone (Isotonic) regres...

2014-12-01 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3519#issuecomment-65081643
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23990/
Test PASSed.



[GitHub] spark pull request: [MLLIB][SPARK-3278] Monotone (Isotonic) regres...

2014-12-01 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3519#issuecomment-65081631
  
  [Test build #23990 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23990/consoleFull) for PR 3519 at commit [`6046550`](https://github.com/apache/spark/commit/6046550e79af307e582ffaae559e56d46c884967).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `sealed trait MonotonicityConstraint `
  * `class IsotonicRegressionModel(`
  * `case class WeightedLabeledPoint(label: Double, features: Vector, weight: Double = 1)`




[GitHub] spark pull request: [SPARK-2208] fix zero shuffle wait time in fas...

2014-12-01 Thread aarondav

Github user aarondav commented on a diff in the pull request:

https://github.com/apache/spark/pull/3380#discussion_r21098192
  
--- Diff: core/src/test/scala/org/apache/spark/scheduler/SparkListenerSuite.scala ---
@@ -19,12 +19,19 @@ package org.apache.spark.scheduler
 
 import java.util.concurrent.Semaphore
 
+import akka.actor.ActorSystem
--- End diff --

nit: import ordering should abide by the [style guide](https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide#SparkCodeStyleGuide-Imports)



[GitHub] spark pull request: [SPARK-2208] fix zero shuffle wait time in fas...

2014-12-01 Thread aarondav

Github user aarondav commented on a diff in the pull request:

https://github.com/apache/spark/pull/3380#discussion_r21098231
  
--- Diff: core/src/test/scala/org/apache/spark/scheduler/SparkListenerSuite.scala ---
@@ -202,6 +209,60 @@ class SparkListenerSuite extends FunSuite with LocalSparkContext with Matchers
     stageInfo.rddInfos.forall(_.numPartitions == 4) should be {true}
   }
 
+  //SEE SPARK-2208: hack BlockManager to have a sleep when read shuffle data
--- End diff --

nit: space after //



[GitHub] spark pull request: [SPARK-2208] fix zero shuffle wait time in fas...

2014-12-01 Thread aarondav

Github user aarondav commented on a diff in the pull request:

https://github.com/apache/spark/pull/3380#discussion_r21098278
  
--- Diff: core/src/test/scala/org/apache/spark/scheduler/SparkListenerSuite.scala ---
@@ -202,6 +209,60 @@ class SparkListenerSuite extends FunSuite with LocalSparkContext with Matchers
     stageInfo.rddInfos.forall(_.numPartitions == 4) should be {true}
   }
 
+  //SEE SPARK-2208: hack BlockManager to have a sleep when read shuffle data
+  test("local metrics with fetchWaitTime") {
+    val listener = new SaveStageAndTaskInfo
+    val sc2 = new SparkContext("local", "SparkListenerSuite2")
+
+    val env = SparkEnv.get
+    val bm: BlockManager = env.blockManager
+    val numOfCore = Runtime.getRuntime().availableProcessors()
+    val maxMemory = getMaxMemory(env.conf)
+
+    val hackedBlockManager = new SlowBlockManager(env.executorId, env.actorSystem, bm.master,
+      env.serializer, maxMemory, env.conf, env.mapOutputTracker, env.shuffleManager, env.blockTransferService, env.securityManager,numOfCore)
--- End diff --

We have a max line length of 100 characters.



[GitHub] spark pull request: [SPARK-2208] fix zero shuffle wait time in fas...

2014-12-01 Thread aarondav

Github user aarondav commented on a diff in the pull request:

https://github.com/apache/spark/pull/3380#discussion_r21098319
  
--- Diff: core/src/test/scala/org/apache/spark/scheduler/SparkListenerSuite.scala ---
@@ -202,6 +209,60 @@ class SparkListenerSuite extends FunSuite with LocalSparkContext with Matchers
     stageInfo.rddInfos.forall(_.numPartitions == 4) should be {true}
   }
 
+  //SEE SPARK-2208: hack BlockManager to have a sleep when read shuffle data
+  test("local metrics with fetchWaitTime") {
+    val listener = new SaveStageAndTaskInfo
+    val sc2 = new SparkContext("local", "SparkListenerSuite2")
+
+    val env = SparkEnv.get
+    val bm: BlockManager = env.blockManager
+    val numOfCore = Runtime.getRuntime().availableProcessors()
+    val maxMemory = getMaxMemory(env.conf)
+
+    val hackedBlockManager = new SlowBlockManager(env.executorId, env.actorSystem, bm.master,
+      env.serializer, maxMemory, env.conf, env.mapOutputTracker, env.shuffleManager, env.blockTransferService, env.securityManager,numOfCore)
+
+
+    val hackEnv = new SparkEnv(env.executorId, env.actorSystem, env.serializer, env.closureSerializer, env.cacheManager, env.mapOutputTracker,
--- End diff --

line length issue here too


