[GitHub] spark pull request: [SPARK-7097][SQL]: Partitioned tables should o...

2015-04-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5668#issuecomment-95817298
  
[Test build #30919 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30919/consoleFull) for PR 5668 at commit [`b4651fd`](https://github.com/apache/spark/commit/b4651fd80a55f016093d84cf3b00ad6c91333cef).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-7112][Streaming] Add a DirectStreamTrac...

2015-04-24 Thread jerryshao
GitHub user jerryshao opened a pull request:

https://github.com/apache/spark/pull/5680

[SPARK-7112][Streaming] Add a DirectStreamTracker to track the direct streams



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jerryshao/apache-spark SPARK-7111

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/5680.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #5680


commit 28d668faf51495e779aa1f874ceb03a64bccf410
Author: jerryshao <saisai.s...@intel.com>
Date:   2015-04-24T06:07:54Z

Add DirectStreamTracker to track the direct streams







[GitHub] spark pull request: [SPARK-7112][Streaming] Add a DirectStreamTrac...

2015-04-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5680#issuecomment-95819308
  
[Test build #30920 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30920/consoleFull) for PR 5680 at commit [`28d668f`](https://github.com/apache/spark/commit/28d668faf51495e779aa1f874ceb03a64bccf410).





[GitHub] spark pull request: [SPARK-6122][Core] Upgrade tachyon-client vers...

2015-04-24 Thread aniketbhatnagar
Github user aniketbhatnagar commented on the pull request:

https://github.com/apache/spark/pull/5354#issuecomment-95819955
  
+1 from my side. Having a consistent httpclient version would be so much better!





[GitHub] spark pull request: [SPARK-3468][WebUI] Timeline-View feature

2015-04-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2342#issuecomment-95821459
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30916/





[GitHub] spark pull request: [SPARK-3468][WebUI] Timeline-View feature

2015-04-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2342#issuecomment-95821427
  
[Test build #30916 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30916/consoleFull) for PR 2342 at commit [`d3c63c8`](https://github.com/apache/spark/commit/d3c63c84a56041756841dd0706d87c8c808e84d3).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * `case class ExecutorUIData(`
 * This patch does not change any dependencies.





[GitHub] spark pull request: [SPARK-6891] Fix the bug that ExecutorAllocati...

2015-04-24 Thread sryza
Github user sryza commented on the pull request:

https://github.com/apache/spark/pull/5676#issuecomment-95822131
  
This looks like a duplicate of SPARK-6954 (PR #5536)





[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...

2015-04-24 Thread jerryshao
Github user jerryshao commented on a diff in the pull request:

https://github.com/apache/spark/pull/5645#discussion_r29026826
  
--- Diff: streaming/src/main/scala/org/apache/spark/streaming/rdd/WriteAheadLogBackedBlockRDD.scala ---
@@ -96,9 +99,27 @@ class WriteAheadLogBackedBlockRDD[T: ClassTag](
         logDebug(s"Read partition data of $this from block manager, block $blockId")
         iterator
       case None => // Data not found in Block Manager, grab it from write ahead log file
-        val reader = new WriteAheadLogRandomReader(partition.segment.path, hadoopConf)
-        val dataRead = reader.read(partition.segment)
-        reader.close()
+        var dataRead: ByteBuffer = null
+        var writeAheadLog: WriteAheadLog = null
+        try {
+          val dummyDirectory = FileUtils.getTempDirectoryPath()
--- End diff --

Why do we need to use `dummyDirectory` here? Since the WAL may not be file-based, I'm not sure what purpose this serves.





[GitHub] spark pull request: [SPARK-5946][Streaming] Add Python API for dir...

2015-04-24 Thread tdas
Github user tdas commented on the pull request:

https://github.com/apache/spark/pull/4723#issuecomment-95846092
  
Looks almost good, except for the comments on the API. Other than that, I took a detailed pass on everything else and it looks good.





[GitHub] spark pull request: [SPARK-7098][SQL] Make the WHERE clause with t...

2015-04-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5682#issuecomment-95846148
  
[Test build #30925 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30925/consoleFull) for PR 5682 at commit [`4e98520`](https://github.com/apache/spark/commit/4e98520e78832b25877d825392d66d10779281f7).





[GitHub] spark pull request: [SPARK-7026] [SQL] fix left semi join with equ...

2015-04-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5643#issuecomment-95860808
  
[Test build #30924 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30924/consoleFull) for PR 5643 at commit [`90a69ec`](https://github.com/apache/spark/commit/90a69ec603279442c5a0b3e510e8f5db9e1bbb80).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
 * This patch does not change any dependencies.





[GitHub] spark pull request: [PySpark][Minor] Update sql example, so that c...

2015-04-24 Thread Sephiroth-Lin
GitHub user Sephiroth-Lin opened a pull request:

https://github.com/apache/spark/pull/5684

[PySpark][Minor] Update sql example, so that can read file correctly

When running Spark, files are read from HDFS by default if we don't set the URI scheme.
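The behaviour described above can be sketched: when a path carries no explicit URI scheme, Spark resolves it against the cluster's default filesystem (typically HDFS), so reading a local file needs a `file://` prefix. A minimal illustration of that resolution logic (the function name and the `default_fs` value are hypothetical assumptions, not Spark code):

```python
def qualify_path(path: str, default_fs: str = "hdfs://namenode:9000") -> str:
    """Sketch of default-filesystem path resolution (illustrative only).

    A path with an explicit scheme is kept as-is; a bare path is
    resolved against the (hypothetical) default filesystem URI.
    """
    if "://" in path:  # already qualified: file://, hdfs://, s3a://, ...
        return path
    return default_fs.rstrip("/") + "/" + path.lstrip("/")

print(qualify_path("examples/people.json"))     # hdfs://namenode:9000/examples/people.json
print(qualify_path("file:///tmp/people.json"))  # file:///tmp/people.json
```

With the sql.py example, passing an explicit `file://` path avoids the surprise of the example data being looked up on HDFS.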

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Sephiroth-Lin/spark pyspark_example_minor

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/5684.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #5684


commit 19fe145e7a00574080b91d311376b6d2cdb4254e
Author: linweizhong <linweizh...@huawei.com>
Date:   2015-04-24T09:16:23Z

Update example sql.py, so that can read file correctly







[GitHub] spark pull request: SPARK-4705:[core] Write event logs of differen...

2015-04-24 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/4845#issuecomment-95896735
  
It looks like this work is being continued in 
https://github.com/apache/spark/pull/5432 which is currently more active. Do 
you mind closing this PR and focusing discussion on that PR?





[GitHub] spark pull request: [Minor][MLLIB] Fix a formatting bug in toStrin...

2015-04-24 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/5687#discussion_r29041932
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/model/Node.scala ---
@@ -51,8 +51,8 @@ class Node (
     var stats: Option[InformationGainStats]) extends Serializable with Logging {

   override def toString: String = {
-    "id = " + id + ", isLeaf = " + isLeaf + ", predict = " + predict + ", " +
--- End diff --

These can use string interpolation. I take your point, though it breaks the symmetry a bit and makes this `toString` rely on details of the subclass. How about making `Predict.toString` return something more compact, like `s"$predict ($prob)"`?





[GitHub] spark pull request: [SPARK-6065] [MLlib] Optimize word2vec.findSyn...

2015-04-24 Thread MechCoder
Github user MechCoder commented on the pull request:

https://github.com/apache/spark/pull/5467#issuecomment-95910746
  
Cool, will make the changes along with SPARK-7045.





[GitHub] spark pull request: [SPARK-7115]][MLLIB] skip the very first 1 in ...

2015-04-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5681#issuecomment-95841203
  
[Test build #30922 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30922/consoleFull) for PR 5681 at commit [`9ac27cd`](https://github.com/apache/spark/commit/9ac27cd5856205a5e316e1679bdd39200d4c3ede).





[GitHub] spark pull request: [SPARK-7112][Streaming] Add a DirectStreamTrac...

2015-04-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5680#issuecomment-95845825
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30920/





[GitHub] spark pull request: [SPARK-7098][SQL] Make the WHERE clause with t...

2015-04-24 Thread viirya
GitHub user viirya opened a pull request:

https://github.com/apache/spark/pull/5682

[SPARK-7098][SQL] Make the WHERE clause with timestamp show consistent result

JIRA: https://issues.apache.org/jira/browse/SPARK-7098

The WHERE clause with timestamp shows inconsistent results. This PR fixes it.
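One common source of this kind of inconsistency (shown here as a generic illustration in Python, not necessarily the exact bug this PR fixes) is the direction of the implicit cast when a timestamp is compared with a string literal: casting the timestamp down to a string truncates sub-second precision, while casting the string up to a timestamp keeps full precision, so the same predicate can evaluate differently:

```python
from datetime import datetime

# A timestamp with sub-second precision, and a string literal without it.
ts = datetime(2015, 4, 24, 8, 7, 44, 123456)
literal = "2015-04-24 08:07:44"

# Cast the timestamp DOWN to a string (truncating): the predicate matches.
as_string = str(ts)[:19]
print(as_string == literal)   # True

# Cast the string UP to a timestamp (full precision): the same predicate fails.
as_timestamp = datetime.strptime(literal, "%Y-%m-%d %H:%M:%S")
print(as_timestamp == ts)     # False
```

Picking one cast direction and applying it consistently is what makes such predicates deterministic.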

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/viirya/spark-1 consistent_timestamp

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/5682.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #5682


commit 4e98520e78832b25877d825392d66d10779281f7
Author: Liang-Chi Hsieh <vii...@gmail.com>
Date:   2015-04-24T08:07:44Z

Make the WHERE clause with timestamp show consistent result.







[GitHub] spark pull request: [SPARK-6528][ML] Add IDF transformer

2015-04-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5266#issuecomment-95851114
  
[Test build #30927 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30927/consoleFull) for PR 5266 at commit [`741db31`](https://github.com/apache/spark/commit/741db31f112469141a22634a406ab20feb13e678).





[GitHub] spark pull request: [SPARK-7115]][MLLIB] skip the very first 1 in ...

2015-04-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5681#issuecomment-95868782
  
[Test build #30922 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30922/consoleFull) for PR 5681 at commit [`9ac27cd`](https://github.com/apache/spark/commit/9ac27cd5856205a5e316e1679bdd39200d4c3ede).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * `class PolynomialExpansion extends UnaryTransformer[Vector, Vector, PolynomialExpansion] `
 * This patch does not change any dependencies.





[GitHub] spark pull request: [SPARK-6568] spark-shell.cmd --jars option doe...

2015-04-24 Thread tsudukim
Github user tsudukim commented on a diff in the pull request:

https://github.com/apache/spark/pull/5447#discussion_r29036669
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/PythonRunner.scala ---
@@ -82,7 +82,7 @@ object PythonRunner {
         s"spark-submit is currently only supported for local files: $path")
     }
     val windows = Utils.isWindows || testWindows
-    var formattedPath = if (windows) Utils.formatWindowsPath(path) else path
+    var formattedPath = Utils.formatPath(path, windows)
--- End diff --

That's right. I'll try to remove them.





[GitHub] spark pull request: [SPARK-5738] [SQL] Reuse mutable row for each ...

2015-04-24 Thread yanboliang
Github user yanboliang closed the pull request at:

https://github.com/apache/spark/pull/4527





[GitHub] spark pull request: [PYSPARK] Add percentile method in rdd as nump...

2015-04-24 Thread AiHe
GitHub user AiHe opened a pull request:

https://github.com/apache/spark/pull/5686

[PYSPARK] Add percentile method in rdd as numpy

1. Add a percentile method to RDD.
2. By default, returns the kth percentile element from the bottom (ascending order).
3. By specifying a key, it can return the top or even a user-defined kth percentile element.
4. Tested it.
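The behaviour in the list above can be sketched in plain Python (an illustrative nearest-rank stand-in; the function name and signature are hypothetical, not the actual PySpark patch):

```python
def percentile(data, p, key=lambda x: x):
    """Nearest-rank percentile over an iterable (illustrative sketch).

    Elements are ordered ascending by `key`; the element sitting at the
    p-th percentile position of that ordering is returned.
    """
    if not 0 <= p <= 100:
        raise ValueError("p must be in [0, 100]")
    ordered = sorted(data, key=key)
    # Index of the p-th percentile position: 0 -> first element, 100 -> last.
    idx = min(len(ordered) - 1, int(round(p / 100.0 * (len(ordered) - 1))))
    return ordered[idx]

values = [15, 20, 35, 40, 50]
print(percentile(values, 50))                     # 35 (the median)
print(percentile(values, 100, key=lambda x: -x))  # 15 (last element when ordered descending)
```

Passing a custom `key` corresponds to point 3 above: it changes the ordering, so "top" percentiles fall out of the same code path.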

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/AiHe/spark percentile

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/5686.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #5686


commit 1403816f8287aeee316b27ba569ce607fdb0ed2c
Author: Alain <a...@usc.edu>
Date:   2015-04-24T10:24:51Z

[PYSPARK] Add percentile method in rdd as  numpy

1. Add percentile method in rdd
2. By default, get the kth percentile element from bottom(ascending
order)
3. By specifying key, it can return top or even user-defined kth
percentile element
4. Tested it







[GitHub] spark pull request: [SPARK-6738] [CORE] Improve estimate the size ...

2015-04-24 Thread shenh062326
Github user shenh062326 commented on the pull request:

https://github.com/apache/spark/pull/5608#issuecomment-95904531
  
@srowen
The last assertResult I added to the test case covers the case where discarding only the first non-null sample is not enough: half of the array elements are not linked to the shared object, so if the first non-null sample (chosen at random) is not linked to the shared object, we can't exclude the shared object. But if we sample twice, it can still work even when a single sample would not have excluded the shared object.





[GitHub] spark pull request: [SPARK-7120][SPARK-7121][WIP] Closure cleaner ...

2015-04-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5685#issuecomment-95905239
  
[Test build #30930 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30930/consoleFull) for PR 5685 at commit [`2390a60`](https://github.com/apache/spark/commit/2390a608ed74a9703d3763d040421dccb51242ec).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * `  *   class SomethingNotSerializable `
   * `  logDebug(s + cloning the object $obj of class $`
   * `class FieldAccessFinder(`
 * This patch does not change any dependencies.





[GitHub] spark pull request: [SPARK-5726] [MLLIB] Elementwise (Hadamard) Ve...

2015-04-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4580#issuecomment-95841209
  
[Test build #30923 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30923/consoleFull) for PR 4580 at commit [`e7ff5f2`](https://github.com/apache/spark/commit/e7ff5f2cc3c172b97c6ea3cec6ebf7546682a74b).





[GitHub] spark pull request: [SPARK-6738] [CORE] Improve estimate the size ...

2015-04-24 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/5608#issuecomment-95905420
  
@shenh062326 yes, but you constructed it that way. I can construct a case 
that works and doesn't work for any sampling strategy. The question is, what is 
the common case? I'm pretty sure it's that all N objects share some common data 
structure, which sampling just 1 would catch.

However if you want to go this way, at least generalize it. There is 
nothing magic about 2 samples, so it shouldn't be written that way with a 
redundant loop.
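The generalization suggested here can be sketched as follows: take `k` random samples instead of a hard-coded two, and keep the minimum measured size, so a shared structure inflates the estimate only when every sample happens to reference it. This is an illustrative Python sketch (the function name and the use of `sys.getsizeof` are assumptions, not Spark's SizeEstimator):

```python
import random
import sys

def estimate_element_size(elements, k=2, seed=0):
    """Estimate per-element size from k random non-None samples,
    taking the minimum measured size (illustrative sketch only).

    With the minimum over k samples, a large shared object inflates
    the estimate only if *every* sample references it; the chance of
    that drops geometrically in k -- nothing is special about k = 2.
    """
    rng = random.Random(seed)
    non_null = [e for e in elements if e is not None]
    if not non_null:
        return 0
    samples = [rng.choice(non_null) for _ in range(k)]
    return min(sys.getsizeof(s) for s in samples)

# Ten small strings plus one much larger one: with a few samples, the
# estimate usually reflects the common small size.
data = ["x" * 8] * 10 + ["y" * 10_000]
print(estimate_element_size(data, k=3))
```

Writing the sample count as a parameter, as above, avoids the "redundant loop" objection: two samples is just one point on the accuracy/cost trade-off.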





[GitHub] spark pull request: [SPARK-7092] Update spark scala version to 2.1...

2015-04-24 Thread ScrapCodes
Github user ScrapCodes commented on a diff in the pull request:

https://github.com/apache/spark/pull/5662#discussion_r29041655
  
--- Diff: repl/scala-2.11/src/main/scala/org/apache/spark/repl/SparkIMain.scala ---
@@ -1129,7 +1129,7 @@ class SparkIMain(@BeanProperty val factory: ScriptEngineFactory, initialSettings

     def apply(line: String): Result = debugging(s"parse($line)") {
       var isIncomplete = false
-      currentRun.reporting.withIncompleteHandler((_, _) => isIncomplete = true) {
+      currentRun.parsing.withIncompleteHandler((_, _) => isIncomplete = true) {
--- End diff --

It is harmless, if not beneficial. But this can be a stepping stone towards enabling add-jars, because then we have two options: 1) back-port Scala's version of add-jars, or 2) port the Spark Scala 2.10 REPL's version of add-jars on the fly.

Without this patch, no such option exists. I agree the best thing is to patch scala/scala so that we don't have to do this, so I am working on that with whatever time I have.





[GitHub] spark pull request: [SPARK-5687][Core]TaskResultGetter needs to ca...

2015-04-24 Thread lianhuiwang
Github user lianhuiwang commented on the pull request:

https://github.com/apache/spark/pull/4474#issuecomment-95907922
  
@pwendell I think we cannot kill the JVM directly when this occurs. When it is a Hive server, with one driver serving many jobs, killing the JVM means the other jobs on that driver cannot continue. I think this PR is OK: it just aborts the job, and then the DAGScheduler throws a jobFailed exception to the client. If the client is a Hive server, it can catch this exception and continue to run other jobs. If it is an application like the one I described, where the user application does not catch this exception and it propagates to the ApplicationMaster, then the application will fail. So this ensures the right behavior in every situation.
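The failure-handling split described above can be sketched generically: a long-running server catches the per-job failure and keeps serving, while a one-shot application lets the exception propagate and fails. (All names here are hypothetical illustrations, not Spark API.)

```python
class JobFailedException(Exception):
    """Stand-in for the jobFailed exception the scheduler hands to the client."""

def run_job(job: dict) -> str:
    # Hypothetical job runner: a job marked "bad" aborts with an exception.
    if job.get("bad"):
        raise JobFailedException(job["name"])
    return "ok"

def serve(jobs):
    """Long-running server (e.g. a Hive-server-like driver): one failed job
    must not take down the JVM or the other jobs."""
    results = {}
    for job in jobs:
        try:
            results[job["name"]] = run_job(job)
        except JobFailedException:
            results[job["name"]] = "failed"  # abort this job only, keep serving
    return results

print(serve([{"name": "a"}, {"name": "b", "bad": True}, {"name": "c"}]))
# {'a': 'ok', 'b': 'failed', 'c': 'ok'}
```

A one-shot application would simply omit the `try/except`, letting the exception reach its top level (analogous to the ApplicationMaster) and fail the whole application.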





[GitHub] spark pull request: [SPARK-7022][PySpark][ML] Add ML.Tuning.ParamG...

2015-04-24 Thread oefirouz
Github user oefirouz commented on the pull request:

https://github.com/apache/spark/pull/5601#issuecomment-95857492
  
Friendly bump for more comments :)





[GitHub] spark pull request: [SPARK-7026] [SQL] fix left semi join with equ...

2015-04-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5643#issuecomment-95860872
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30924/
Test FAILed.





[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...

2015-04-24 Thread viirya
Github user viirya commented on the pull request:

https://github.com/apache/spark/pull/4014#issuecomment-95879508
  
@chenghao-intel ok.





[GitHub] spark pull request: [SPARK-6304][Streaming] Fix checkpointing does...

2015-04-24 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/5060#discussion_r29040627
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -94,6 +94,11 @@ class SparkContext(config: SparkConf) extends Logging 
with ExecutorAllocationCli
   // contains a map from hostname to a list of input format splits on the 
host.
   private[spark] var preferredNodeLocationData: Map[String, 
Set[SplitInfo]] = Map()
 
+  // This is used by Spark Streaming to check whether the driver host and port are
+  // set by the user; if these two configurations are set by the user, the recovery
+  // mechanism should not remove them.
+  private[spark] val isDriverHostSetByUser = config.contains("spark.driver.host")
--- End diff --

It doesn't seem worth tacking on yet more little fields in `SparkContext` 
just for a niche use case in a submodule. Use the config object in `Checkpoint`.





[GitHub] spark pull request: [SPARK-6738] [CORE] Improve estimate the size ...

2015-04-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5608#issuecomment-95901582
  
  [Test build #30931 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30931/consoleFull)
 for   PR 5608 at commit 
[`a9fca84`](https://github.com/apache/spark/commit/a9fca8444d7a8591032383a7d6ced84ee1f66a56).





[GitHub] spark pull request: [SPARK-6738] [CORE] Improve estimate the size ...

2015-04-24 Thread shenh062326
Github user shenh062326 commented on the pull request:

https://github.com/apache/spark/pull/5608#issuecomment-95908120
  
The sampling strategy does not always work, but sampling twice is more effective 
than only discarding the first non-null sample, and sampling 200 times will not 
cause performance issues. 
If you think the code shouldn't be written like that, I agree and will change 
it.





[GitHub] spark pull request: [SPARK-7112][Streaming] Add a DirectStreamTrac...

2015-04-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5680#issuecomment-95845816
  
  [Test build #30920 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30920/consoleFull)
 for   PR 5680 at commit 
[`28d668f`](https://github.com/apache/spark/commit/28d668faf51495e779aa1f874ceb03a64bccf410).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
 * This patch does not change any dependencies.





[GitHub] spark pull request: [SPARK-1442][SQL][WIP] Window Function Support...

2015-04-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5604#issuecomment-95852527
  
  [Test build #30928 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30928/consoleFull)
 for   PR 5604 at commit 
[`d07101b`](https://github.com/apache/spark/commit/d07101bd5a6f3b30532c4d4d77ab8d310607b684).





[GitHub] spark pull request: [SPARK-6065] [MLlib] Optimize word2vec.findSyn...

2015-04-24 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/5467#issuecomment-95852537
  
@MechCoder Sorry for my late comment! I made some minor comments. It would 
be good if you can submit a follow-up PR to address those issues. Thanks!





[GitHub] spark pull request: [SPARK-6065] [MLlib] Optimize word2vec.findSyn...

2015-04-24 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/5467#discussion_r29032437
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala ---
@@ -429,7 +429,36 @@ class Word2Vec extends Serializable with Logging {
  */
 @Experimental
 class Word2VecModel private[mllib] (
-private val model: Map[String, Array[Float]]) extends Serializable 
with Saveable {
+model: Map[String, Array[Float]]) extends Serializable with Saveable {
+
+  // wordList: Ordered list of words obtained from model.
+  private val wordList: Array[String] = model.keys.toArray
+
+  // wordIndex: Maps each word to an index, which can retrieve the 
corresponding
+  //vector from wordVectors (see below).
+  private val wordIndex: Map[String, Int] = wordList.zip(0 until 
model.size).toMap
--- End diff --

`wordList.zipWithIndex.toMap`
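The suggested `wordList.zipWithIndex.toMap` builds the same word-to-index map as `wordList.zip(0 until model.size).toMap`, just more idiomatically. A minimal sketch of the equivalent construction (in Python, using `enumerate` in place of `zipWithIndex`):

```python
def build_word_index(word_list):
    # Mirrors Scala's wordList.zipWithIndex.toMap: each word maps to its
    # position in the ordered word list.
    return {word: i for i, word in enumerate(word_list)}
```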





[GitHub] spark pull request: [SPARK-7118] [Python] Add the coalesce Spark S...

2015-04-24 Thread ogirardot
GitHub user ogirardot opened a pull request:

https://github.com/apache/spark/pull/5683

[SPARK-7118] [Python] Add the coalesce Spark SQL function available in 
PySpark

This patch adds a proxy call from PySpark to the Spark SQL coalesce 
function; it comes out of a discussion on dev@spark with @rxin.

This contribution is my original work and I license the work to the project 
under the project's open source license.

Olivier.
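For reference, SQL's coalesce returns the first non-null argument. A minimal Python sketch of that semantics (this illustrates the behavior only; it is not the PySpark proxy itself):

```python
def coalesce(*values):
    # SQL COALESCE semantics: return the first argument that is not None,
    # or None if every argument is None.
    return next((v for v in values if v is not None), None)
```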

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ogirardot/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/5683.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #5683


commit e3fec1e76eaadf0aefaf16a0b935765858287f33
Author: Olivier Girardot o.girar...@lateral-thoughts.com
Date:   2015-04-24T08:39:32Z

SPARK-7118 Add the coalesce Spark SQL function available in PySpark

No changes to the scala/java part, only changes in Python.







[GitHub] spark pull request: [SPARK-4233] [SQL] [WIP] UDAF Interface Refact...

2015-04-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5542#issuecomment-95870914
  
  [Test build #30921 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30921/consoleFull)
 for   PR 5542 at commit 
[`6b594f0`](https://github.com/apache/spark/commit/6b594f05ef2725aa5f6bed716dbac6eed64a1879).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `trait AggregateFunction2 `
  * `trait AggregateExpression2 extends Expression with AggregateFunction2 `
  * `abstract class UnaryAggregateExpression extends UnaryExpression with 
AggregateExpression2 `
  * `case class Min(child: Expression) extends UnaryAggregateExpression `
  * `case class Average(child: Expression, distinct: Boolean = false)`
  * `case class Max(child: Expression) extends UnaryAggregateExpression `
  * `case class Count(child: Expression)`
  * `case class CountDistinct(children: Seq[Expression])`
  * `case class Sum(child: Expression, distinct: Boolean = false)`
  * `case class First(child: Expression, distinct: Boolean = false)`
  * `case class Last(child: Expression, distinct: Boolean = false)`
  * `class AggregateExpressionSubsitution `
  * `  class HashAggregation2(aggrSubsitution: 
AggregateExpressionSubsitution) extends Strategy `
  * `sealed class BufferSeens(var buffer: MutableRow, var seens: 
Array[JSet[Any]] = null) `
  * `sealed class BufferAndKey(leftLen: Int, rightLen: Int)`
  * `sealed trait Aggregate `
  * `sealed trait PostShuffle extends Aggregate `
  * `case class AggregatePreShuffle(`
  * `case class AggregatePostShuffle(`
  * `case class DistinctAggregate(`

 * This patch does not change any dependencies.





[GitHub] spark pull request: [SPARK-4233] [SQL] [WIP] UDAF Interface Refact...

2015-04-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5542#issuecomment-95870924
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30921/
Test PASSed.





[GitHub] spark pull request: [SPARK-7120][SPARK-7121][WIP] Closure cleaner ...

2015-04-24 Thread andrewor14
GitHub user andrewor14 opened a pull request:

https://github.com/apache/spark/pull/5685

[SPARK-7120][SPARK-7121][WIP] Closure cleaner nesting + documentation

For instance, in SparkContext, I tried to do the following:
{code}
def scope[T](body: => T): T = body // no-op
def myCoolMethod(path: String): RDD[String] = scope {
  parallelize(1 to 10).map { _ => path }
}
{code}
and I got an exception complaining that SparkContext is not serializable. 
The issue here is that the inner closure is getting its path from the outer 
closure (the scope), but the outer closure actually references the SparkContext 
object itself to get the `parallelize` method.

Note, however, that the inner closure doesn't actually need the 
SparkContext; it just needs a field from the outer closure. If we modify 
ClosureCleaner to clean the outer closure recursively while using the fields 
accessed by the inner closure, then we can serialize the inner closure.

This is blocking my effort on a separate task. This is WIP because I plan 
to add tests for this later.
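The capture problem described above can be reproduced in miniature outside Spark: a closure built from a method references the whole enclosing object, while copying just the needed field into a local variable first keeps the outer object out of the closure. A hedged sketch (plain Python closure cells, not Spark's ClosureCleaner; `Context` is a hypothetical stand-in for SparkContext):

```python
class Context:
    """Hypothetical stand-in for an unserializable object like SparkContext."""
    def __init__(self, path):
        self.path = path

    def bad_closure(self):
        # The lambda references self, so its closure cell captures the whole
        # Context -- serializing this closure would drag the Context along.
        return lambda x: self.path

    def good_closure(self):
        path = self.path  # copy only the needed field into a local first
        return lambda x: path

ctx = Context("/data")
# Inspect what each closure actually captured via its closure cells.
bad_captures = [c.cell_contents for c in ctx.bad_closure().__closure__]
good_captures = [c.cell_contents for c in ctx.good_closure().__closure__]
```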

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/andrewor14/spark closure-cleaner

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/5685.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #5685


commit 86f78237b7623e4efa06c5feb053e0c304979c73
Author: Andrew Or and...@databricks.com
Date:   2015-04-24T10:05:58Z

Implement transitive cleaning + add missing documentation

See in-code comments for more detail on what this means.

commit 2390a608ed74a9703d3763d040421dccb51242ec
Author: Andrew Or and...@databricks.com
Date:   2015-04-24T10:08:11Z

Feature flag this new behavior

... in case anything breaks, we should be able to resort to old
behavior.







[GitHub] spark pull request: [Minor][MLLIB] Fix a formatting bug in toStrin...

2015-04-24 Thread AiHe
GitHub user AiHe opened a pull request:

https://github.com/apache/spark/pull/5687

[Minor][MLLIB] Fix a formatting bug in toString method in Node

1. predict(predict.toString) has already output the prefix “predict”, so it is 
duplicated to print ", predict = " again
2. there are some extra spaces

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/AiHe/spark tree-node-issue-2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/5687.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #5687


commit 426eee7fa00eef343d10396704f0619e802841bc
Author: Alain a...@usc.edu
Date:   2015-04-24T09:26:03Z

[Minor][MLLIB] Fix a formatting bug in toString method in Node.scala

1. predict(predict.toString) has already output the prefix “predict”, so it is
duplicated to print ", predict = " again
2. there are some extra spaces







[GitHub] spark pull request: [SPARK-6304][Streaming] Fix checkpointing does...

2015-04-24 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/5060#discussion_r29040703
  
--- Diff: 
streaming/src/main/scala/org/apache/spark/streaming/Checkpoint.scala ---
@@ -41,12 +41,12 @@ class Checkpoint(@transient ssc: StreamingContext, val 
checkpointTime: Time)
   val checkpointDuration = ssc.checkpointDuration
   val pendingTimes = ssc.scheduler.getPendingTimes().toArray
   val delaySeconds = MetadataCleaner.getDelaySeconds(ssc.conf)
-  val sparkConfPairs = ssc.conf.getAll
+  val sparkConfPairs = ssc.conf.getAll.filterNot { kv =>
--- End diff --

Maybe this can be turned into a generic function that removes given keys if 
the key is set in the config.
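The suggested generic helper, which drops a given set of keys from the saved config pairs, might look like this (a sketch in Python; the actual Spark code would be a Scala `filterNot` over `conf.getAll`, and the function name here is hypothetical):

```python
def exclude_keys(conf_pairs, keys_to_drop):
    # Keep every (key, value) pair whose key is not in the excluded set,
    # mirroring a generic filterNot over the saved config pairs.
    drop = set(keys_to_drop)
    return [(k, v) for k, v in conf_pairs if k not in drop]
```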





[GitHub] spark pull request: [SPARK-5946][Streaming] Add Python API for dir...

2015-04-24 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/4723#discussion_r29031149
  
--- Diff: python/pyspark/streaming/kafka.py ---
@@ -70,7 +71,195 @@ def createStream(ssc, zkQuorum, groupId, topics, 
kafkaParams={},
 except Py4JJavaError, e:
 # TODO: use --jar once it also work on driver
 if 'ClassNotFoundException' in str(e.java_exception):
-print 
+KafkaUtils._printErrorMsg(ssc.sparkContext)
+raise e
+ser = PairDeserializer(NoOpSerializer(), NoOpSerializer())
+stream = DStream(jstream, ssc, ser)
+return stream.map(lambda (k, v): (keyDecoder(k), valueDecoder(v)))
+
+@staticmethod
+def createDirectStream(ssc, topics, kafkaParams,
+   keyDecoder=utf8_decoder, 
valueDecoder=utf8_decoder):
+
+.. note:: Experimental
+
+Create an input stream that directly pulls messages from a Kafka 
Broker.
+
+This is not a receiver based Kafka input stream, it directly pulls 
the message from Kafka
+in each batch duration and processed without storing.
+
+This does not use Zookeeper to store offsets. The consumed offsets 
are tracked
+by the stream itself. For interoperability with Kafka monitoring 
tools that depend on
+Zookeeper, you have to update Kafka/Zookeeper yourself from the 
streaming application.
+You can access the offsets used in each batch from the generated 
RDDs (see
+
+To recover from driver failures, you have to enable checkpointing 
in the StreamingContext.
+The information on consumed offset can be recovered from the 
checkpoint.
+See the programming guide for details (constraints, etc.).
+
+:param ssc:  StreamingContext object
+:param topics:  list of topic_name to consume.
+:param kafkaParams: Additional params for Kafka
+:param keyDecoder:  A function used to decode key (default is 
utf8_decoder)
+:param valueDecoder:  A function used to decode value (default is 
utf8_decoder)
+:return: A DStream object
+
+if not isinstance(topics, list):
+raise TypeError("topics should be list")
+if not isinstance(kafkaParams, dict):
+raise TypeError("kafkaParams should be dict")
+
+jtopics = SetConverter().convert(topics, 
ssc.sparkContext._gateway._gateway_client)
+jparam = MapConverter().convert(kafkaParams, 
ssc.sparkContext._gateway._gateway_client)
+
+try:
+helperClass = 
ssc._jvm.java.lang.Thread.currentThread().getContextClassLoader() \
+
.loadClass("org.apache.spark.streaming.kafka.KafkaUtilsPythonHelper")
+helper = helperClass.newInstance()
+jstream = helper.createDirectStream(ssc._jssc, jparam, jtopics)
+except Py4JJavaError, e:
+if 'ClassNotFoundException' in str(e.java_exception):
+KafkaUtils._printErrorMsg(ssc.sparkContext)
+raise e
+
+ser = PairDeserializer(NoOpSerializer(), NoOpSerializer())
+stream = DStream(jstream, ssc, ser)
+return stream.map(lambda (k, v): (keyDecoder(k), valueDecoder(v)))
+
+@staticmethod
+def createDirectStreamFromOffset(ssc, kafkaParams, fromOffsets,
+ keyDecoder=utf8_decoder, 
valueDecoder=utf8_decoder):
+
+.. note:: Experimental
+
+Create an input stream that directly pulls messages from a Kafka 
Broker and specific offset.
+
+This is not a receiver based Kafka input stream, it directly pulls 
the message from Kafka
+in each batch duration and processed without storing.
+
+This does not use Zookeeper to store offsets. The consumed offsets 
are tracked
+by the stream itself. For interoperability with Kafka monitoring 
tools that depend on
+Zookeeper, you have to update Kafka/Zookeeper yourself from the 
streaming application.
+You can access the offsets used in each batch from the generated 
RDDs (see
+
+To recover from driver failures, you have to enable checkpointing 
in the StreamingContext.
+The information on consumed offset can be recovered from the 
checkpoint.
+See the programming guide for details (constraints, etc.).
+
+:param ssc:  StreamingContext object.
+:param kafkaParams: Additional params for Kafka.
+:param fromOffsets: Per-topic/partition Kafka offsets defining the 
(inclusive) starting
+point of the stream.
+:param 

[GitHub] spark pull request: [SPARK-7026] [SQL] fix left semi join with equ...

2015-04-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5643#issuecomment-95844789
  
  [Test build #30924 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30924/consoleFull)
 for   PR 5643 at commit 
[`90a69ec`](https://github.com/apache/spark/commit/90a69ec603279442c5a0b3e510e8f5db9e1bbb80).





[GitHub] spark pull request: [SPARK-6528][ML] Add IDF transformer

2015-04-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5266#issuecomment-95878935
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30927/
Test PASSed.





[GitHub] spark pull request: [SPARK-6528][ML] Add IDF transformer

2015-04-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5266#issuecomment-95878907
  
  [Test build #30927 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30927/consoleFull)
 for   PR 5266 at commit 
[`741db31`](https://github.com/apache/spark/commit/741db31f112469141a22634a406ab20feb13e678).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `final class IDF extends Estimator[IDFModel] with IDFBase `

 * This patch does not change any dependencies.





[GitHub] spark pull request: [PYSPARK] Add percentile method in rdd as nump...

2015-04-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5686#issuecomment-95883976
  
Can one of the admins verify this patch?





[GitHub] spark pull request: [SPARK-6738] [CORE] Improve estimate the size ...

2015-04-24 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/5608#issuecomment-95904145
  
Sampling 1 might work for some cases; 100 for others; some may take 1000. 
There's no way to know. 

This change as it stands is needlessly complex because it duplicates the 
loop among other things. You just want to take n samples of the array, and use 
the largest as your base, and smallest as your multiplier. That would be OK, 
and make n some small number like 2 or 3. At least it would be less hard-coded, 
and would make for a better change, along with some comments about why you are 
doing this.
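The proposal above can be sketched as follows (hedged: function names are hypothetical, and the real logic would live in Spark's Scala SizeEstimator): take a small number of element-size samples, then use the largest sample as the base cost and the smallest as the per-element multiplier.

```python
import random

def estimate_array_size(elements, element_size, n_samples=3, seed=0):
    # Take n size samples from the array; per the suggestion, treat the
    # largest sample as the fixed base cost and the smallest as the
    # per-element multiplier for the remaining elements.
    rng = random.Random(seed)
    samples = [element_size(rng.choice(elements)) for _ in range(n_samples)]
    return max(samples) + min(samples) * (len(elements) - 1)
```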





[GitHub] spark pull request: [SPARK-7097][SQL]: Partitioned tables should o...

2015-04-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5668#issuecomment-95843846
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30919/
Test PASSed.





[GitHub] spark pull request: [SPARK-7115]][MLLIB] skip the very first 1 in ...

2015-04-24 Thread yinxusen
Github user yinxusen commented on the pull request:

https://github.com/apache/spark/pull/5681#issuecomment-95846544
  
LGTM if you do not want to set it as a parameter.





[GitHub] spark pull request: [PySpark][Minor] Update sql example, so that c...

2015-04-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5684#issuecomment-95877543
  
  [Test build #30929 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30929/consoleFull)
 for   PR 5684 at commit 
[`19fe145`](https://github.com/apache/spark/commit/19fe145e7a00574080b91d311376b6d2cdb4254e).





[GitHub] spark pull request: [Minor][MLLIB] Fix a formatting bug in toStrin...

2015-04-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5687#issuecomment-95884593
  
Can one of the admins verify this patch?





[GitHub] spark pull request: SPARK-7103: Fix crash with SparkContext.union ...

2015-04-24 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/5679#discussion_r29041189
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -1055,7 +1055,7 @@ class SparkContext(config: SparkConf) extends Logging 
with ExecutorAllocationCli
   /** Build the union of a list of RDDs. */
   def union[T: ClassTag](rdds: Seq[RDD[T]]): RDD[T] = {
 val partitioners = rdds.flatMap(_.partitioner).toSet
-if (partitioners.size == 1) {
+if (rdds.forall(_.partitioner.isDefined) && partitioners.size == 1) {
--- End diff --

Yeah I like this. I suppose that the pre-existing condition already caught 
the empty RDD case, which `PartitionerAwareUnionRDD` will reject. Although 
symmetry between this check and the following one would be nice, I don't think 
it's important. This looks correct, since clearly `PartitionerAwareUnionRDD` 
intends to operate only on RDDs with partitioners.
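
The fixed condition can be sketched in plain Python (a hypothetical helper, not Spark code; the function name and string labels are illustrative):

```python
def choose_union_strategy(partitioners):
    # Mirrors the fixed check in SparkContext.union: take the
    # partitioner-aware path only when every input RDD defines a
    # partitioner AND all of them are the same one; otherwise fall
    # back to a plain union. `partitioners` holds one entry per RDD,
    # None when the RDD has no partitioner (an empty Option).
    all_defined = all(p is not None for p in partitioners)
    distinct = set(p for p in partitioners if p is not None)
    if partitioners and all_defined and len(distinct) == 1:
        return "partitioner-aware"
    return "plain-union"
```

Note how a list containing any `None` (the old bug: `partitioners.size == 1` alone could still be true) now correctly falls back to the plain union.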





[GitHub] spark pull request: [SPARK-7120][SPARK-7121][WIP] Closure cleaner ...

2015-04-24 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/5685#issuecomment-95906708
  
You know a lot more about this than I do, but I was under the impression that 
the closure cleaner couldn't clean beyond a level or so, because it would then 
be modifying local object state by nulling fields in those objects, and that's 
not necessarily permissible. I'm sure you're on top of that; just noting my 
recollection from similar discussions.





[GitHub] spark pull request: [SPARK-5946][Streaming] Add Python API for dir...

2015-04-24 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/4723#discussion_r29031104
  
--- Diff: python/pyspark/streaming/kafka.py ---
@@ -70,7 +71,195 @@ def createStream(ssc, zkQuorum, groupId, topics, 
kafkaParams={},
 except Py4JJavaError, e:
 # TODO: use --jar once it also work on driver
 if 'ClassNotFoundException' in str(e.java_exception):
-print 
+KafkaUtils._printErrorMsg(ssc.sparkContext)
+raise e
+ser = PairDeserializer(NoOpSerializer(), NoOpSerializer())
+stream = DStream(jstream, ssc, ser)
+return stream.map(lambda (k, v): (keyDecoder(k), valueDecoder(v)))
+
+@staticmethod
+def createDirectStream(ssc, topics, kafkaParams,
+   keyDecoder=utf8_decoder, 
valueDecoder=utf8_decoder):
+"""
+.. note:: Experimental
+
+Create an input stream that directly pulls messages from a Kafka 
Broker.
+
+This is not a receiver based Kafka input stream, it directly pulls 
the message from Kafka
+in each batch duration and processed without storing.
+
+This does not use Zookeeper to store offsets. The consumed offsets 
are tracked
+by the stream itself. For interoperability with Kafka monitoring 
tools that depend on
+Zookeeper, you have to update Kafka/Zookeeper yourself from the 
streaming application.
+You can access the offsets used in each batch from the generated 
RDDs (see
+
+To recover from driver failures, you have to enable checkpointing 
in the StreamingContext.
+The information on consumed offset can be recovered from the 
checkpoint.
+See the programming guide for details (constraints, etc.).
+
+:param ssc:  StreamingContext object
+:param topics:  list of topic_name to consume.
+:param kafkaParams: Additional params for Kafka
+:param keyDecoder:  A function used to decode key (default is 
utf8_decoder)
+:param valueDecoder:  A function used to decode value (default is 
utf8_decoder)
+:return: A DStream object
+"""
+if not isinstance(topics, list):
+raise TypeError("topics should be list")
+if not isinstance(kafkaParams, dict):
+raise TypeError("kafkaParams should be dict")
+
+jtopics = SetConverter().convert(topics, 
ssc.sparkContext._gateway._gateway_client)
+jparam = MapConverter().convert(kafkaParams, 
ssc.sparkContext._gateway._gateway_client)
+
+try:
+helperClass = 
ssc._jvm.java.lang.Thread.currentThread().getContextClassLoader() \
+
.loadClass("org.apache.spark.streaming.kafka.KafkaUtilsPythonHelper")
+helper = helperClass.newInstance()
+jstream = helper.createDirectStream(ssc._jssc, jparam, jtopics)
+except Py4JJavaError, e:
+if 'ClassNotFoundException' in str(e.java_exception):
+KafkaUtils._printErrorMsg(ssc.sparkContext)
+raise e
+
+ser = PairDeserializer(NoOpSerializer(), NoOpSerializer())
+stream = DStream(jstream, ssc, ser)
+return stream.map(lambda (k, v): (keyDecoder(k), valueDecoder(v)))
+
+@staticmethod
+def createDirectStreamFromOffset(ssc, kafkaParams, fromOffsets,
--- End diff --

I thought about this a little bit. But I think we should follow the 
precedent set by `createStream` and the other Python APIs, where there is only 
one method with many optional parameters. So instead of having 
`createDirectStream` and `createDirectStreamFromOffsets`, let's just have 
`createDirectStream` with another optional parameter, `fromOffsets`. 
`fromOffsets` should have the same keys as in topics; otherwise throw an error. 
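
The suggested shape can be sketched like this (a hypothetical, simplified signature, not the real PySpark API; the return values are placeholders for illustration):

```python
def create_direct_stream(topics, kafka_params, from_offsets=None):
    # Hypothetical sketch of the single-entry-point pattern suggested
    # above: one method with an optional fromOffsets parameter instead
    # of a separate createDirectStreamFromOffset method.
    if not isinstance(topics, list):
        raise TypeError("topics should be list")
    if from_offsets is None:
        return ("latest", sorted(topics))
    # fromOffsets is keyed by (topic, partition); its topics must match
    # the requested topics, otherwise raise an error as suggested.
    offset_topics = {topic for (topic, partition) in from_offsets}
    if offset_topics != set(topics):
        raise ValueError("fromOffsets keys must match topics")
    return ("from-offsets", dict(from_offsets))
```

Callers who don't care about offsets keep the short form; callers resuming from saved offsets pass the extra keyword, with no second method name to discover.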





[GitHub] spark pull request: [SPARK-7097][SQL]: Partitioned tables should o...

2015-04-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5668#issuecomment-95843799
  
  [Test build #30919 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30919/consoleFull)
 for   PR 5668 at commit 
[`b4651fd`](https://github.com/apache/spark/commit/b4651fd80a55f016093d84cf3b00ad6c91333cef).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
 * This patch does not change any dependencies.





[GitHub] spark pull request: [SPARK-6304][Streaming] Fix checkpointing does...

2015-04-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5060#issuecomment-95847182
  
  [Test build #30926 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30926/consoleFull)
 for   PR 5060 at commit 
[`5713c20`](https://github.com/apache/spark/commit/5713c20b543a38f0454a03c67eaa277ec519a281).





[GitHub] spark pull request: [SPARK-6065] [MLlib] Optimize word2vec.findSyn...

2015-04-24 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/5467#discussion_r29032368
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala ---
@@ -479,9 +508,23 @@ class Word2VecModel private[mllib] (
*/
   def findSynonyms(vector: Vector, num: Int): Array[(String, Double)] = {
 require(num > 0, "Number of similar words should > 0")
-// TODO: optimize top-k
+
 val fVector = vector.toArray.map(_.toFloat)
-model.mapValues(vec => cosineSimilarity(fVector, vec))
+val cosineVec = Array.fill[Float](numWords)(0)
+val alpha: Float = 1
+val beta: Float = 0
+
+blas.sgemv(
+  "T", vectorSize, numWords, alpha, wordVectors, vectorSize, fVector, 
1, beta, cosineVec, 1)
+
+// Need not divide with the norm of the given vector since it is 
constant.
+val updatedCosines = new Array[Double](numWords)
--- End diff --

Should reuse `cosineVec`.
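
In plain Python (standing in for the BLAS `sgemv` call; a sketch assuming the word-vector rows are pre-normalized, as in the patch), the single-pass scoring with one reusable buffer looks like:

```python
def find_synonyms(word_vectors, query, num):
    # Compute every cosine score in one pass into a single reusable
    # buffer (the role of cosineVec filled by the sgemv call), then
    # take the top `num`. Dividing by the query's norm is skipped
    # because it is the same constant for every word and does not
    # change the ranking.
    scores = [0.0] * len(word_vectors)   # the reusable cosineVec buffer
    for i, row in enumerate(word_vectors):
        scores[i] = sum(a * b for a, b in zip(row, query))
    top = sorted(range(len(scores)), key=lambda i: -scores[i])[:num]
    return [(i, scores[i]) for i in top]
```

Allocating a second `updatedCosines` array, as the diff does, doubles the memory traffic for no benefit, hence the suggestion to reuse `cosineVec`.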





[GitHub] spark pull request: [SPARK-6065] [MLlib] Optimize word2vec.findSyn...

2015-04-24 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/5467#discussion_r29032366
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala ---
@@ -479,9 +508,23 @@ class Word2VecModel private[mllib] (
*/
   def findSynonyms(vector: Vector, num: Int): Array[(String, Double)] = {
 require(num > 0, "Number of similar words should > 0")
-// TODO: optimize top-k
--- End diff --

This TODO was created to use `BoundedPriorityQueue` to compute top k:


https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/util/BoundedPriorityQueue.scala
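
The idea behind the TODO, sketched with Python's stdlib heap (`heapq.nlargest` maintains a bounded k-sized heap internally, playing the role of Spark's `BoundedPriorityQueue`):

```python
import heapq

def top_k_synonyms(scores, k):
    # Bounded-priority-queue top-k: keep at most k candidates at a
    # time instead of sorting all numWords scores, dropping the cost
    # from O(n log n) to O(n log k). Returns (index, score) pairs,
    # best first.
    return heapq.nlargest(k, enumerate(scores), key=lambda pair: pair[1])
```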





[GitHub] spark pull request: [SPARK-7118] [Python] Add the coalesce Spark S...

2015-04-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5683#issuecomment-95857912
  
Can one of the admins verify this patch?





[GitHub] spark pull request: [SPARK-6891] Fix the bug that ExecutorAllocati...

2015-04-24 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/5676#issuecomment-95900233
  
@ArcherShao yes please, the JIRA was already marked as a duplicate. 
https://issues.apache.org/jira/browse/SPARK-6891





[GitHub] spark pull request: [SPARK-5726] [MLLIB] Elementwise (Hadamard) Ve...

2015-04-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4580#issuecomment-95841444
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30923/
Test FAILed.





[GitHub] spark pull request: [SPARK-5726] [MLLIB] Elementwise (Hadamard) Ve...

2015-04-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4580#issuecomment-95841433
  
  [Test build #30923 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30923/consoleFull)
 for   PR 4580 at commit 
[`e7ff5f2`](https://github.com/apache/spark/commit/e7ff5f2cc3c172b97c6ea3cec6ebf7546682a74b).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class ElementwiseProduct extends UnaryTransformer[Vector, Vector, 
ElementwiseProduct] `
  * `class ElementwiseProduct(val scalingVector: Vector) extends 
VectorTransformer `

 * This patch does not change any dependencies.





[GitHub] spark pull request: [SPARK-5946][Streaming] Add Python API for dir...

2015-04-24 Thread jerryshao
Github user jerryshao commented on a diff in the pull request:

https://github.com/apache/spark/pull/4723#discussion_r29031516
  
--- Diff: python/pyspark/streaming/kafka.py ---
@@ -70,7 +71,195 @@ def createStream(ssc, zkQuorum, groupId, topics, 
kafkaParams={},
 except Py4JJavaError, e:
 # TODO: use --jar once it also work on driver
 if 'ClassNotFoundException' in str(e.java_exception):
-print 
+KafkaUtils._printErrorMsg(ssc.sparkContext)
+raise e
+ser = PairDeserializer(NoOpSerializer(), NoOpSerializer())
+stream = DStream(jstream, ssc, ser)
+return stream.map(lambda (k, v): (keyDecoder(k), valueDecoder(v)))
+
+@staticmethod
+def createDirectStream(ssc, topics, kafkaParams,
+   keyDecoder=utf8_decoder, 
valueDecoder=utf8_decoder):
+"""
+.. note:: Experimental
+
+Create an input stream that directly pulls messages from a Kafka 
Broker.
+
+This is not a receiver based Kafka input stream, it directly pulls 
the message from Kafka
+in each batch duration and processed without storing.
+
+This does not use Zookeeper to store offsets. The consumed offsets 
are tracked
+by the stream itself. For interoperability with Kafka monitoring 
tools that depend on
+Zookeeper, you have to update Kafka/Zookeeper yourself from the 
streaming application.
+You can access the offsets used in each batch from the generated 
RDDs (see
+
+To recover from driver failures, you have to enable checkpointing 
in the StreamingContext.
+The information on consumed offset can be recovered from the 
checkpoint.
+See the programming guide for details (constraints, etc.).
+
+:param ssc:  StreamingContext object
+:param topics:  list of topic_name to consume.
+:param kafkaParams: Additional params for Kafka
+:param keyDecoder:  A function used to decode key (default is 
utf8_decoder)
+:param valueDecoder:  A function used to decode value (default is 
utf8_decoder)
+:return: A DStream object
+"""
+if not isinstance(topics, list):
+raise TypeError("topics should be list")
+if not isinstance(kafkaParams, dict):
+raise TypeError("kafkaParams should be dict")
+
+jtopics = SetConverter().convert(topics, 
ssc.sparkContext._gateway._gateway_client)
+jparam = MapConverter().convert(kafkaParams, 
ssc.sparkContext._gateway._gateway_client)
+
+try:
+helperClass = 
ssc._jvm.java.lang.Thread.currentThread().getContextClassLoader() \
+
.loadClass("org.apache.spark.streaming.kafka.KafkaUtilsPythonHelper")
+helper = helperClass.newInstance()
+jstream = helper.createDirectStream(ssc._jssc, jparam, jtopics)
+except Py4JJavaError, e:
+if 'ClassNotFoundException' in str(e.java_exception):
+KafkaUtils._printErrorMsg(ssc.sparkContext)
+raise e
+
+ser = PairDeserializer(NoOpSerializer(), NoOpSerializer())
+stream = DStream(jstream, ssc, ser)
+return stream.map(lambda (k, v): (keyDecoder(k), valueDecoder(v)))
+
+@staticmethod
+def createDirectStreamFromOffset(ssc, kafkaParams, fromOffsets,
--- End diff --

Since Python does not support method overloading, I used a different method 
name to differentiate them. I will change it to the way you mentioned.





[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...

2015-04-24 Thread chenghao-intel
Github user chenghao-intel commented on the pull request:

https://github.com/apache/spark/pull/4014#issuecomment-95857854
  
I just filed a JIRA issue: https://issues.apache.org/jira/browse/SPARK-7119. 
@viirya, can you help investigate this?





[GitHub] spark pull request: [SPARK-1442][SQL][WIP] Window Function Support...

2015-04-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5604#issuecomment-95868952
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30928/
Test FAILed.





[GitHub] spark pull request: [SPARK-7115]][MLLIB] skip the very first 1 in ...

2015-04-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5681#issuecomment-95868811
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30922/
Test PASSed.





[GitHub] spark pull request: [SPARK-1442][SQL][WIP] Window Function Support...

2015-04-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5604#issuecomment-95868922
  
  [Test build #30928 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30928/consoleFull)
 for   PR 5604 at commit 
[`d07101b`](https://github.com/apache/spark/commit/d07101bd5a6f3b30532c4d4d77ab8d310607b684).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class WindowExpression(child: Expression, windowSpec: WindowSpec) 
extends UnaryExpression `
  * `case class WindowSpec(windowPartition: WindowPartition, windowFrame: 
Option[WindowFrame])`
  * `case class WindowPartition(partitionBy: Seq[Expression], sortBy: 
Seq[SortOrder])`
  * `case class WindowFrame(frameType: FrameType, preceding: Int, 
following: Int)`
  * `case class WindowAggregate(`
  * `case class WindowAggregate(`
  * `  case class ComputedWindow(`

 * This patch does not change any dependencies.





[GitHub] spark pull request: [SPARK-6304][Streaming] Fix checkpointing does...

2015-04-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5060#issuecomment-95878205
  
  [Test build #30926 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30926/consoleFull)
 for   PR 5060 at commit 
[`5713c20`](https://github.com/apache/spark/commit/5713c20b543a38f0454a03c67eaa277ec519a281).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
 * This patch does not change any dependencies.





[GitHub] spark pull request: [SPARK-7098][SQL] Make the WHERE clause with t...

2015-04-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5682#issuecomment-95878093
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30925/
Test PASSed.





[GitHub] spark pull request: [SPARK-1442][SQL][WIP] Window Function Support...

2015-04-24 Thread scwf
Github user scwf commented on the pull request:

https://github.com/apache/spark/pull/5604#issuecomment-95878550
  
@guowei2 , can you generate golden answer for this locally?





[GitHub] spark pull request: [SPARK-6304][Streaming] Fix checkpointing does...

2015-04-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5060#issuecomment-95878250
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30926/
Test PASSed.





[GitHub] spark pull request: [SPARK-7098][SQL] Make the WHERE clause with t...

2015-04-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5682#issuecomment-95878073
  
  [Test build #30925 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30925/consoleFull)
 for   PR 5682 at commit 
[`4e98520`](https://github.com/apache/spark/commit/4e98520e78832b25877d825392d66d10779281f7).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
 * This patch does not change any dependencies.





[GitHub] spark pull request: [SPARK-7120][SPARK-7121][WIP] Closure cleaner ...

2015-04-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5685#issuecomment-95883549
  
  [Test build #30930 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30930/consoleFull)
 for   PR 5685 at commit 
[`2390a60`](https://github.com/apache/spark/commit/2390a608ed74a9703d3763d040421dccb51242ec).





[GitHub] spark pull request: [SPARK-7120][SPARK-7121][WIP] Closure cleaner ...

2015-04-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5685#issuecomment-95905246
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30930/
Test FAILed.





[GitHub] spark pull request: [PySpark][Minor] Update sql example, so that c...

2015-04-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5684#issuecomment-95905079
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30929/
Test PASSed.





[GitHub] spark pull request: [PySpark][Minor] Update sql example, so that c...

2015-04-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5684#issuecomment-95905052
  
  [Test build #30929 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30929/consoleFull)
 for   PR 5684 at commit 
[`19fe145`](https://github.com/apache/spark/commit/19fe145e7a00574080b91d311376b6d2cdb4254e).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
 * This patch does not change any dependencies.





[GitHub] spark pull request: [SPARK-3468][WebUI] Timeline-View feature

2015-04-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2342#issuecomment-95929817
  
  [Test build #30936 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30936/consoleFull)
 for   PR 2342 at commit 
[`ef34a5b`](https://github.com/apache/spark/commit/ef34a5b87f03e3c7f623ed2c4ab53c933bf64fa8).





[GitHub] spark pull request: [SPARK-7119][SQL] ScriptTransform should also ...

2015-04-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5688#issuecomment-95930701
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30934/
Test FAILed.





[GitHub] spark pull request: [MLLIB] Added function to get predict value an...

2015-04-24 Thread oscaroboto
Github user oscaroboto closed the pull request at:

https://github.com/apache/spark/pull/5689





[GitHub] spark pull request: [MLLIB] Added function to get predict value an...

2015-04-24 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/5689#issuecomment-95934405
  
@oscaroboto please have a look at 
https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark first. 
This isn't connected to a JIRA. 

In fact there are JIRAs on this subject already. Have a look and consider 
connecting to existing proposals to expose a probability distribution; you 
might solve several at once:


https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20text%20~%20probability&quickSearchQuery=spark%20probability





[GitHub] spark pull request: [SPARK-6738] [CORE] Improve estimate the size ...

2015-04-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5608#issuecomment-95937721
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30931/
Test FAILed.





[GitHub] spark pull request: [SPARK-6738] [CORE] Improve estimate the size ...

2015-04-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5608#issuecomment-95937710
  
**[Test build #30931 timed 
out](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30931/consoleFull)**
 for PR 5608 at commit 
[`a9fca84`](https://github.com/apache/spark/commit/a9fca8444d7a8591032383a7d6ced84ee1f66a56)
 after a configured wait of `150m`.





[GitHub] spark pull request: [SPARK-7119][SQL] ScriptTransform should also ...

2015-04-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5688#issuecomment-95943436
  
  [Test build #30938 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30938/consoleFull)
 for   PR 5688 at commit 
[`a69b1d9`](https://github.com/apache/spark/commit/a69b1d9f0cbbbca44b48107763efed11d31019f6).





[GitHub] spark pull request: [SPARK-7119][SQL] ScriptTransform should also ...

2015-04-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5688#issuecomment-95930691
  
  [Test build #30934 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30934/consoleFull)
 for   PR 5688 at commit 
[`7b1a00a`](https://github.com/apache/spark/commit/7b1a00a1dc281870e8779b5153fa1fd1bc797aeb).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
 * This patch does not change any dependencies.





[GitHub] spark pull request: [SPARK-2750][WIP]Add Https support for Web UI

2015-04-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5664#issuecomment-95939658
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30935/
Test FAILed.





[GitHub] spark pull request: [SPARK-2750][WIP]Add Https support for Web UI

2015-04-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5664#issuecomment-95939591
  
  [Test build #30935 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30935/consoleFull)
 for   PR 5664 at commit 
[`5efac85`](https://github.com/apache/spark/commit/5efac8536d86aea631b25830194e00fb83c3b447).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
 * This patch does not change any dependencies.





[GitHub] spark pull request: [SPARK-7093][SQL] Using newPredicate in Nested...

2015-04-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5665#issuecomment-95942487
  
  [Test build #30933 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30933/consoleFull)
 for   PR 5665 at commit 
[`d19dd31`](https://github.com/apache/spark/commit/d19dd312a18af43131005d1bf6d2944b259c0721).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
 * This patch does not change any dependencies.





[GitHub] spark pull request: [SPARK-7123] [SQL] fixed table.star in sqlcont...

2015-04-24 Thread scwf
GitHub user scwf opened a pull request:

https://github.com/apache/spark/pull/5690

[SPARK-7123] [SQL] fixed table.star in sqlcontext

Running the following SQL produces an error:
`SELECT r.*
FROM testData l join testData2 r on (l.key = r.a)` 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/scwf/spark tablestar

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/5690.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #5690


commit 3b2e2b6a2e1b3f56c8944f62b2f184bebf7bac24
Author: scwf wangf...@huawei.com
Date:   2015-04-24T13:50:32Z

support table.star







[GitHub] spark pull request: [SPARK-7093][SQL] Using newPredicate in Nested...

2015-04-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5665#issuecomment-95942529
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30933/
Test PASSed.





[GitHub] spark pull request: [SPARK-7093][SQL] Using newPredicate in Nested...

2015-04-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5665#issuecomment-95915693
  
  [Test build #30933 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30933/consoleFull)
 for   PR 5665 at commit 
[`d19dd31`](https://github.com/apache/spark/commit/d19dd312a18af43131005d1bf6d2944b259c0721).





[GitHub] spark pull request: [SPARK-7119][SQL] ScriptTransform should also ...

2015-04-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5688#issuecomment-95917038
  
  [Test build #30934 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30934/consoleFull)
 for   PR 5688 at commit 
[`7b1a00a`](https://github.com/apache/spark/commit/7b1a00a1dc281870e8779b5153fa1fd1bc797aeb).





[GitHub] spark pull request: [SPARK-2750][WIP]Add Https support for Web UI

2015-04-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5664#issuecomment-95920724
  
  [Test build #30935 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30935/consoleFull)
 for   PR 5664 at commit 
[`5efac85`](https://github.com/apache/spark/commit/5efac8536d86aea631b25830194e00fb83c3b447).





[GitHub] spark pull request: [SPARK-7123] [SQL] fixed table.star in sqlcont...

2015-04-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5690#issuecomment-95943453
  
  [Test build #30937 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30937/consoleFull)
 for   PR 5690 at commit 
[`3b2e2b6`](https://github.com/apache/spark/commit/3b2e2b6a2e1b3f56c8944f62b2f184bebf7bac24).





[GitHub] spark pull request: [SPARK-6122][Core] Upgrade tachyon-client vers...

2015-04-24 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/5354#issuecomment-95924132
  
LGTM. Thank you for your perseverance. This gets the change in with minimal 
additional change to the build, keeps everything compiling and actually 
improves the management of one dependency along the way.

I think the large list of removed dependencies above is a false positive. 
It can't remove these.

Let me merge and let's double check that the other Jenkins builds are still 
happy.





[GitHub] spark pull request: [SPARK-6738] [CORE] Improve estimate the size ...

2015-04-24 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/5608#discussion_r29046278
  
--- Diff: core/src/main/scala/org/apache/spark/util/SizeEstimator.scala ---
@@ -204,25 +204,36 @@ private[spark] object SizeEstimator extends Logging {
         }
       } else {
         // Estimate the size of a large array by sampling elements without replacement.
-        var size = 0.0
+        // To exclude the shared objects that the array elements may link to, sample twice
+        // and use the smaller result to calculate the array size.
         val rand = new Random(42)
-        val drawn = new OpenHashSet[Int](ARRAY_SAMPLE_SIZE)
-        var numElementsDrawn = 0
-        while (numElementsDrawn < ARRAY_SAMPLE_SIZE) {
-          var index = 0
-          do {
-            index = rand.nextInt(length)
-          } while (drawn.contains(index))
-          drawn.add(index)
-          val elem = ScalaRunTime.array_apply(array, index).asInstanceOf[AnyRef]
-          size += SizeEstimator.estimate(elem, state.visited)
-          numElementsDrawn += 1
-        }
-        state.size += ((length / (ARRAY_SAMPLE_SIZE * 1.0)) * size).toLong
+        val drawn = new OpenHashSet[Int](2 * ARRAY_SAMPLE_SIZE)
--- End diff --

Yes, that looks better. We could even generalize to sampling n times easily, 
but that could be overkill. I think we have a potential problem here: we 
sample if the array size is >= 400, but then want at least 400 distinct 
elements from the array, twice. This will enter an infinite loop if the array 
has between 400 and 800 elements, and will be very slow if it's just a bit 
larger than 800.

You could sample with replacement, or only draw `ARRAY_SAMPLE_SIZE/2` 
elements (`ARRAY_SAMPLE_SIZE/n` in general). For simplicity, and to avoid 
slow-downs, I'd say sample with replacement.

You can put the sample threshold back to 200 then, too. I don't know if 
that needs to change.
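The with-replacement alternative suggested above can be sketched as follows. This is an illustration rather than the PR's actual code: `sizeOf` is a hypothetical stand-in for `SizeEstimator.estimate(elem, state.visited)`, using a fixed per-element cost so the arithmetic is checkable.

```scala
import scala.util.Random

object ArraySampling {
  // Hypothetical stand-in for SizeEstimator.estimate(elem, state.visited);
  // a constant per-element cost keeps the example deterministic.
  def sizeOf[T](elem: T): Long = 16L

  // Sampling WITH replacement: the loop runs exactly `sampleSize` iterations,
  // so it terminates no matter how close the array length is to the sampling
  // threshold. (Drawing 2 * sampleSize *distinct* indices, as in the diff,
  // can never finish when the array has fewer than 2 * sampleSize elements.)
  def estimateArraySize[T](array: Array[T], sampleSize: Int): Long = {
    val rand = new Random(42)
    var sampled = 0.0
    var i = 0
    while (i < sampleSize) {
      sampled += sizeOf(array(rand.nextInt(array.length))) // duplicates allowed
      i += 1
    }
    // Scale the sampled total up to the full array length.
    ((array.length / sampleSize.toDouble) * sampled).toLong
  }
}
```

With a 500-element array and a sample size of 400, the without-replacement scheme in the diff would spin forever hunting for 800 distinct indices, while this version returns `(500 / 400.0) * 400 * 16 = 8000` immediately.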





[GitHub] spark pull request: Added function to get predict value and probab...

2015-04-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5689#issuecomment-95931900
  
Can one of the admins verify this patch?




