[GitHub] spark pull request: [SPARK-2460] Optimize SparkContext.hadoopFile ...

2014-08-10 Thread scwf
Github user scwf closed the pull request at:

https://github.com/apache/spark/pull/1385





[GitHub] spark pull request: [SPARK-2907] [MLlib] Use mutable.HashMap to re...

2014-08-10 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1871#issuecomment-51708105
  
QA results for PR 1871:
- This patch PASSES unit tests.
- This patch merges cleanly.
- This patch adds no public classes.
For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18272/consoleFull





[GitHub] spark pull request: [SPARK-2953] Allow using short names for io co...

2014-08-10 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1873#issuecomment-51708328
  
QA results for PR 1873:
- This patch PASSES unit tests.
- This patch merges cleanly.
- This patch adds no public classes.
For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18273/consoleFull





[GitHub] spark pull request: [PySpark] [SPARK-2954] [SPARK-2948] [SPARK-291...

2014-08-10 Thread JoshRosen
GitHub user JoshRosen opened a pull request:

https://github.com/apache/spark/pull/1874

[PySpark] [SPARK-2954] [SPARK-2948] [SPARK-2910] [SPARK-2101] Python 2.6 
Fixes

- Modify dev/run-tests to test with Python 2.6
- Use unittest2 when running on Python 2.6.
- Fix issue with namedtuple.
- Skip TestOutputFormat.test_newhadoop on Python 2.6 until SPARK-2951 is 
fixed.
- Fix MLlib _deserialize_double on Python 2.6.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/JoshRosen/spark python2.6

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/1874.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1874


commit 48e825b57ba603f3a655e9a611d538e6a1783f75
Author: Josh Rosen 
Date:   2014-08-09T22:26:00Z

[SPARK-2948] [SPARK-2910] [SPARK-2101] Python 2.6 fixes

- Modify dev/run-tests to test with Python 2.6
- Use unittest2 when running on Python 2.6.
- Fix issue with namedtuple.
- Skip TestOutputFormat.test_newhadoop on Python 2.6 until SPARK-2951 is 
fixed.

Closes #1868.  Closes #1042.

commit 69a113f14c0dc25aa6ff04d476bcfa57fb5f25b7
Author: Josh Rosen 
Date:   2014-08-10T07:12:52Z

[SPARK-2954] Fix MLlib _deserialize_double on Python 2.6.







[GitHub] spark pull request: [PySpark] [SPARK-2954] [SPARK-2948] [SPARK-291...

2014-08-10 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1874#issuecomment-51708413
  
QA tests have started for PR 1874. This patch merges cleanly. View 
progress: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18274/consoleFull





[GitHub] spark pull request: [PySpark] [SPARK-2954] [SPARK-2948] [SPARK-291...

2014-08-10 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/1874#issuecomment-51708455
  
Jenkins, test this please.

I installed `unittest2` on Jenkins, so hopefully these tests will now pass 
with `python2.6`.





[GitHub] spark pull request: [PySpark] [SPARK-2954] [SPARK-2948] [SPARK-291...

2014-08-10 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1874#issuecomment-51708493
  
QA tests have started for PR 1874. This patch merges cleanly. View 
progress: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18275/consoleFull





[GitHub] spark pull request: [SPARK-2952] Enable logging actor messages at ...

2014-08-10 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/1870#issuecomment-51708496
  
Yeah that and the `DriverSuite`. Not sure what the reason is yet, but I 
noticed that it started happening after #1777 went in...





[GitHub] spark pull request: Support executing Spark from symlinks

2014-08-10 Thread roji
GitHub user roji opened a pull request:

https://github.com/apache/spark/pull/1875

Support executing Spark from symlinks

The current scripts (e.g. pyspark) fail to run when they are executed via 
symlinks. A common Linux scenario would be to have Spark installed somewhere 
(e.g. /opt) and have a symlink to it in /usr/bin.

Fixed the scripts to traverse symlinks until reaching the actual binary.
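
For illustration, the same idea in Scala rather than shell (a sketch only, not 
the script change in this PR; the launcher path is hypothetical):

```scala
import java.nio.file.Paths

// Sketch: resolve a symlinked launcher to its real location before
// deriving the installation directory. "/usr/bin/pyspark" is hypothetical.
val realBin   = Paths.get("/usr/bin/pyspark").toRealPath() // follows all symlinks
val sparkHome = realBin.getParent.getParent
```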

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/roji/spark handle_symlinks

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/1875.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1875


commit a63c98fe4b4a455ebda4821156c6d8df1550d24f
Author: Shay Rojansky 
Date:   2014-08-10T08:04:45Z

Support executing Spark from symlinks

The current scripts (e.g. pyspark) fail to run when they are
executed via symlinks. A common Linux scenario would be to have Spark
installed somewhere (e.g. /opt) and have a symlink to it in /usr/bin.

Fixed the scripts to traverse symlinks until reaching the actual binary.







[GitHub] spark pull request: Support executing Spark from symlinks

2014-08-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1875#issuecomment-51708965
  
Can one of the admins verify this patch?





[GitHub] spark pull request: [SPARK-2677] BasicBlockFetchIterator#next can ...

2014-08-10 Thread sarutak
Github user sarutak commented on the pull request:

https://github.com/apache/spark/pull/1632#issuecomment-51709077
  
In #1758, @JoshRosen fixed ConnectionManager to handle the case where a remote 
executor returns an error message.
However, the case where a remote executor hangs up is not handled, so if the 
remote executor cannot return any message, the fetching executor still waits 
forever.

The latest PR fixes this issue.
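
One generic way to avoid waiting forever on a hung remote end is to bound the 
blocking wait with a timeout. A minimal Scala sketch of that pattern (not the 
actual change in this PR):

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._

// Minimal sketch: bound a blocking wait so a remote executor that hangs up
// without replying cannot stall the fetching executor forever.
def fetchWithTimeout[T](result: Future[T], timeout: FiniteDuration): T =
  Await.result(result, timeout) // throws TimeoutException if nothing arrives
```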





[GitHub] spark pull request: [PySpark] [SPARK-2954] [SPARK-2948] [SPARK-291...

2014-08-10 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1874#issuecomment-51709195
  
QA results for PR 1874:
- This patch PASSES unit tests.
- This patch merges cleanly.
- This patch adds no public classes.
For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18275/consoleFull





[GitHub] spark pull request: Remove extra semicolon in Task.scala

2014-08-10 Thread witgo
GitHub user witgo opened a pull request:

https://github.com/apache/spark/pull/1876

Remove extra semicolon in Task.scala



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/witgo/spark remove_semicolon_in_Task_scala

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/1876.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1876


commit c6ea7328bff3c31e3c08d8749ed5e966d4d02646
Author: GuoQiang Li 
Date:   2014-08-10T09:32:43Z

Remove extra semicolon in Task.scala







[GitHub] spark pull request: Remove extra semicolon in Task.scala

2014-08-10 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1876#issuecomment-51710376
  
QA tests have started for PR 1876. This patch merges cleanly. View 
progress: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18276/consoleFull





[GitHub] spark pull request: [SPARK-2950] Add gc time and shuffle write tim...

2014-08-10 Thread kayousterhout
Github user kayousterhout commented on the pull request:

https://github.com/apache/spark/pull/1869#issuecomment-51710728
  
Looks great!!  +1 on this being useful.





[GitHub] spark pull request: Turn UpdateBlockInfo into case class.

2014-08-10 Thread mridulm
Github user mridulm commented on the pull request:

https://github.com/apache/spark/pull/1872#issuecomment-51710872
  
If it's a case class, does it still need to be Externalizable?
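
For context, the question turns on the fact that a case class is already 
`Serializable` by default; `Externalizable` only buys manual control over the 
wire format. A minimal sketch (hypothetical class, not the PR's code):

```scala
// A case class extends Serializable for free, so default Java serialization
// round-trips it with no Externalizable boilerplate.
case class UpdateBlockInfoSketch(blockId: String, size: Long)
```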





[GitHub] spark pull request: Remove extra semicolon in Task.scala

2014-08-10 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1876#issuecomment-51711239
  
QA results for PR 1876:
- This patch PASSES unit tests.
- This patch merges cleanly.
- This patch adds no public classes.
For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18276/consoleFull





[GitHub] spark pull request: [SPARK-2590][SQL] Added option to handle incre...

2014-08-10 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/1853#discussion_r16030631
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLConf.scala ---
@@ -30,6 +30,7 @@ private[spark] object SQLConf {
   val SHUFFLE_PARTITIONS = "spark.sql.shuffle.partitions"
   val CODEGEN_ENABLED = "spark.sql.codegen"
   val DIALECT = "spark.sql.dialect"
+  val INCREMENTAL_COLLECT_ENABLED = "spark.sql.incrementalCollect"
--- End diff --

OK. I put it in `SQLConf` for exactly the same reason [Zongheng 
mentioned](https://github.com/apache/spark/pull/1819/files#r15913270), and I 
also had mixed feelings about this. I'd prefer to have something like Hive's 
`ConfVars` to define all configurations, but let's do that later.
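
For reference, a key defined in `SQLConf` this way would be toggled through 
the runtime setter. A hedged sketch (the key name is taken from the diff 
above; `sqlContext` is an assumed existing `SQLContext`):

```scala
// Flip the proposed flag at runtime through SQLConf's setter.
sqlContext.setConf("spark.sql.incrementalCollect", "true")
```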





[GitHub] spark pull request: [SPARK-1853] Show Streaming application code c...

2014-08-10 Thread mubarak
Github user mubarak commented on the pull request:

https://github.com/apache/spark/pull/1723#issuecomment-51712229
  
@tdas 
I have removed 'name' from DStream and addressed your review comments. Can 
you please review? Thanks.





[GitHub] spark pull request: [SPARK-2907] [MLlib] Use mutable.HashMap to re...

2014-08-10 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1871#issuecomment-51712712
  
QA tests have started for PR 1871. This patch merges cleanly. View 
progress: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18277/consoleFull





[GitHub] spark pull request: [SPARK-2907] [MLlib] Use mutable.HashMap to re...

2014-08-10 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1871#issuecomment-51712724
  
QA results for PR 1871:
- This patch FAILED unit tests.
- This patch merges cleanly.
- This patch adds no public classes.
For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18277/consoleFull





[GitHub] spark pull request: [SPARK-2590][SQL] Added option to handle incre...

2014-08-10 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1853#issuecomment-51712859
  
QA tests have started for PR 1853. This patch merges cleanly. View 
progress: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18278/consoleFull





[GitHub] spark pull request: [SPARK-2907] [MLlib] Use mutable.HashMap to re...

2014-08-10 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1871#issuecomment-51713377
  
QA tests have started for PR 1871. This patch merges cleanly. View 
progress: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18279/consoleFull





[GitHub] spark pull request: [SPARK-2929][SQL] Refactored Thrift server and...

2014-08-10 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1856#issuecomment-51714336
  
QA tests have started for PR 1856. This patch merges cleanly. View 
progress: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18280/consoleFull





[GitHub] spark pull request: [SPARK-2929][SQL] Refactored Thrift server and...

2014-08-10 Thread liancheng
Github user liancheng commented on the pull request:

https://github.com/apache/spark/pull/1856#issuecomment-51714349
  
The reason for the timeout in the build failure is unknown due to a lack of 
the necessary logs (maybe something is wrong in the test suites, or maybe it's 
just running too slowly). I've logged all external process output by default 
to help with diagnosis.





[GitHub] spark pull request: [SPARK-2907] [MLlib] Use mutable.HashMap to re...

2014-08-10 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1871#issuecomment-51714618
  
QA results for PR 1871:
- This patch PASSES unit tests.
- This patch merges cleanly.
- This patch adds no public classes.
For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18279/consoleFull





[GitHub] spark pull request: [SPARK-2590][SQL] Added option to handle incre...

2014-08-10 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1853#issuecomment-51714612
  
QA results for PR 1853:
- This patch PASSES unit tests.
- This patch merges cleanly.
- This patch adds no public classes.
For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18278/consoleFull





[GitHub] spark pull request: [SPARK-2929][SQL] Refactored Thrift server and...

2014-08-10 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1856#issuecomment-51715928
  
QA results for PR 1856:
- This patch FAILED unit tests.
- This patch merges cleanly.
- This patch adds the following public classes (experimental):
  class CliSuite extends FunSuite with BeforeAndAfterAll with Logging {
  class HiveThriftServer2Suite extends FunSuite with Logging {
For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18280/consoleFull





[GitHub] spark pull request: [WIP][SPARK-2947] DAGScheduler resubmit the st...

2014-08-10 Thread witgo
GitHub user witgo opened a pull request:

https://github.com/apache/spark/pull/1877

[WIP][SPARK-2947] DAGScheduler resubmit the stage into an infinite loop



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/witgo/spark SPARK-2947

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/1877.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1877


commit 71de1c0e67e6b0731913d28906b96be18d0a4a05
Author: GuoQiang Li 
Date:   2014-08-10T15:03:23Z

DAGScheduler resubmit the stage into an infinite loop







[GitHub] spark pull request: [WIP][SPARK-2947] DAGScheduler resubmit the st...

2014-08-10 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1877#issuecomment-51717429
  
QA tests have started for PR 1877. This patch merges cleanly. View 
progress: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18281/consoleFull





[GitHub] spark pull request: replace println to log4j

2014-08-10 Thread critikaled
Github user critikaled commented on the pull request:

https://github.com/apache/spark/pull/1372#issuecomment-51718206
  
Hey, this change has not been included in the 1.0.2 release. Any heads up on 
the version in which it will be reflected?





[GitHub] spark pull request: SPARK-2083 Add support for spark.local.maxFail...

2014-08-10 Thread roji
Github user roji commented on the pull request:

https://github.com/apache/spark/pull/1465#issuecomment-51718343
  
+1





[GitHub] spark pull request: [WIP][SPARK-2947] DAGScheduler resubmit the st...

2014-08-10 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1877#issuecomment-51718834
  
QA results for PR 1877:
- This patch PASSES unit tests.
- This patch merges cleanly.
- This patch adds no public classes.
For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18281/consoleFull





[GitHub] spark pull request: [SPARK-2929][SQL] Refactored Thrift server and...

2014-08-10 Thread liancheng
Github user liancheng commented on the pull request:

https://github.com/apache/spark/pull/1856#issuecomment-51719146
  
Jenkins complains

```
14/08/10 06:10:28 ERROR ClientBase: Can't get Master Kerberos principal for use as renewer
```

Not sure why Kerberos is involved here...





[GitHub] spark pull request: SPARK-2083 Add support for spark.local.maxFail...

2014-08-10 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/1465#issuecomment-51719921
  
I think there's already a mechanism to set this by using `local[N, maxFailures]` 
to create your SparkContext:

```scala
// Regular expression for local[N, maxRetries], used in tests with failing tasks
val LOCAL_N_FAILURES_REGEX = """local\[([0-9]+)\s*,\s*([0-9]+)\]""".r

// ...

case LOCAL_N_FAILURES_REGEX(threads, maxFailures) =>
  val scheduler = new TaskSchedulerImpl(sc, maxFailures.toInt, isLocal = true)
  val backend = new LocalBackend(scheduler, threads.toInt)
  scheduler.initialize(backend)
  scheduler
```
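
For example (a sketch assuming the regex above is how the master URL is 
parsed), a local context whose tasks may fail up to 3 times before the job 
aborts:

```scala
import org.apache.spark.SparkContext

// 4 local threads; each task is retried up to maxFailures = 3 times.
val sc = new SparkContext("local[4, 3]", "max-failures-example")
```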





[GitHub] spark pull request: [SPARK-2937] Separate out samplyByKeyExact as ...

2014-08-10 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/1866#discussion_r16032168
  
--- Diff: core/src/test/java/org/apache/spark/JavaAPISuite.java ---
@@ -1239,12 +1239,28 @@ public void sampleByKey() {
 Assert.assertTrue(worCounts.size() == 2);
 Assert.assertTrue(worCounts.get(0) > 0);
 Assert.assertTrue(worCounts.get(1) > 0);
-JavaPairRDD wrExact = rdd2.sampleByKey(true, 
fractions, true, 1L);
+  }
+
+  @Test
+  @SuppressWarnings("unchecked")
+  public void sampleByKeyExact() {
+JavaRDD rdd1 = sc.parallelize(Arrays.asList(1, 2, 3, 4, 5, 6, 
7, 8), 3);
+JavaPairRDD rdd2 = rdd1.mapToPair(
+new PairFunction() {
--- End diff --

Using two-space indentation?





[GitHub] spark pull request: [SPARK-2937] Separate out samplyByKeyExact as ...

2014-08-10 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/1866#discussion_r16032164
  
--- Diff: core/src/main/scala/org/apache/spark/api/java/JavaPairRDD.scala 
---
@@ -133,68 +133,64 @@ class JavaPairRDD[K, V](val rdd: RDD[(K, V)])
* Return a subset of this RDD sampled by key (via stratified sampling).
*
* Create a sample of this RDD using variable sampling rates for 
different keys as specified by
-   * `fractions`, a key to sampling rate map.
-   *
-   * If `exact` is set to false, create the sample via simple random 
sampling, with one pass
-   * over the RDD, to produce a sample of size that's approximately equal 
to the sum of
-   * math.ceil(numItems * samplingRate) over all key values; otherwise, 
use additional passes over
-   * the RDD to create a sample size that's exactly equal to the sum of
+   * `fractions`, a key to sampling rate map, via simple random sampling 
with one pass over the
+   * RDD, to produce a sample of size that's approximately equal to the 
sum of
* math.ceil(numItems * samplingRate) over all key values.
*/
   def sampleByKey(withReplacement: Boolean,
   fractions: JMap[K, Double],
-  exact: Boolean,
   seed: Long): JavaPairRDD[K, V] =
-new JavaPairRDD[K, V](rdd.sampleByKey(withReplacement, fractions, 
exact, seed))
+new JavaPairRDD[K, V](rdd.sampleByKey(withReplacement, fractions, 
seed))
 
   /**
* Return a subset of this RDD sampled by key (via stratified sampling).
*
* Create a sample of this RDD using variable sampling rates for 
different keys as specified by
-   * `fractions`, a key to sampling rate map.
-   *
-   * If `exact` is set to false, create the sample via simple random 
sampling, with one pass
-   * over the RDD, to produce a sample of size that's approximately equal 
to the sum of
-   * math.ceil(numItems * samplingRate) over all key values; otherwise, 
use additional passes over
-   * the RDD to create a sample size that's exactly equal to the sum of
+   * `fractions`, a key to sampling rate map, via simple random sampling 
with one pass over the
+   * RDD, to produce a sample of size that's approximately equal to the 
sum of
* math.ceil(numItems * samplingRate) over all key values.
*
-   * Use Utils.random.nextLong as the default seed for the random number 
generator
+   * Use Utils.random.nextLong as the default seed for the random number 
generator.
*/
   def sampleByKey(withReplacement: Boolean,
-  fractions: JMap[K, Double],
-  exact: Boolean): JavaPairRDD[K, V] =
-sampleByKey(withReplacement, fractions, exact, Utils.random.nextLong)
+  fractions: JMap[K, Double]): JavaPairRDD[K, V] =
+sampleByKey(withReplacement, fractions, Utils.random.nextLong)
 
   /**
-   * Return a subset of this RDD sampled by key (via stratified sampling).
+   * ::Experimental::
*
--- End diff --

Please remove this line so both `:: Experimental ::` and the first sentence 
show up in the summary of the generated doc. Otherwise, only `:: Experimental 
::` appears in the summary.
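
A small sketch of the convention being asked for (hypothetical method, just to 
show the layout):

```scala
import org.apache.spark.annotation.Experimental

object DocLayoutSketch {
  /**
   * :: Experimental ::
   * First sentence of the doc; with no blank line after the tag, both the
   * tag and this sentence show up in the generated ScalaDoc summary.
   */
  @Experimental
  def sampleSketch(): Unit = ()
}
```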





[GitHub] spark pull request: [SPARK-2937] Separate out samplyByKeyExact as ...

2014-08-10 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/1866#discussion_r16032165
  
--- Diff: core/src/main/scala/org/apache/spark/api/java/JavaPairRDD.scala 
---
@@ -133,68 +133,64 @@ class JavaPairRDD[K, V](val rdd: RDD[(K, V)])
* Return a subset of this RDD sampled by key (via stratified sampling).
*
* Create a sample of this RDD using variable sampling rates for 
different keys as specified by
-   * `fractions`, a key to sampling rate map.
-   *
-   * If `exact` is set to false, create the sample via simple random 
sampling, with one pass
-   * over the RDD, to produce a sample of size that's approximately equal 
to the sum of
-   * math.ceil(numItems * samplingRate) over all key values; otherwise, 
use additional passes over
-   * the RDD to create a sample size that's exactly equal to the sum of
+   * `fractions`, a key to sampling rate map, via simple random sampling 
with one pass over the
+   * RDD, to produce a sample of size that's approximately equal to the 
sum of
* math.ceil(numItems * samplingRate) over all key values.
*/
   def sampleByKey(withReplacement: Boolean,
   fractions: JMap[K, Double],
-  exact: Boolean,
   seed: Long): JavaPairRDD[K, V] =
-new JavaPairRDD[K, V](rdd.sampleByKey(withReplacement, fractions, 
exact, seed))
+new JavaPairRDD[K, V](rdd.sampleByKey(withReplacement, fractions, 
seed))
 
   /**
* Return a subset of this RDD sampled by key (via stratified sampling).
*
* Create a sample of this RDD using variable sampling rates for 
different keys as specified by
-   * `fractions`, a key to sampling rate map.
-   *
-   * If `exact` is set to false, create the sample via simple random 
sampling, with one pass
-   * over the RDD, to produce a sample of size that's approximately equal 
to the sum of
-   * math.ceil(numItems * samplingRate) over all key values; otherwise, 
use additional passes over
-   * the RDD to create a sample size that's exactly equal to the sum of
+   * `fractions`, a key to sampling rate map, via simple random sampling 
with one pass over the
+   * RDD, to produce a sample of size that's approximately equal to the 
sum of
* math.ceil(numItems * samplingRate) over all key values.
*
-   * Use Utils.random.nextLong as the default seed for the random number 
generator
+   * Use Utils.random.nextLong as the default seed for the random number 
generator.
*/
   def sampleByKey(withReplacement: Boolean,
-  fractions: JMap[K, Double],
-  exact: Boolean): JavaPairRDD[K, V] =
-sampleByKey(withReplacement, fractions, exact, Utils.random.nextLong)
+  fractions: JMap[K, Double]): JavaPairRDD[K, V] =
+sampleByKey(withReplacement, fractions, Utils.random.nextLong)
 
   /**
-   * Return a subset of this RDD sampled by key (via stratified sampling).
+   * ::Experimental::
*
-   * Create a sample of this RDD using variable sampling rates for 
different keys as specified by
-   * `fractions`, a key to sampling rate map.
+   * Return a subset of this RDD sampled by key (via stratified sampling) 
containing exactly
+   * math.ceil(numItems * samplingRate) for each stratum (group of pairs 
with the same key).
*
-   * Produce a sample of size that's approximately equal to the sum of
-   * math.ceil(numItems * samplingRate) over all key values with one pass 
over the RDD via
-   * simple random sampling.
+   * This method differs from [[sampleByKey]] in that we make additional 
passes over the RDD to
+   * create a sample size that's exactly equal to the sum of 
math.ceil(numItems * samplingRate)
+   * over all key values with a 99.99% confidence. When sampling without 
replacement, we need one
+   * additional pass over the RDD to guarantee sample size; when sampling 
with replacement, we need
+   * two additional passes.
*/
-  def sampleByKey(withReplacement: Boolean,
+  @Experimental
+  def sampleByKeyExact(withReplacement: Boolean,
   fractions: JMap[K, Double],
   seed: Long): JavaPairRDD[K, V] =
-sampleByKey(withReplacement, fractions, false, seed)
+new JavaPairRDD[K, V](rdd.sampleByKeyExact(withReplacement, fractions, 
seed))
 
   /**
-   * Return a subset of this RDD sampled by key (via stratified sampling).
+   * ::Experimental::
*
--- End diff --

ditto: remove this line
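
For readers skimming the thread, a usage sketch of the two methods shown in 
the diff above (assumes an existing `RDD[(String, Int)]` named `pairs`; this 
is a sketch, not code from the PR):

```scala
// Per-key sampling rates.
val fractions = Map("1" -> 0.1, "0" -> 0.5)

// One pass: per-stratum sizes are only approximately ceil(numItems * rate).
val approx = pairs.sampleByKey(false, fractions, 42L)

// Extra pass(es): per-stratum sizes exactly ceil(numItems * rate).
val exact = pairs.sampleByKeyExact(false, fractions, 42L)
```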



[GitHub] spark pull request: [SPARK-2937] Separate out samplyByKeyExact as ...

2014-08-10 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/1866#discussion_r16032166
  
--- Diff: core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala 
---
@@ -197,33 +197,57 @@ class PairRDDFunctions[K, V](self: RDD[(K, V)])
* Return a subset of this RDD sampled by key (via stratified sampling).
*
* Create a sample of this RDD using variable sampling rates for 
different keys as specified by
-   * `fractions`, a key to sampling rate map.
-   *
-   * If `exact` is set to false, create the sample via simple random 
sampling, with one pass
-   * over the RDD, to produce a sample of size that's approximately equal 
to the sum of
-   * math.ceil(numItems * samplingRate) over all key values; otherwise, use
-   * additional passes over the RDD to create a sample size that's exactly 
equal to the sum of
-   * math.ceil(numItems * samplingRate) over all key values with a 99.99% 
confidence. When sampling
-   * without replacement, we need one additional pass over the RDD to 
guarantee sample size;
-   * when sampling with replacement, we need two additional passes.
+   * `fractions`, a key to sampling rate map, via simple random sampling 
with one pass over the
+   * RDD, to produce a sample of size that's approximately equal to the 
sum of
+   * math.ceil(numItems * samplingRate) over all key values.
*
* @param withReplacement whether to sample with or without replacement
* @param fractions map of specific keys to sampling rates
* @param seed seed for the random number generator
-   * @param exact whether sample size needs to be exactly 
math.ceil(fraction * size) per key
* @return RDD containing the sampled subset
*/
   def sampleByKey(withReplacement: Boolean,
   fractions: Map[K, Double],
-  exact: Boolean = false,
-  seed: Long = Utils.random.nextLong): RDD[(K, V)]= {
+  seed: Long = Utils.random.nextLong): RDD[(K, V)] = {
+
+require(fractions.values.forall(v => v >= 0.0), "Negative sampling 
rates.")
+
+val samplingFunc = if (withReplacement) {
+  StratifiedSamplingUtils.getPoissonSamplingFunction(self, fractions, 
false, seed)
+} else {
+  StratifiedSamplingUtils.getBernoulliSamplingFunction(self, 
fractions, false, seed)
+}
+self.mapPartitionsWithIndex(samplingFunc, preservesPartitioning = true)
+  }
+
+  /**
+   * ::Experimental::
+   *
--- End diff --

ditto: remove this line





[GitHub] spark pull request: [SPARK-2937] Separate out samplyByKeyExact as ...

2014-08-10 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/1866#discussion_r16032170
  
--- Diff: 
core/src/test/scala/org/apache/spark/rdd/PairRDDFunctionsSuite.scala ---
@@ -556,6 +519,97 @@ class PairRDDFunctionsSuite extends FunSuite with 
SharedSparkContext {
 intercept[IllegalArgumentException] {shuffled.lookup(-1)}
   }
 
+  private object StratifiedAuxiliary {
+def stratifier (fractionPositive: Double) = {
+  (x: Int) => if (x % 10 < (10 * fractionPositive).toInt) "1" else "0"
+}
+
+def checkSize(exact: Boolean,
+withReplacement: Boolean,
+expected: Long,
+actual: Long,
+p: Double): Boolean = {
+  if (exact) {
+return expected == actual
+  }
+  val stdev = if (withReplacement) math.sqrt(expected) else 
math.sqrt(expected * p * (1 - p))
+  // Very forgiving margin since we're dealing with very small sample 
sizes most of the time
+  math.abs(actual - expected) <= 6 * stdev
+}
+
+def testSampleExact(stratifiedData: RDD[(String, Int)],
+samplingRate: Double,
+seed: Long,
+n: Long) = {
+  testBernoulli(stratifiedData, true, samplingRate, seed, n)
+  testPoisson(stratifiedData, true, samplingRate, seed, n)
+}
+
+def testSample(stratifiedData: RDD[(String, Int)],
+samplingRate: Double,
+seed: Long,
+n: Long) = {
+  testBernoulli(stratifiedData, false, samplingRate, seed, n)
+  testPoisson(stratifiedData, false, samplingRate, seed, n)
+}
+
+// Without replacement validation
+def testBernoulli(stratifiedData: RDD[(String, Int)],
+exact: Boolean,
+samplingRate: Double,
+seed: Long,
+n: Long) = {
+  val expectedSampleSize = stratifiedData.countByKey()
+.mapValues(count => math.ceil(count * samplingRate).toInt)
+  val fractions = Map("1" -> samplingRate, "0" -> samplingRate)
+  val sample = if (exact) {
+stratifiedData.sampleByKeyExact(false, fractions, seed)
+  } else {
+stratifiedData.sampleByKey(false, fractions, seed)
+  }
+  val sampleCounts = sample.countByKey()
+  val takeSample = sample.collect()
+  sampleCounts.foreach { case(k, v) =>
+assert(checkSize(exact, false, expectedSampleSize(k), v, 
samplingRate)) }
+  assert(takeSample.size === takeSample.toSet.size)
+  takeSample.foreach { x => assert(1 <= x._2 && x._2 <= n, s"elements 
not in [1, $n]") }
+}
+
+// With replacement validation
+def testPoisson(stratifiedData: RDD[(String, Int)],
+exact: Boolean,
+samplingRate: Double,
+seed: Long,
+n: Long) = {
+  val expectedSampleSize = stratifiedData.countByKey().mapValues(count 
=>
+math.ceil(count * samplingRate).toInt)
+  val fractions = Map("1" -> samplingRate, "0" -> samplingRate)
+  val sample = if (exact) {
+stratifiedData.sampleByKeyExact(true, fractions, seed)
+  } else {
+stratifiedData.sampleByKey(true, fractions, seed)
+  }
+  val sampleCounts = sample.countByKey()
+  val takeSample = sample.collect()
+  sampleCounts.foreach { case (k, v) =>
+assert(checkSize(exact, true, expectedSampleSize(k), v, 
samplingRate)) }
--- End diff --

move `}` to next line





[GitHub] spark pull request: [SPARK-2937] Separate out samplyByKeyExact as ...

2014-08-10 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/1866#discussion_r16032172
  
--- Diff: 
core/src/test/scala/org/apache/spark/rdd/PairRDDFunctionsSuite.scala ---
@@ -556,6 +519,97 @@ class PairRDDFunctionsSuite extends FunSuite with 
SharedSparkContext {
 intercept[IllegalArgumentException] {shuffled.lookup(-1)}
   }
 
+  private object StratifiedAuxiliary {
+def stratifier (fractionPositive: Double) = {
+  (x: Int) => if (x % 10 < (10 * fractionPositive).toInt) "1" else "0"
+}
+
+def checkSize(exact: Boolean,
+withReplacement: Boolean,
+expected: Long,
+actual: Long,
+p: Double): Boolean = {
+  if (exact) {
+return expected == actual
+  }
+  val stdev = if (withReplacement) math.sqrt(expected) else 
math.sqrt(expected * p * (1 - p))
+  // Very forgiving margin since we're dealing with very small sample 
sizes most of the time
+  math.abs(actual - expected) <= 6 * stdev
+}
+
+def testSampleExact(stratifiedData: RDD[(String, Int)],
+samplingRate: Double,
+seed: Long,
+n: Long) = {
+  testBernoulli(stratifiedData, true, samplingRate, seed, n)
+  testPoisson(stratifiedData, true, samplingRate, seed, n)
+}
+
+def testSample(stratifiedData: RDD[(String, Int)],
+samplingRate: Double,
+seed: Long,
+n: Long) = {
+  testBernoulli(stratifiedData, false, samplingRate, seed, n)
+  testPoisson(stratifiedData, false, samplingRate, seed, n)
+}
+
+// Without replacement validation
+def testBernoulli(stratifiedData: RDD[(String, Int)],
+exact: Boolean,
+samplingRate: Double,
+seed: Long,
+n: Long) = {
+  val expectedSampleSize = stratifiedData.countByKey()
+.mapValues(count => math.ceil(count * samplingRate).toInt)
+  val fractions = Map("1" -> samplingRate, "0" -> samplingRate)
+  val sample = if (exact) {
+stratifiedData.sampleByKeyExact(false, fractions, seed)
+  } else {
+stratifiedData.sampleByKey(false, fractions, seed)
+  }
+  val sampleCounts = sample.countByKey()
+  val takeSample = sample.collect()
+  sampleCounts.foreach { case(k, v) =>
+assert(checkSize(exact, false, expectedSampleSize(k), v, 
samplingRate)) }
+  assert(takeSample.size === takeSample.toSet.size)
+  takeSample.foreach { x => assert(1 <= x._2 && x._2 <= n, s"elements 
not in [1, $n]") }
+}
+
+// With replacement validation
+def testPoisson(stratifiedData: RDD[(String, Int)],
+exact: Boolean,
+samplingRate: Double,
+seed: Long,
+n: Long) = {
+  val expectedSampleSize = stratifiedData.countByKey().mapValues(count 
=>
+math.ceil(count * samplingRate).toInt)
+  val fractions = Map("1" -> samplingRate, "0" -> samplingRate)
+  val sample = if (exact) {
+stratifiedData.sampleByKeyExact(true, fractions, seed)
+  } else {
+stratifiedData.sampleByKey(true, fractions, seed)
+  }
+  val sampleCounts = sample.countByKey()
+  val takeSample = sample.collect()
+  sampleCounts.foreach { case (k, v) =>
+assert(checkSize(exact, true, expectedSampleSize(k), v, 
samplingRate)) }
+  val groupedByKey = takeSample.groupBy(_._1)
+  for ((key, v) <- groupedByKey) {
+if (expectedSampleSize(key) >= 100 && samplingRate >= 0.1) {
+  // sample large enough for there to be repeats with high 
likelihood
+  assert(v.toSet.size < expectedSampleSize(key))
+} else {
+  if (exact) {
+assert(v.toSet.size <= expectedSampleSize(key))
+  } else {
+assert(checkSize(false, true, expectedSampleSize(key), 
v.toSet.size, samplingRate))
+  }
+}
+  }
+  takeSample.foreach { x => assert(1 <= x._2 && x._2 <= n, s"elements 
not in [1, $n]") }
--- End diff --

minor: `takeSample.foreach(x => assert(1 <= x._2 && x._2 <= n, s"elements 
not in [1, $n]"))` We usually use `{ ... }` for pattern matching or multi-line 
statement.



[GitHub] spark pull request: [SPARK-2315] Implement drop, dropRight and dro...

2014-08-10 Thread erikerlandson
Github user erikerlandson commented on the pull request:

https://github.com/apache/spark/pull/1839#issuecomment-51720727
  
Jenkins is still not getting the memo. How strict is Jenkins with commands? Is 
'okay' the same as 'ok'?





[GitHub] spark pull request: [SPARK-2937] Separate out samplyByKeyExact as ...

2014-08-10 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/1866#issuecomment-51720797
  
LGTM except for the inline comments. Thanks for keeping the APIs consistent 
across languages!!





[GitHub] spark pull request: [SPARK-2934][MLlib] Adding LogisticRegressionW...

2014-08-10 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/1862#issuecomment-51720829
  
Jenkins, retest this please.





[GitHub] spark pull request: [SPARK-2907] [MLlib] Use mutable.HashMap to re...

2014-08-10 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/1871#issuecomment-51720842
  
@Ishiihara Did you compare the speed?





[GitHub] spark pull request: [SPARK-2934][MLlib] Adding LogisticRegressionW...

2014-08-10 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1862#issuecomment-51720881
  
QA tests have started for PR 1862. This patch merges cleanly. View 
progress: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18282/consoleFull





[GitHub] spark pull request: [SPARK-2907] [MLlib] Use mutable.HashMap to re...

2014-08-10 Thread Ishiihara
Github user Ishiihara commented on the pull request:

https://github.com/apache/spark/pull/1871#issuecomment-51720995
  
@mengxr It is about 1-2 minutes slower with vector size = 100 across different numbers of partitions.


[GitHub] spark pull request: [SPARK-2934][MLlib] Adding LogisticRegressionW...

2014-08-10 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1862#issuecomment-51722306
  
QA results for PR 1862:
- This patch PASSES unit tests.
- This patch merges cleanly.
- This patch adds no public classes.

For more information see test output: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18282/consoleFull


[GitHub] spark pull request: [SPARK-2923][MLLIB] Implement some basic BLAS ...

2014-08-10 Thread jkbradley
Github user jkbradley commented on the pull request:

https://github.com/apache/spark/pull/1849#issuecomment-51722636
  
@mengxr By the way, I actually realized that copying sparse vectors to dense vectors would be useful for me (in an example I wrote for the stats API check). I wanted it as a replacement for indexing into sparse vectors (since that does not exist yet).
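
(A minimal sketch of the workaround described above, assuming only the public 1.x mllib.linalg API; the values are made up for illustration:)

```scala
import org.apache.spark.mllib.linalg.Vectors

// Copy a sparse vector to a dense array so single coordinates can be read,
// since direct indexing into SparseVector is not available yet.
val sv = Vectors.sparse(5, Array(1, 3), Array(2.0, 4.0))
val dense = sv.toArray   // materializes all 5 entries, zeros included
val third = dense(3)     // 4.0
```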


[GitHub] spark pull request: [SPARK-2850] [mllib] MLlib stats examples + sm...

2014-08-10 Thread jkbradley
GitHub user jkbradley opened a pull request:

https://github.com/apache/spark/pull/1878

[SPARK-2850] [mllib] MLlib stats examples + small fixes

Added examples for statistical summarization:
* Scala: StatisticalSummary.scala
** Tests: correlation, MultivariateOnlineSummarizer
* python: statistical_summary.py
** Tests: correlation (since MultivariateOnlineSummarizer has no Python API)

Added examples for random and sampled RDDs:
* Scala: RandomAndSampledRDDs.scala
* python: random_and_sampled_rdds.py
* Both test:
** RandomRDDGenerators.normalRDD, normalVectorRDD
** RDD.sample, takeSample, sampleByKey

Added sc.stop() to all examples.

CorrelationSuite.scala
* Added 1 test for RDDs with only 1 value

RowMatrix.scala
* numCols(): Added check for numRows = 0, with error message.
* computeCovariance(): Added check for numRows <= 1, with error message.

Python SparseVector (pyspark/mllib/linalg.py)
* Added toDense() function

python/run-tests script
* Added stat.py (doc test)

CC: @mengxr @dorx  Main changes were examples to show usage across APIs.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jkbradley/spark mllib-stats-api-check

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/1878.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1878


commit ee918e9e165a02dc55235877484502baaaf906e0
Author: Joseph K. Bradley 
Date:   2014-08-07T21:34:11Z

Added examples for statistical summarization:
* Scala: StatisticalSummary.scala
** Tests: correlation, MultivariateOnlineSummarizer
* python: statistical_summary.py
** Tests: correlation (since MultivariateOnlineSummarizer has no Python API)

Added sc.stop() to all examples.

CorrelationSuite.scala
* Added 1 test for RDDs with only 1 value

Python SparseVector (pyspark/mllib/linalg.py)
* Added toDense() function

python/run-tests script
* Added stat.py (doc test)

commit 064985bd59b854bbca70290256348177415b5bda
Author: Joseph K. Bradley 
Date:   2014-08-07T23:34:38Z

Merge remote-tracking branch 'upstream/master' into mllib-stats-api-check

commit 8195c78a312087ee18375b745600946e47fcdd46
Author: Joseph K. Bradley 
Date:   2014-08-08T01:42:52Z

Added examples for random and sampled RDDs:
* Scala: RandomAndSampledRDDs.scala
* python: random_and_sampled_rdds.py
* Both test:
** RandomRDDGenerators.normalRDD, normalVectorRDD
** RDD.sample, takeSample, sampleByKey

commit 65e4ebc8c07c7fb4bf76f80c11b28f790362533e
Author: Joseph K. Bradley 
Date:   2014-08-10T17:36:10Z

Merge remote-tracking branch 'upstream/master' into mllib-stats-api-check

commit ab48f6eb01541309ffa2d86febb0a039f435a60a
Author: Joseph K. Bradley 
Date:   2014-08-10T18:26:03Z

RowMatrix.scala
* numCols(): Added check for numRows = 0, with error message.
* computeCovariance(): Added check for numRows <= 1, with error message.




[GitHub] spark pull request: [SPARK-2850] [mllib] MLlib stats examples + sm...

2014-08-10 Thread jkbradley
Github user jkbradley commented on the pull request:

https://github.com/apache/spark/pull/1878#issuecomment-51723114
  
Q: Is the Python SparseVector.toDense() function too big an API update?


[GitHub] spark pull request: [SPARK-2850] [mllib] MLlib stats examples + sm...

2014-08-10 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1878#issuecomment-51723167
  
QA tests have started for PR 1878. This patch merges cleanly. View 
progress: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18283/consoleFull


[GitHub] spark pull request: Turn UpdateBlockInfo into case class.

2014-08-10 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/1872#issuecomment-51724118
  
It is using some custom serialization to reduce serialization overhead. 
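
(For readers unfamiliar with the pattern, a generic sketch of custom serialization via java.io.Externalizable; the class and field names here are hypothetical, not the actual UpdateBlockInfo code:)

```scala
import java.io.{Externalizable, ObjectInput, ObjectOutput}

// Hypothetical message class: Externalizable writes only the fields we choose,
// skipping the metadata that default Java serialization would add.
class BlockSizeMsg(var blockSize: Long) extends Externalizable {
  def this() = this(0L)  // no-arg constructor required for deserialization
  override def writeExternal(out: ObjectOutput): Unit = out.writeLong(blockSize)
  override def readExternal(in: ObjectInput): Unit = { blockSize = in.readLong() }
}
```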


[GitHub] spark pull request: Remove extra semicolon in Task.scala

2014-08-10 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/1876#issuecomment-51724140
  
Thanks. I've merged this in master.


[GitHub] spark pull request: Remove extra semicolon in Task.scala

2014-08-10 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/1876


[GitHub] spark pull request: [SPARK-2907] [MLlib] Use mutable.HashMap to re...

2014-08-10 Thread Ishiihara
Github user Ishiihara commented on the pull request:

https://github.com/apache/spark/pull/1871#issuecomment-51724432
  
@mengxr Some benchmark results:
Environment: OS X 10.9, 8 GB memory, 2.5 GHz i5 CPU, 4 threads
startingAlpha = 0.0025
vectorSize = 100
Driver memory: 2g

syn0 and syn1 as mutable.HashMap

| numPartition | numIteration | time      | total shuffle write |
|--------------|--------------|-----------|---------------------|
| 1            | 1            | 9m30.828s | 42.6MB              |
| 4            | 1            | 5m47.192s | 43.6MB              |
| 10           | 1            | 6m12.333s | 490.4MB             |
| 100          | 1            | 6m24.663s | 2.0G                |

syn0 and syn1 as big Array

| numPartition | numIteration | time      | total shuffle write |
|--------------|--------------|-----------|---------------------|
| 1            | 1            | 9m1.675s  | 42.6MB              |
| 4            | 1            | 5m3.130s  | 43.6MB              |
| 10           | 1            | 5m24.283s | 580MB               |
| 100          | 1            | 5m52.446s | 4.1G                |
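
(For context, a rough sketch of the two layouts being compared; the shapes are assumed, not the actual Word2Vec fields:)

```scala
import scala.collection.mutable

val vectorSize = 100
val vocabSize = 10000

// (a) syn0/syn1 as a map from word id to its weight vector
val syn0Map = mutable.HashMap.empty[Int, Array[Float]]
syn0Map(42) = new Array[Float](vectorSize)

// (b) syn0/syn1 as one flat array; word i's vector occupies
// offsets i * vectorSize until (i + 1) * vectorSize
val syn0Flat = new Array[Float](vocabSize * vectorSize)
def vectorOf(i: Int): Array[Float] =
  java.util.Arrays.copyOfRange(syn0Flat, i * vectorSize, (i + 1) * vectorSize)
```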


[GitHub] spark pull request: [SPARK-2950] Add gc time and shuffle write tim...

2014-08-10 Thread shivaram
Github user shivaram commented on the pull request:

https://github.com/apache/spark/pull/1869#issuecomment-51725000
  
Merged this into master. @pwendell Could we cherry-pick this for 1.1 as well? It's a small change and will allow profiling clusters running 1.1.


[GitHub] spark pull request: [SPARK-2950] Add gc time and shuffle write tim...

2014-08-10 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/1869


[GitHub] spark pull request: [SPARK-2898] [PySpark] fix bugs in deamon.py

2014-08-10 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/1842#issuecomment-51725452
  
I've merged this into `master` and `branch-1.1`.  Thanks!


[GitHub] spark pull request: [SPARK-2898] [PySpark] fix bugs in deamon.py

2014-08-10 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/1842


[GitHub] spark pull request: SPARK-2955 [BUILD] Test code fails to compile ...

2014-08-10 Thread srowen
GitHub user srowen opened a pull request:

https://github.com/apache/spark/pull/1879

SPARK-2955 [BUILD] Test code fails to compile with "mvn compile" without 
"install"

(This is the corrected follow-up to 
https://issues.apache.org/jira/browse/SPARK-2903)

Right now, `mvn compile test-compile` fails to compile Spark. (Don't worry; 
`mvn package` works, so this is not major.) The issue stems from test code in 
some modules depending on test code in other modules. That is perfectly fine 
and supported by Maven.

It takes extra work to get this to work with scalatest, and this has been 
attempted: https://github.com/apache/spark/blob/master/sql/catalyst/pom.xml#L86

This formulation is not quite enough: the SQL Core module's tests fail to compile because they cannot find test classes in SQL Catalyst, and likewise for most Streaming integration modules, which depend on core Streaming test code. Example:

```
[error] 
/Users/srowen/Documents/spark/sql/core/src/test/scala/org/apache/spark/sql/QueryTest.scala:23:
 not found: type PlanTest
[error] class QueryTest extends PlanTest {
[error] ^
[error] 
/Users/srowen/Documents/spark/sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala:28:
 package org.apache.spark.sql.test is not a value
[error]   test("SPARK-1669: cacheTable should be idempotent") {
[error]   ^
...
```

The issue, I believe, is that generation of a `test-jar` is bound here to the `compile` phase, but the test classes are not yet compiled in that phase. It should bind to the `test-compile` phase instead.

It works when executing `mvn package` or `mvn install`, since test-jar artifacts are actually generated and made available through normal Maven mechanisms as each module is built. They are then found normally, regardless of the scalatest configuration.

It would be nice for a simple `mvn compile test-compile` to work since the 
test code is perfectly compilable given the Maven declarations.

On the plus side, this change is low-risk as it only affects tests.
@yhuai made the original scalatest change and has glanced at this and 
thinks it makes sense.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/srowen/spark SPARK-2955

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/1879.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1879


commit ad8242f124ced3779055a24744163091580b001d
Author: Sean Owen 
Date:   2014-08-07T22:16:28Z

Generate test-jar on test-compile for modules whose tests are needed by 
others' tests




[GitHub] spark pull request: SPARK-2955 [BUILD] Test code fails to compile ...

2014-08-10 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1879#issuecomment-51725678
  
QA tests have started for PR 1879. This patch merges cleanly. View 
progress: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18284/consoleFull


[GitHub] spark pull request: [SPARK-2936] Migrate Netty network module from...

2014-08-10 Thread aarondav
Github user aarondav commented on a diff in the pull request:

https://github.com/apache/spark/pull/1865#discussion_r16033181
  
--- Diff: 
core/src/main/scala/org/apache/spark/network/netty/FileServerHandler.scala ---
@@ -0,0 +1,68 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.network.netty
+
+import java.io.FileInputStream
+
+import io.netty.channel.{DefaultFileRegion, ChannelHandlerContext, 
SimpleChannelInboundHandler}
+
+import org.apache.spark.Logging
+import org.apache.spark.storage.{BlockId, FileSegment}
+
+
+class FileServerHandler(pResolver: PathResolver)
+  extends SimpleChannelInboundHandler[String] with Logging {
+
+  override def channelRead0(ctx: ChannelHandlerContext, blockIdString: 
String): Unit = {
+val blockId: BlockId = BlockId.apply(blockIdString)
--- End diff --

BlockId(blockIdString)


[GitHub] spark pull request: [SPARK-2936] Migrate Netty network module from...

2014-08-10 Thread aarondav
Github user aarondav commented on a diff in the pull request:

https://github.com/apache/spark/pull/1865#discussion_r16033193
  
--- Diff: 
core/src/main/scala/org/apache/spark/network/netty/FileServer.scala ---
@@ -0,0 +1,90 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.network.netty
+
+import java.net.InetSocketAddress
+
+import io.netty.bootstrap.ServerBootstrap
+import io.netty.channel.{ChannelFuture, ChannelOption, EventLoopGroup}
+import io.netty.channel.oio.OioEventLoopGroup
+import io.netty.channel.socket.oio.OioServerSocketChannel
+
+import org.apache.spark.Logging
+
+/**
+ * Server that accepts the path of a file and echoes back its content.
+ */
+class FileServer(pResolver: PathResolver, private var port: Int) extends 
Logging {
+
+  private val addr: InetSocketAddress = new InetSocketAddress(port)
+  private var bossGroup: EventLoopGroup = new OioEventLoopGroup
+  private var workerGroup: EventLoopGroup = new OioEventLoopGroup
+
+  private var channelFuture: ChannelFuture = {
+val bootstrap = new ServerBootstrap
+bootstrap.group(bossGroup, workerGroup)
+  .channel(classOf[OioServerSocketChannel])
+  .option(ChannelOption.SO_BACKLOG, java.lang.Integer.valueOf(100))
+  .option(ChannelOption.SO_RCVBUF, java.lang.Integer.valueOf(1500))
+  .childHandler(new FileServerChannelInitializer(pResolver))
+bootstrap.bind(addr)
+  }
+
+  try {
+val boundAddress = 
channelFuture.sync.channel.localAddress.asInstanceOf[InetSocketAddress]
+port = boundAddress.getPort
+  } catch {
+case ie: InterruptedException =>
+  port = 0
+  }
+
+  /** Start the file server asynchronously in a new thread. */
+  def start(): Unit = {
+val blockingThread: Thread = new Thread {
+  override def run(): Unit = {
+try {
+  channelFuture.channel.closeFuture.sync
+  logInfo("FileServer exiting")
+} catch {
+  case e: InterruptedException =>
+logError("File server start got interrupted", e)
+}
+  }
+}
+blockingThread.setDaemon(true)
+blockingThread.start()
+  }
+
+  def getPort: Int = port
+
+  def stop(): Unit = {
+if (channelFuture != null) {
+  channelFuture.channel().close().awaitUninterruptibly()
+  channelFuture = null
+}
+if (bossGroup != null) {
+  bossGroup.shutdownGracefully()
+  bossGroup = null
+}
+if (workerGroup != null) {
+  workerGroup.shutdownGracefully()
+  workerGroup = null
+}
--- End diff --

You got rid of the TODO - is it done?


[GitHub] spark pull request: [SPARK-2936] Migrate Netty network module from...

2014-08-10 Thread aarondav
Github user aarondav commented on a diff in the pull request:

https://github.com/apache/spark/pull/1865#discussion_r16033197
  
--- Diff: 
core/src/main/scala/org/apache/spark/network/netty/FileServer.scala ---
@@ -0,0 +1,90 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.network.netty
+
+import java.net.InetSocketAddress
+
+import io.netty.bootstrap.ServerBootstrap
+import io.netty.channel.{ChannelFuture, ChannelOption, EventLoopGroup}
+import io.netty.channel.oio.OioEventLoopGroup
+import io.netty.channel.socket.oio.OioServerSocketChannel
+
+import org.apache.spark.Logging
+
+/**
+ * Server that accepts the path of a file and echoes back its content.
+ */
+class FileServer(pResolver: PathResolver, private var port: Int) extends 
Logging {
+
+  private val addr: InetSocketAddress = new InetSocketAddress(port)
+  private var bossGroup: EventLoopGroup = new OioEventLoopGroup
+  private var workerGroup: EventLoopGroup = new OioEventLoopGroup
+
+  private var channelFuture: ChannelFuture = {
+val bootstrap = new ServerBootstrap
+bootstrap.group(bossGroup, workerGroup)
+  .channel(classOf[OioServerSocketChannel])
+  .option(ChannelOption.SO_BACKLOG, java.lang.Integer.valueOf(100))
+  .option(ChannelOption.SO_RCVBUF, java.lang.Integer.valueOf(1500))
+  .childHandler(new FileServerChannelInitializer(pResolver))
+bootstrap.bind(addr)
+  }
+
+  try {
+val boundAddress = 
channelFuture.sync.channel.localAddress.asInstanceOf[InetSocketAddress]
+port = boundAddress.getPort
+  } catch {
+case ie: InterruptedException =>
+  port = 0
+  }
+
+  /** Start the file server asynchronously in a new thread. */
+  def start(): Unit = {
+val blockingThread: Thread = new Thread {
+  override def run(): Unit = {
+try {
+  channelFuture.channel.closeFuture.sync
+  logInfo("FileServer exiting")
+} catch {
+  case e: InterruptedException =>
+logError("File server start got interrupted", e)
+}
--- End diff --

You got rid of the NOTE; that seemed useful.


[GitHub] spark pull request: [SPARK-2936] Migrate Netty network module from...

2014-08-10 Thread aarondav
Github user aarondav commented on a diff in the pull request:

https://github.com/apache/spark/pull/1865#discussion_r16033207
  
--- Diff: 
core/src/main/scala/org/apache/spark/network/netty/FileClient.scala ---
@@ -0,0 +1,85 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.network.netty
+
+import java.util.concurrent.TimeUnit
+
+import io.netty.bootstrap.Bootstrap
+import io.netty.channel.{Channel, ChannelOption, EventLoopGroup}
+import io.netty.channel.oio.OioEventLoopGroup
+import io.netty.channel.socket.oio.OioSocketChannel
+
+import org.apache.spark.Logging
+
+class FileClient(handler: FileClientHandler, connectTimeout: Int) extends 
Logging {
+
+  private var channel: Channel = _
+  private var bootstrap: Bootstrap = _
+  private var group: EventLoopGroup = _
+  private val sendTimeout = 60
--- End diff --

The comment "// 1 min" wasn't useless, because otherwise it's not 100% clear that this value is in seconds (though one could guess).


[GitHub] spark pull request: [SPARK-2936] Migrate Netty network module from...

2014-08-10 Thread aarondav
Github user aarondav commented on a diff in the pull request:

https://github.com/apache/spark/pull/1865#discussion_r16033208
  
--- Diff: 
core/src/main/scala/org/apache/spark/network/netty/FileClient.scala ---
@@ -0,0 +1,85 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.network.netty
+
+import java.util.concurrent.TimeUnit
+
+import io.netty.bootstrap.Bootstrap
+import io.netty.channel.{Channel, ChannelOption, EventLoopGroup}
+import io.netty.channel.oio.OioEventLoopGroup
+import io.netty.channel.socket.oio.OioSocketChannel
+
+import org.apache.spark.Logging
+
+class FileClient(handler: FileClientHandler, connectTimeout: Int) extends 
Logging {
+
+  private var channel: Channel = _
+  private var bootstrap: Bootstrap = _
+  private var group: EventLoopGroup = _
+  private val sendTimeout = 60
+
+  def init(): Unit = {
+group = new OioEventLoopGroup
+bootstrap = new Bootstrap
+bootstrap.group(group)
+  .channel(classOf[OioSocketChannel])
+  .option(ChannelOption.SO_KEEPALIVE, java.lang.Boolean.TRUE)
+  .option(ChannelOption.TCP_NODELAY, java.lang.Boolean.TRUE)
+  .option(ChannelOption.CONNECT_TIMEOUT_MILLIS, 
Integer.valueOf(connectTimeout))
--- End diff --

Why is this needed?


[GitHub] spark pull request: [SPARK-2936] Migrate Netty network module from...

2014-08-10 Thread aarondav
Github user aarondav commented on the pull request:

https://github.com/apache/spark/pull/1865#issuecomment-51726479
  
Just had a couple of minor issues with the translation; LGTM functionality-wise. I did not do a thorough diff check, though.


[GitHub] spark pull request: [SPARK-2936] Migrate Netty network module from...

2014-08-10 Thread shivaram
Github user shivaram commented on a diff in the pull request:

https://github.com/apache/spark/pull/1865#discussion_r16033275
  
--- Diff: 
core/src/main/scala/org/apache/spark/network/netty/FileClient.scala ---
@@ -0,0 +1,85 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.network.netty
+
+import java.util.concurrent.TimeUnit
+
+import io.netty.bootstrap.Bootstrap
+import io.netty.channel.{Channel, ChannelOption, EventLoopGroup}
+import io.netty.channel.oio.OioEventLoopGroup
+import io.netty.channel.socket.oio.OioSocketChannel
+
+import org.apache.spark.Logging
+
+class FileClient(handler: FileClientHandler, connectTimeout: Int) extends 
Logging {
+
+  private var channel: Channel = _
+  private var bootstrap: Bootstrap = _
+  private var group: EventLoopGroup = _
+  private val sendTimeout = 60
+
+  def init(): Unit = {
+group = new OioEventLoopGroup
+bootstrap = new Bootstrap
+bootstrap.group(group)
+  .channel(classOf[OioSocketChannel])
+  .option(ChannelOption.SO_KEEPALIVE, java.lang.Boolean.TRUE)
+  .option(ChannelOption.TCP_NODELAY, java.lang.Boolean.TRUE)
+  .option(ChannelOption.CONNECT_TIMEOUT_MILLIS, 
Integer.valueOf(connectTimeout))
--- End diff --

I think having a connectTimeout is pretty useful? I often notice connect timeouts on EC2, and this is not configurable in the default shuffle implementation (that is partly because we use NIO with SocketChannels, etc.).


[GitHub] spark pull request: [SPARK-2936] Migrate Netty network module from...

2014-08-10 Thread aarondav
Github user aarondav commented on a diff in the pull request:

https://github.com/apache/spark/pull/1865#discussion_r16033305
  
--- Diff: 
core/src/main/scala/org/apache/spark/network/netty/FileClient.scala ---
@@ -0,0 +1,85 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.network.netty
+
+import java.util.concurrent.TimeUnit
+
+import io.netty.bootstrap.Bootstrap
+import io.netty.channel.{Channel, ChannelOption, EventLoopGroup}
+import io.netty.channel.oio.OioEventLoopGroup
+import io.netty.channel.socket.oio.OioSocketChannel
+
+import org.apache.spark.Logging
+
+class FileClient(handler: FileClientHandler, connectTimeout: Int) extends 
Logging {
+
+  private var channel: Channel = _
+  private var bootstrap: Bootstrap = _
+  private var group: EventLoopGroup = _
+  private val sendTimeout = 60
+
+  def init(): Unit = {
+group = new OioEventLoopGroup
+bootstrap = new Bootstrap
+bootstrap.group(group)
+  .channel(classOf[OioSocketChannel])
+  .option(ChannelOption.SO_KEEPALIVE, java.lang.Boolean.TRUE)
+  .option(ChannelOption.TCP_NODELAY, java.lang.Boolean.TRUE)
+  .option(ChannelOption.CONNECT_TIMEOUT_MILLIS, 
Integer.valueOf(connectTimeout))
--- End diff --

Sorry, I wasn't asking about connectTimeout; I meant the Integer.valueOf(). I'm guessing it just doesn't compile without it, which is kind of weird; you'd think there'd be a conversion from primitive to boxed in the Scala compiler.


[GitHub] spark pull request: [SPARK-2936] Migrate Netty network module from...

2014-08-10 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/1865#discussion_r1601
  
--- Diff: 
core/src/main/scala/org/apache/spark/network/netty/FileClient.scala ---
@@ -0,0 +1,85 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.network.netty
+
+import java.util.concurrent.TimeUnit
+
+import io.netty.bootstrap.Bootstrap
+import io.netty.channel.{Channel, ChannelOption, EventLoopGroup}
+import io.netty.channel.oio.OioEventLoopGroup
+import io.netty.channel.socket.oio.OioSocketChannel
+
+import org.apache.spark.Logging
+
+class FileClient(handler: FileClientHandler, connectTimeout: Int) extends 
Logging {
+
+  private var channel: Channel = _
+  private var bootstrap: Bootstrap = _
+  private var group: EventLoopGroup = _
+  private val sendTimeout = 60
+
+  def init(): Unit = {
+group = new OioEventLoopGroup
+bootstrap = new Bootstrap
+bootstrap.group(group)
+  .channel(classOf[OioSocketChannel])
+  .option(ChannelOption.SO_KEEPALIVE, java.lang.Boolean.TRUE)
+  .option(ChannelOption.TCP_NODELAY, java.lang.Boolean.TRUE)
+  .option(ChannelOption.CONNECT_TIMEOUT_MILLIS, 
Integer.valueOf(connectTimeout))
--- End diff --

Yea it didn't compile. 


[GitHub] spark pull request: [SPARK-2650] Build column buffers in smaller b...

2014-08-10 Thread marmbrus
GitHub user marmbrus opened a pull request:

https://github.com/apache/spark/pull/1880

[SPARK-2650] Build column buffers in smaller batches



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/marmbrus/spark columnBatches

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/1880.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1880


commit 23145329d6970ddbdf97c75d13e0a393df6d4747
Author: Michael Armbrust 
Date:   2014-08-10T21:21:10Z

Build column buffers in smaller batches




[GitHub] spark pull request: SPARK-2955 [BUILD] Test code fails to compile ...

2014-08-10 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1879#issuecomment-51727806
  
QA results for PR 1879:
- This patch PASSES unit tests.
- This patch merges cleanly.
- This patch adds no public classes.

For more information see test output: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18284/consoleFull


[GitHub] spark pull request: [SPARK-2650] Build column buffers in smaller b...

2014-08-10 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1880#issuecomment-51727834
  
QA tests have started for PR 1880. This patch merges cleanly. View 
progress: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18285/consoleFull


[GitHub] spark pull request: [SPARK-2650][SQL] Build column buffers in smal...

2014-08-10 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1880#issuecomment-51727923
  
QA results for PR 1880:
- This patch FAILED unit tests.
- This patch merges cleanly.
- This patch adds no public classes.

For more information see test output: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18285/consoleFull


[GitHub] spark pull request: [SPARK-2650][SQL] Build column buffers in smal...

2014-08-10 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1880#issuecomment-51728104
  
QA tests have started for PR 1880. This patch merges cleanly. View 
progress: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18286/consoleFull


[GitHub] spark pull request: SPARK-2787: Make sort-based shuffle write file...

2014-08-10 Thread mateiz
Github user mateiz commented on a diff in the pull request:

https://github.com/apache/spark/pull/1799#discussion_r16033699
  
--- Diff: core/src/main/scala/org/apache/spark/SparkEnv.scala ---
@@ -246,8 +250,13 @@ object SparkEnv extends Logging {
   "."
 }
 
-val shuffleManager = instantiateClass[ShuffleManager](
-  "spark.shuffle.manager", 
"org.apache.spark.shuffle.hash.HashShuffleManager")
+// Let the user specify short names for shuffle managers
+val shortShuffleMgrNames = Map(
+  "hash" -> "org.apache.spark.shuffle.hash.HashShuffleManager",
+  "sort" -> "org.apache.spark.shuffle.sort.SortShuffleManager")
+val shuffleMgrName = conf.get("spark.shuffle.manager", "hash")
--- End diff --

I'd rather not change the configuration under the user; that would be confusing if they later print it or look at it in the web UI. Instead, maybe add a SparkEnv.getShuffleManagerClass(conf: SparkConf) that can return the real class name.

Also I'd be fine initializing the ShuffleBlockManager after the 
ShuffleManager if that works, and using isInstanceOf. That would be the 
cleanest.
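
(A sketch of the suggested helper; the method is hypothetical and the short-name map mirrors the diff above:)

```scala
import org.apache.spark.SparkConf

// Resolve the configured short name to a class name on read,
// leaving the user's conf untouched.
def getShuffleManagerClass(conf: SparkConf): String = {
  val shortNames = Map(
    "hash" -> "org.apache.spark.shuffle.hash.HashShuffleManager",
    "sort" -> "org.apache.spark.shuffle.sort.SortShuffleManager")
  val name = conf.get("spark.shuffle.manager", "hash")
  shortNames.getOrElse(name.toLowerCase, name)  // unknown names pass through as class names
}
```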


[GitHub] spark pull request: [sql]use SparkSQLEnv.stop() in ShutdownHook

2014-08-10 Thread marmbrus
Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/1852#issuecomment-51729135
  
test this please


[GitHub] spark pull request: replace println to log4j

2014-08-10 Thread mateiz
Github user mateiz commented on the pull request:

https://github.com/apache/spark/pull/1372#issuecomment-51728995
  
It will be in 1.1. I guess we can also backport it to branch-1.0 -- how bad is the issue? Does it cause problems, or is it just annoying?


[GitHub] spark pull request: [sql]use SparkSQLEnv.stop() in ShutdownHook

2014-08-10 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1852#issuecomment-51729257
  
QA tests have started for PR 1852. This patch merges cleanly. View 
progress: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18287/consoleFull


[GitHub] spark pull request: [SPARK-2650][SQL] Build column buffers in smal...

2014-08-10 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1880#issuecomment-5173
  
QA results for PR 1880:
- This patch PASSES unit tests.
- This patch merges cleanly.
- This patch adds no public classes.

For more information see test output: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18286/consoleFull


[GitHub] spark pull request: [SPARK-2650][SQL] Build column buffers in smal...

2014-08-10 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1880#issuecomment-51730127
  
QA tests have started for PR 1880. This patch merges cleanly. View 
progress: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18288/consoleFull


[GitHub] spark pull request: [SPARK-2937] Separate out samplyByKeyExact as ...

2014-08-10 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1866#issuecomment-51730126
  
QA tests have started for PR 1866. This patch merges cleanly. View 
progress: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18289/consoleFull


[GitHub] spark pull request: [WIP][SPARK-2947] DAGScheduler resubmit the st...

2014-08-10 Thread mateiz
Github user mateiz commented on the pull request:

https://github.com/apache/spark/pull/1877#issuecomment-51730277
  
@witgo can you explain how this happens and why the fix works, and add a 
unit test for it? We can't really merge something like this without a test.


[GitHub] spark pull request: Support executing Spark from symlinks

2014-08-10 Thread mateiz
Github user mateiz commented on the pull request:

https://github.com/apache/spark/pull/1875#issuecomment-51730307
  
Jenkins, this is ok to test


[GitHub] spark pull request: Support executing Spark from symlinks

2014-08-10 Thread mateiz
Github user mateiz commented on the pull request:

https://github.com/apache/spark/pull/1875#issuecomment-51730319
  
@roji mind opening a JIRA issue for this on https://issues.apache.org/jira/browse/SPARK and adding it to the pull request's title?


[GitHub] spark pull request: [SPARK-2953] Allow using short names for io co...

2014-08-10 Thread mateiz
Github user mateiz commented on a diff in the pull request:

https://github.com/apache/spark/pull/1873#discussion_r16034012
  
--- Diff: docs/configuration.md ---
@@ -373,12 +373,12 @@ Apart from these, the following properties are also 
available, and may be useful
 
 
   spark.io.compression.codec
-  org.apache.spark.io.SnappyCompressionCodec
+  snappy
   
 The codec used to compress internal data such as RDD partitions and 
shuffle outputs.
 By default, Spark provides three codecs:  
org.apache.spark.io.LZ4CompressionCodec,
 org.apache.spark.io.LZFCompressionCodec,
-and org.apache.spark.io.SnappyCompressionCodec.
+and org.apache.spark.io.SnappyCompressionCodec. You can 
also use the short form: lz4, lzf, and 
snappy.
--- End diff --

You should probably just list the short names first, and then say "you can 
alternatively list a class name".
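
(For illustration, a sketch of what the short-name setting would look like in user code; either form should select the same codec:)

```scala
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.io.compression.codec", "snappy")
// Equivalent long form:
//   .set("spark.io.compression.codec", "org.apache.spark.io.SnappyCompressionCodec")
```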


[GitHub] spark pull request: Support executing Spark from symlinks

2014-08-10 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1875#issuecomment-51730388
  
QA tests have started for PR 1875. This patch merges cleanly. View 
progress: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18290/consoleFull


[GitHub] spark pull request: [PySpark] [SPARK-2954] [SPARK-2948] [SPARK-291...

2014-08-10 Thread mateiz
Github user mateiz commented on a diff in the pull request:

https://github.com/apache/spark/pull/1874#discussion_r16034056
  
--- Diff: python/pyspark/tests.py ---
@@ -905,8 +911,9 @@ def createFileInZip(self, name, content):
 pattern = re.compile(r'^ *\|', re.MULTILINE)
 content = re.sub(pattern, '', content.strip())
 path = os.path.join(self.programDir, name + ".zip")
-with zipfile.ZipFile(path, 'w') as zip:
-zip.writestr(name, content)
+zip = zipfile.ZipFile(path, 'w')
+zip.writestr(name, content)
+zip.close()
--- End diff --

Why did you change this? Does 'with' on ZipFiles not work in 2.6?





[GitHub] spark pull request: [SPARK-2937] Separate out sampleByKeyExact as ...

2014-08-10 Thread mateiz
Github user mateiz commented on the pull request:

https://github.com/apache/spark/pull/1866#issuecomment-51730672
  
Looks good. I also prefer separating this out.





[GitHub] spark pull request: [SPARK-2907] [MLlib] Use mutable.HashMap to re...

2014-08-10 Thread mateiz
Github user mateiz commented on the pull request:

https://github.com/apache/spark/pull/1871#issuecomment-51730712
  
Just FYI, mutable.HashMap can be pretty inefficient in its space usage compared to, e.g., java.util.HashMap or Spark's AppendOnlyMap. In this case the impact will depend on how many keys there are and how big the arrays of floats are (if the arrays are the bulk of the data, the map choice won't matter).
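
To make the trade-off concrete, a minimal sketch of the two alternatives, assuming the structure in question maps keys to Array[Float] (the actual PR code may differ):

    import java.util.{HashMap => JHashMap}
    import scala.collection.mutable

    // scala.collection.mutable.HashMap allocates a wrapper object per entry,
    // which adds up when there are many keys.
    val scalaMap = mutable.HashMap.empty[Int, Array[Float]]
    scalaMap(1) = Array(1.0f, 2.0f)

    // java.util.HashMap is the usual leaner drop-in when the map itself,
    // rather than the values, dominates the memory footprint.
    val javaMap = new JHashMap[Int, Array[Float]]()
    javaMap.put(1, Array(1.0f, 2.0f))

    // If the Array[Float] values are the bulk of the data, the choice of
    // map implementation hardly matters.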





[GitHub] spark pull request: [SPARK-2907] [MLlib] Use mutable.HashMap to re...

2014-08-10 Thread mateiz
Github user mateiz commented on the pull request:

https://github.com/apache/spark/pull/1871#issuecomment-51730738
  
Even better might be Spark's PrimitiveKeyOpenHashMap here, again assuming there are lots of keys.
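
For context, a sketch of the suggested structure, under the assumption that the caller lives inside Spark itself: PrimitiveKeyOpenHashMap is in org.apache.spark.util.collection and is private[spark], so it is not usable from application code:

    // Usable only from Spark-internal code (the class is private[spark]).
    import org.apache.spark.util.collection.PrimitiveKeyOpenHashMap

    // Open addressing over primitive key arrays: no per-entry objects and
    // no boxing of the Long keys, which is where the space savings come from.
    val sums = new PrimitiveKeyOpenHashMap[Long, Array[Float]](64)
    sums.changeValue(42L,
      Array(1.0f),                  // value to insert if the key is absent
      old => old.map(_ + 1.0f))     // merge function if the key is present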





[GitHub] spark pull request: [SPARK-2848] Shade Guava in uber-jars.

2014-08-10 Thread mateiz
Github user mateiz commented on a diff in the pull request:

https://github.com/apache/spark/pull/1813#discussion_r16034129
  
--- Diff: core/src/main/java/com/google/common/base/Optional.java ---
@@ -0,0 +1,243 @@
+/*
+ * Copyright (C) 2011 The Guava Authors
--- End diff --

Shouldn't we still list it in our LICENSE? We've listed all source files 
that are not copyright Apache, which would include this one.





[GitHub] spark pull request: [WIP][SPARK-2816][SQL] Type-safe SQL Queries

2014-08-10 Thread mateiz
Github user mateiz commented on the pull request:

https://github.com/apache/spark/pull/1759#issuecomment-51730914
  
@marmbrus how do you intend this to work with things like Hive or JDBC? We 
won't know the types at compile time there, but we might still want a solution 
that checks the field names. I think we should have a design for that in mind 
or else this will be of somewhat limited use.
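
To spell out the concern with a purely illustrative sketch (the names below are made up, not the PR's API): a case class gives the compiler the schema up front, while a Hive or JDBC relation only reveals its schema at runtime, so the strongest check available there is by field name when the query is built:

    object TypedVsRuntimeSchema {
      // Compile time: the compiler verifies that .age exists and is an Int.
      case class Person(name: String, age: Int)
      val people = Seq(Person("Ann", 34), Person("Bo", 12))
      val adults = people.filter(_.age >= 18)

      // Runtime: a Hive/JDBC schema arrives as data, so the best a library
      // can do is validate requested field names against it and fail fast.
      def checkField(schema: Map[String, String], field: String): Unit =
        require(schema.contains(field), s"unknown field: $field")

      val hiveSchema = Map("name" -> "string", "age" -> "int") // discovered at runtime
      checkField(hiveSchema, "age")
    }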





[GitHub] spark pull request: [SPARK-2871] [PySpark] Add missing API

2014-08-10 Thread mateiz
Github user mateiz commented on the pull request:

https://github.com/apache/spark/pull/1791#issuecomment-51731037
  
BTW, leaving TODOs in the Python code would also be okay, if you want this tracked in the code itself.





[GitHub] spark pull request: [SPARK-2871] [PySpark] Add missing API

2014-08-10 Thread mateiz
Github user mateiz commented on the pull request:

https://github.com/apache/spark/pull/1791#issuecomment-51731023
  
I also actually prefer leaving out the non-implemented ones instead of putting them in with NotImplementedError. Especially when working in an IDE or something similar, a user might try to call one of those and get confused when it crashes.

We can develop other ways to track the missing APIs. For example, for these ones you can create JIRA issues such as "implement zipPartitions in Python", and we can do the same for new APIs added to Scala / Java (we usually ask people to add them in Python now anyway). A lot of these particular ones are pretty esoteric, and I don't think people will miss them.





[GitHub] spark pull request: [SPARK-2848] Shade Guava in uber-jars.

2014-08-10 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/1813#discussion_r16034200
  
--- Diff: core/src/main/java/com/google/common/base/Optional.java ---
@@ -0,0 +1,243 @@
+/*
+ * Copyright (C) 2011 The Guava Authors
--- End diff --

Not according to my reading of http://www.apache.org/dev/licensing-howto.html, in the part covering bundled AL2 dependencies, but this stuff is always less than intuitive to me.

I suppose the reasoning is that Spark's AL2 license already exactly describes the terms of the licensing for that file. That's not quite true for BSD or MIT licenses.

The copyright owner doesn't matter; there are hundreds of them. What matters is how they license their IP to be used.





[GitHub] spark pull request: [SPARK-2937] Separate out sampleByKeyExact as ...

2014-08-10 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/1866#issuecomment-51731068
  
Merged into both master and branch-1.1. Thanks!





[GitHub] spark pull request: [SPARK-2937] Separate out sampleByKeyExact as ...

2014-08-10 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/1866





[GitHub] spark pull request: [sql]use SparkSQLEnv.stop() in ShutdownHook

2014-08-10 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1852#issuecomment-51731096
  
QA results for PR 1852:
- This patch PASSES unit tests.
- This patch merges cleanly.
- This patch adds no public classes.

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18287/consoleFull




