[GitHub] spark pull request: [SPARK-3791][SQL][WIP] Provides Spark version ...

2014-10-19 Thread liancheng
GitHub user liancheng opened a pull request:

https://github.com/apache/spark/pull/2843

[SPARK-3791][SQL][WIP] Provides Spark version and Hive version in 
HiveThriftServer2

This PR overrides the `GetInfo` Hive Thrift API to provide correct version 
information. Another property `spark.sql.hive.version` is added to reveal the 
underlying Hive version. These are generally useful for Spark SQL ODBC driver 
providers. The Spark version information is extracted from the jar manifest. 
Also took the chance to remove the `SET -v` hack, which was a workaround for 
Simba ODBC driver connectivity.
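
As a rough illustration of reading a version from a jar manifest, the sketch below uses the standard `java.lang.Package#getImplementationVersion`, which returns the `Implementation-Version` manifest attribute when the class was loaded from a jar that declares one. The names `ManifestVersion`, `versionOf` and `versionOrUnknown` are hypothetical and are not taken from this PR; that the Spark assembly jar sets this attribute is an assumption.

```scala
// Hypothetical helper for illustration only, not the code from this PR.
object ManifestVersion {
  /** Implementation-Version of the jar that `cls` was loaded from, if declared. */
  def versionOf(cls: Class[_]): Option[String] =
    Option(cls.getPackage)                               // may be null for some class loaders
      .flatMap(p => Option(p.getImplementationVersion))  // null when the attribute is absent
      .filter(_.nonEmpty)

  /** Same lookup with a placeholder, so callers always get a printable value. */
  def versionOrUnknown(cls: Class[_]): String =
    versionOf(cls).getOrElse("unknown")
}
```

A caller would pass a class that lives in the jar of interest, for example `ManifestVersion.versionOrUnknown(classOf[org.apache.spark.SparkContext])`, assuming Spark is on the classpath.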

TODO

- [ ] Find a general way to figure out Hive (or even any dependency) 
version.

  For Maven builds, we can retrieve the version information from the 
META-INF/maven directory within the assembly jar. But this doesn't work for SBT 
builds. Some other possible approaches can be found in this [blog 
post](http://blog.soebes.de/blog/2014/01/02/version-information-into-your-appas-with-maven/).
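
For the Maven case, a minimal sketch of the lookup might read the `pom.properties` file that Maven writes under `META-INF/maven/<groupId>/<artifactId>/`. The object `MavenVersion` and its methods are hypothetical, and whether the assembly jar keeps these files for every dependency is an assumption; as noted above, this would not help for SBT builds.

```scala
import java.io.InputStream
import java.util.Properties

// Hypothetical helper for illustration only, not the code from this PR.
object MavenVersion {
  /** Extracts the `version` entry from the contents of a pom.properties file. */
  def parseVersion(in: InputStream): Option[String] = {
    val props = new Properties()
    props.load(in)
    Option(props.getProperty("version"))
  }

  /** Looks up pom.properties for the given coordinates on the classpath. */
  def fromClasspath(groupId: String, artifactId: String): Option[String] = {
    val path = s"META-INF/maven/$groupId/$artifactId/pom.properties"
    Option(getClass.getClassLoader.getResourceAsStream(path)).flatMap { in =>
      try parseVersion(in) finally in.close()
    }
  }
}
```

The result is `None` whenever the resource is missing, which is the situation an SBT-built assembly would be expected to hit.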

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/liancheng/spark get-info

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/2843.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2843


commit dc9438b68fca834c5ddce2918f8a1474f67d33d9
Author: Cheng Lian l...@databricks.com
Date:   2014-10-18T09:09:06Z

Overrides Hive GetInfo Thrift API and adds Hive version property

commit 9799b505e63793beced7ed79793739c011ee4547
Author: Cheng Lian l...@databricks.com
Date:   2014-10-19T05:52:26Z

Removes the Simba ODBC SET -v hack




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3791][SQL][WIP] Provides Spark version ...

2014-10-19 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/2843#discussion_r19058163
  
--- Diff: 
sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLEnv.scala
 ---
@@ -33,8 +33,10 @@ private[hive] object SparkSQLEnv extends Logging {
 
   def init() {
     if (hiveContext == null) {
-      sparkContext = new SparkContext(new SparkConf()
-        .setAppName(s"SparkSQL::${java.net.InetAddress.getLocalHost.getHostName}"))
+      val sparkConf = new SparkConf()
+        .setAppName(s"SparkSQL::${java.net.InetAddress.getLocalHost.getHostName}")
+        .set("spark.sql.hive.version", "0.12.0-protobuf-2.5")
--- End diff --

This needs to be generalized.





[GitHub] spark pull request: [SPARK-3791][SQL][WIP] Provides Spark version ...

2014-10-19 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/2843#discussion_r19058178
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala ---
@@ -306,7 +306,9 @@ class HiveContext(sc: SparkContext) extends SQLContext(sc) {
           driver.destroy()
           results
         case _ =>
-          sessionState.out.println(tokens(0) + " " + cmd_1)
+          if (sessionState.out != null) {
+            sessionState.out.println(tokens(0) + " " + cmd_1)
+          }
--- End diff --

`SessionState` life cycle control is rather broken and error-prone in the
current code base. I'm working on a separate PR to fix it.
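
The null guard in the diff above could also be expressed with `Option`, as in the sketch below. `SafeOut` and `printIfAttached` are hypothetical names used only for illustration; the parameter `out` stands in for `sessionState.out`, and this is not the fix planned for the separate PR.

```scala
import java.io.PrintStream

// Hypothetical helper for illustration only.
object SafeOut {
  /** Prints `line` only when a stream is attached; returns whether it printed. */
  def printIfAttached(out: PrintStream, line: String): Boolean =
    Option(out) match {
      case Some(stream) =>
        stream.println(line)
        true
      case None =>
        false // no stream attached, e.g. the session was not fully initialized
    }
}
```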





[GitHub] spark pull request: [SPARK-3791][SQL][WIP] Provides Spark version ...

2014-10-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2843#issuecomment-59640634
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21881/consoleFull)
 for   PR 2843 at commit 
[`9799b50`](https://github.com/apache/spark/commit/9799b505e63793beced7ed79793739c011ee4547).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-3791][SQL][WIP] Provides Spark version ...

2014-10-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2843#issuecomment-59640641
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21881/
Test FAILed.





[GitHub] spark pull request: [SPARK-3791][SQL][WIP] Provides Spark version ...

2014-10-19 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/2843#discussion_r19058181
  
--- Diff: 
sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2Suite.scala
 ---
@@ -37,35 +43,81 @@ import org.apache.spark.sql.catalyst.util.getTempFilePath
 
 /**
  * Tests for the HiveThriftServer2 using JDBC.
+ *
+ * NOTE: SPARK_PREPEND_CLASSES is explicitly disabled in this test suite. Assembly jar must be
+ * rebuilt after changing HiveThriftServer2 related code.
--- End diff --

This requirement should be OK for Jenkins, since Jenkins always builds the 
assembly jar before executing any test suites.





[GitHub] spark pull request: [SPARK-3791][SQL][WIP] Provides Spark version ...

2014-10-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2843#issuecomment-59640639
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21881/consoleFull)
 for   PR 2843 at commit 
[`9799b50`](https://github.com/apache/spark/commit/9799b505e63793beced7ed79793739c011ee4547).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-3720][SQL]initial support ORC in spark ...

2014-10-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2576#issuecomment-59640730
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21880/
Test PASSed.





[GitHub] spark pull request: [SPARK-3791][SQL][WIP] Provides Spark version ...

2014-10-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2843#issuecomment-59641162
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/399/consoleFull)
 for   PR 2843 at commit 
[`9799b50`](https://github.com/apache/spark/commit/9799b505e63793beced7ed79793739c011ee4547).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-3791][SQL][WIP] Provides Spark version ...

2014-10-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2843#issuecomment-59641199
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/400/consoleFull)
 for   PR 2843 at commit 
[`9799b50`](https://github.com/apache/spark/commit/9799b505e63793beced7ed79793739c011ee4547).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-3958] TorrentBroadcast cleanup / debugg...

2014-10-19 Thread JoshRosen
GitHub user JoshRosen opened a pull request:

https://github.com/apache/spark/pull/2844

[SPARK-3958] TorrentBroadcast cleanup / debugging improvements.

This PR makes several changes to TorrentBroadcast in order to make
it easier to reason about, which should help when debugging SPARK-3958.
The key changes:

- Remove all state from the global TorrentBroadcast object.  This state
  consisted mainly of configuration options, like the block size and
  compression codec, and was read by the blockify / unblockify methods.
  Unfortunately, the use of `lazy val` for `BLOCK_SIZE` meant that the block
  size was always determined by the first SparkConf that TorrentBroadcast was
  initialized with; as a result, unit tests could not properly test
  TorrentBroadcast with different block sizes.

  Instead, blockifyObject and unBlockifyObject now accept compression codecs
  and blockSizes as arguments.  These arguments are supplied at the call sites
  inside of TorrentBroadcast instances.  Each TorrentBroadcast instance
  determines these values from SparkEnv's SparkConf.  I was careful to ensure
  that we do not accidentally serialize CompressionCodec or SparkConf objects
  as part of the TorrentBroadcast object.

- Remove special-case handling of local-mode in TorrentBroadcast.  I don't
  think that broadcast implementations should know about whether we're running
  in local mode.  If we want to optimize the performance of broadcast in local
  mode, then we should detect this at a higher level and use a dummy
  LocalBroadcastFactory implementation instead.

  Removing this code fixes a subtle error condition: in the old local mode
  code, a failure to find the broadcast in the local BlockManager would lead
  to an attempt to deblockify zero blocks, which could lead to confusing
  deserialization or decompression errors when we attempted to decompress
  an empty byte array.  This should never have happened, though: a failure to
  find the block in local mode is evidence of some other error.  The changes
  here will make it easier to debug those errors if they ever happen.

- Add a check that throws an exception when attempting to deblockify an
  empty array.

- Use ScalaCheck to add a test to check that TorrentBroadcast's
  blockifyObject and unBlockifyObject methods are inverses.

- Misc. cleanup and logging improvements.
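
The ideas in the list above can be sketched in a few lines of plain Scala. This is a simplified stand-in, not the TorrentBroadcast code: `BlockifySketch`, `blockify`, `unBlockify` and `roundTripHolds` are hypothetical names, compression is left out, Java serialization is assumed, and a seeded random loop replaces the ScalaCheck property so that the example has no external dependencies. The block size is passed as an argument rather than read from a `lazy val` on a global object, so each call can use a different value.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream}
import java.io.{ObjectInputStream, ObjectOutputStream}
import scala.util.Random

// Simplified, hypothetical sketch; not the actual TorrentBroadcast code.
object BlockifySketch {
  /** Serializes `obj` and splits the bytes into blocks of at most `blockSize`. */
  def blockify(obj: AnyRef, blockSize: Int): Array[Array[Byte]] = {
    require(blockSize > 0, "blockSize must be positive")
    val bos = new ByteArrayOutputStream()
    val oos = new ObjectOutputStream(bos)
    oos.writeObject(obj)
    oos.close()
    bos.toByteArray.grouped(blockSize).toArray
  }

  /** Reassembles the blocks and deserializes; rejects an empty array up front. */
  def unBlockify[T](blocks: Array[Array[Byte]]): T = {
    require(blocks.nonEmpty, "Cannot unblockify an empty array of blocks")
    val ois = new ObjectInputStream(new ByteArrayInputStream(blocks.flatten))
    try ois.readObject().asInstanceOf[T] finally ois.close()
  }

  /** Checks on random inputs that unBlockify undoes blockify. */
  def roundTripHolds(trials: Int, seed: Long): Boolean = {
    val rng = new Random(seed)
    (1 to trials).forall { _ =>
      val data = Array.fill(rng.nextInt(2000))(rng.nextInt(256).toByte)
      val blockSize = 1 + rng.nextInt(512)
      val restored = unBlockify[Array[Byte]](blockify(data, blockSize))
      java.util.Arrays.equals(restored, data)
    }
  }
}
```

Failing fast on an empty array turns what would otherwise be an obscure stream or decompression error into a message that points at the real problem, namely that no blocks were found.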

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/JoshRosen/spark torrentbroadcast-bugfix

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/2844.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2844


commit 48c98c1996c87cebbd0669924f57527b8e81c35e
Author: Josh Rosen joshro...@databricks.com
Date:   2014-10-19T06:36:49Z

[SPARK-3958] TorrentBroadcast cleanup / debugging improvements.


[GitHub] spark pull request: [SPARK-3958] TorrentBroadcast cleanup / debugg...

2014-10-19 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/2844#issuecomment-59641699
  
/cc @rxin for review.  I'd like to apply this to `branch-1.1` as well, 
since I believe that it's also affected by current TorrentBroadcast bugs.





[GitHub] spark pull request: [SPARK-3958] TorrentBroadcast cleanup / debugg...

2014-10-19 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/2844#issuecomment-59641733
  
Also, /cc @davies, who helped me spot the bug where local mode might 
deblockify an empty array, and who's been working on TorrentBroadcast 
optimizations.





[GitHub] spark pull request: [SPARK-3958] TorrentBroadcast cleanup / debugg...

2014-10-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2844#issuecomment-59641757
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21882/consoleFull)
 for   PR 2844 at commit 
[`618a872`](https://github.com/apache/spark/commit/618a87260faaebf353c1d9b4abc17af9f0cfa472).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-3902] [SPARK-3590] Stabilize AsynRDDAct...

2014-10-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2760#issuecomment-59641916
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21883/consoleFull)
 for   PR 2760 at commit 
[`0d45fbc`](https://github.com/apache/spark/commit/0d45fbc9e41c8dc2fffd58a0a48c19a6d9dafdd8).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-3902] [SPARK-3590] Stabilize AsynRDDAct...

2014-10-19 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/2760#issuecomment-59641956
  
@rxin Did you have any other feedback here?  If not, I'd like to merge this.





[GitHub] spark pull request: [SPARK-2546] Clone JobConf for each task (bran...

2014-10-19 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/2684#issuecomment-59642012
  
I'm going to merge this and cherry-pick it into all maintenance branches.  
We'll probably turn on cloning by default in 1.2 and we'll be sure to clearly 
document this configuration option in the 1.0.3 and 1.1.1 release notes.  
Thanks to everyone who helped test this!





[GitHub] spark pull request: [SPARK-2546] Clone JobConf for each task (bran...

2014-10-19 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/2684





[GitHub] spark pull request: [SPARK-4000][BUILD] Sends archived unit tests ...

2014-10-19 Thread liancheng
GitHub user liancheng opened a pull request:

https://github.com/apache/spark/pull/2845

[SPARK-4000][BUILD] Sends archived unit tests logs to Jenkins master

This PR sends archived unit test logs to the build history directory on the 
Jenkins master, so that we can serve them via HTTP later to help debug 
Jenkins build failures.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/liancheng/spark log-archive

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/2845.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2845


commit 4b912f78adbc9e6a1a3ca66bf32b5560d642ad5d
Author: Cheng Lian l...@databricks.com
Date:   2014-10-19T07:39:11Z

Sends archived unit tests logs to Jenkins master







[GitHub] spark pull request: [SPARK-4000][BUILD] Sends archived unit tests ...

2014-10-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2845#issuecomment-59642454
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21884/consoleFull)
 for   PR 2845 at commit 
[`4b912f7`](https://github.com/apache/spark/commit/4b912f78adbc9e6a1a3ca66bf32b5560d642ad5d).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-3791][SQL][WIP] Provides Spark version ...

2014-10-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2843#issuecomment-59642510
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/399/consoleFull)
 for   PR 2843 at commit 
[`9799b50`](https://github.com/apache/spark/commit/9799b505e63793beced7ed79793739c011ee4547).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-3958] TorrentBroadcast cleanup / debugg...

2014-10-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2844#issuecomment-59642634
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21882/consoleFull)
 for   PR 2844 at commit 
[`618a872`](https://github.com/apache/spark/commit/618a87260faaebf353c1d9b4abc17af9f0cfa472).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-3958] TorrentBroadcast cleanup / debugg...

2014-10-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2844#issuecomment-59642636
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21882/
Test FAILed.





[GitHub] spark pull request: [SPARK-3958] TorrentBroadcast cleanup / debugg...

2014-10-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2844#issuecomment-59643009
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21885/consoleFull)
 for   PR 2844 at commit 
[`33fc754`](https://github.com/apache/spark/commit/33fc75447c676a5fca1f6f7e7095562f3a1583d5).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-3997][Build]scalastyle should output th...

2014-10-19 Thread witgo
GitHub user witgo opened a pull request:

https://github.com/apache/spark/pull/2846

[SPARK-3997][Build]scalastyle should output the error location



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/witgo/spark SPARK-3997

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/2846.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2846


commit 82c38ecdf56fa087606bc9c12df2b9602b5c91a7
Author: GuoQiang Li wi...@qq.com
Date:   2014-10-19T08:19:34Z

scalastyle should output the error location







[GitHub] spark pull request: [SPARK-3997][Build]scalastyle should output th...

2014-10-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2846#issuecomment-59643139
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21886/consoleFull)
 for   PR 2846 at commit 
[`82c38ec`](https://github.com/apache/spark/commit/82c38ecdf56fa087606bc9c12df2b9602b5c91a7).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-3791][SQL][WIP] Provides Spark version ...

2014-10-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2843#issuecomment-59643215
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21887/consoleFull)
 for   PR 2843 at commit 
[`da5e716`](https://github.com/apache/spark/commit/da5e716fd1b8cc48c43f37373641bbabbb91a11f).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-3791][SQL][WIP] Provides Spark version ...

2014-10-19 Thread liancheng
Github user liancheng commented on the pull request:

https://github.com/apache/spark/pull/2843#issuecomment-59643247
  
Hm, 3 consecutive build failures, embarrassing...

For the first one, unit tests were not started at all; it seems the build 
process was interrupted somehow. The second failure is a bit weird: although 
we're already using a random port to avoid port conflicts, it still failed to 
open the listening port. I checked the TCP port range on the Jenkins master 
node, which should be valid, but I don't have access to the Jenkins slave node 
that executed this build. The third failure is caused by a known bug that is 
already fixed in the master branch, so I've just rebased onto the most recent 
master.





[GitHub] spark pull request: [SPARK-3902] [SPARK-3590] Stabilize AsynRDDAct...

2014-10-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2760#issuecomment-59643261
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21883/consoleFull)
 for   PR 2760 at commit 
[`0d45fbc`](https://github.com/apache/spark/commit/0d45fbc9e41c8dc2fffd58a0a48c19a6d9dafdd8).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class JavaFutureActionWrapper[S, T](futureAction: FutureAction[S], converter: S => T)`






[GitHub] spark pull request: [SPARK-3902] [SPARK-3590] Stabilize AsynRDDAct...

2014-10-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2760#issuecomment-59643262
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21883/
Test PASSed.





[GitHub] spark pull request: [SPARK-3791][SQL][WIP] Provides Spark version ...

2014-10-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2843#issuecomment-59643445
  
**[Tests timed 
out](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/400/consoleFull)**
 for PR 2843 at commit 
[`9799b50`](https://github.com/apache/spark/commit/9799b505e63793beced7ed79793739c011ee4547)
 after a configured wait of `120m`.





[GitHub] spark pull request: [SPARK-4000][BUILD] Sends archived unit tests ...

2014-10-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2845#issuecomment-59643839
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21884/consoleFull)
 for   PR 2845 at commit 
[`4b912f7`](https://github.com/apache/spark/commit/4b912f78adbc9e6a1a3ca66bf32b5560d642ad5d).
 * This patch **fails MiMa tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-3958] TorrentBroadcast cleanup / debugg...

2014-10-19 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/2844#issuecomment-59643838
  
It looks like this build is going to fail a ReplSuite test:

```scala
test("broadcast vars") {
  // Test that the value that a broadcast var had when it was created is used,
  // even if that variable is then modified in the driver program
  // TODO: This doesn't actually work for arrays when we run in local mode!
  val output = runInterpreter("local",
    """
      |var array = new Array[Int](5)
      |val broadcastArray = sc.broadcast(array)
      |sc.parallelize(0 to 4).map(x => broadcastArray.value(x)).collect
      |array(0) = 5
      |sc.parallelize(0 to 4).map(x => broadcastArray.value(x)).collect
    """.stripMargin)
  assertDoesNotContain("error:", output)
  assertDoesNotContain("Exception", output)
  assertContains("res0: Array[Int] = Array(0, 0, 0, 0, 0)", output)
  assertContains("res2: Array[Int] = Array(5, 0, 0, 0, 0)", output)
}
```

I see now that my change to remove the special local-mode handling 
inadvertently leads to a duplication of the variable in the driver program.  
This could maybe be a performance issue, since now we will use 2x the memory in 
the driver for each broadcast variable.  I'll restore the line that stores the 
local copy of the broadcast variable when it's created.





[GitHub] spark pull request: [SPARK-4000][BUILD] Sends archived unit tests ...

2014-10-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2845#issuecomment-59643840
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21884/
Test FAILed.





[GitHub] spark pull request: [SPARK-3958] TorrentBroadcast cleanup / debugg...

2014-10-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2844#issuecomment-59643971
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21885/consoleFull)
 for   PR 2844 at commit 
[`33fc754`](https://github.com/apache/spark/commit/33fc75447c676a5fca1f6f7e7095562f3a1583d5).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-3958] TorrentBroadcast cleanup / debugg...

2014-10-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2844#issuecomment-59643974
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21885/
Test FAILed.





[GitHub] spark pull request: [SPARK-3958] TorrentBroadcast cleanup / debugg...

2014-10-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2844#issuecomment-59644064
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21888/consoleFull)
 for   PR 2844 at commit 
[`5c22782`](https://github.com/apache/spark/commit/5c227825b3cf0bbe3826e20fe66370229bfc43a2).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-3997][Build]scalastyle should output th...

2014-10-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2846#issuecomment-59644562
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21886/
Test PASSed.





[GitHub] spark pull request: [SPARK-3997][Build]scalastyle should output th...

2014-10-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2846#issuecomment-59644561
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21886/consoleFull)
 for   PR 2846 at commit 
[`82c38ec`](https://github.com/apache/spark/commit/82c38ecdf56fa087606bc9c12df2b9602b5c91a7).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-4001][MLlib] adding apriori algorithm f...

2014-10-19 Thread jackylk
GitHub user jackylk opened a pull request:

https://github.com/apache/spark/pull/2847

[SPARK-4001][MLlib] adding apriori algorithm for frequent item set mining 
in Spark

Apriori is the classic algorithm for frequent item set mining in a 
transactional data set. It would be useful to add the Apriori algorithm to 
MLlib in Spark. This PR adds an implementation of it. 
There is one point where I am not sure whether the implementation is the most 
efficient. To filter out the eligible frequent item sets, I am currently using a 
cartesian operation on two RDDs to calculate the degree of support of each item 
set; I am not sure whether it would be better to use a broadcast variable to 
achieve the same.

I will add an example that uses this algorithm if required.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jackylk/spark apriori

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/2847.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2847


commit da2cba7e063745aacef74ff555e7bd7c55a24f56
Author: Jacky Li jacky.li...@huawei.com
Date:   2014-10-19T09:19:27Z

adding apriori algorithm for frequent item set mining in Spark

commit 889b33fdfabcc222c82e3bce619aeb6c7031fc58
Author: Jacky Li jacky.li...@huawei.com
Date:   2014-10-19T09:31:04Z

modify per scalastyle check







[GitHub] spark pull request: [SPARK-4001][MLlib] adding apriori algorithm f...

2014-10-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2847#issuecomment-59644841
  
Can one of the admins verify this patch?





[GitHub] spark pull request: [SPARK-3791][SQL][WIP] Provides Spark version ...

2014-10-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2843#issuecomment-59645000
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21887/
Test PASSed.





[GitHub] spark pull request: [SPARK-3791][SQL][WIP] Provides Spark version ...

2014-10-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2843#issuecomment-59644997
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21887/consoleFull)
 for   PR 2843 at commit 
[`da5e716`](https://github.com/apache/spark/commit/da5e716fd1b8cc48c43f37373641bbabbb91a11f).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `  class SerializableMapWrapper[A, B](underlying: collection.Map[A, B])`
  * `class Predict(`
  * `case class EvaluatePython(`






[GitHub] spark pull request: [SPARK-3958] TorrentBroadcast cleanup / debugg...

2014-10-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2844#issuecomment-59645141
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21888/
Test FAILed.





[GitHub] spark pull request: [SPARK-3958] TorrentBroadcast cleanup / debugg...

2014-10-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2844#issuecomment-59645137
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21888/consoleFull)
 for   PR 2844 at commit 
[`5c22782`](https://github.com/apache/spark/commit/5c227825b3cf0bbe3826e20fe66370229bfc43a2).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: SPARK-3926 [CORE] Result of JavaRDD.collectAsM...

2014-10-19 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/2805#issuecomment-59645175
  
Yes, all SGTM.





[GitHub] spark pull request: [SPARK-3989]Added possibility to directly inst...

2014-10-19 Thread ziky90
Github user ziky90 commented on the pull request:

https://github.com/apache/spark/pull/2836#issuecomment-59645599
  
Ok, currently I'm using EMR instead of the spark-ec2 script, because it 
seems more convenient to me than connecting to an EC2 cluster from my own bash 
script. But you're right, that's a possible way to go, and this functionality 
isn't necessarily needed in spark-ec2.





[GitHub] spark pull request: [SPARK-3989]Added possibility to directly inst...

2014-10-19 Thread ziky90
Github user ziky90 closed the pull request at:

https://github.com/apache/spark/pull/2836





[GitHub] spark pull request: [SPARK-4000][BUILD] Sends archived unit tests ...

2014-10-19 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/2845#discussion_r19059750
  
--- Diff: dev/run-tests-jenkins ---
@@ -92,12 +92,39 @@ function post_message () {
   echo " > api_response: ${api_response}" >&2
   echo " > data: ${data}" >&2
   fi
-  
+
   if [ "$curl_status" -eq 0 ] && [ "$http_code" -eq "201" ]; then
     echo " > Post successful."
   fi
 }
 
+function send_archived_logs () {
+  echo "Archiving unit tests logs..."
+
+  local log_files=$(find . -name "unit-tests.log")
+
+  if [ -z "$log_files" ]; then
+    echo "> No log files found." >&2
+  else
+    local log_archive="unit-tests-logs.tar.gz"
+    echo "$log_files" | xargs tar czf ${log_archive}
+
+    local jenkins_build_dir=${JENKINS_HOME}/jobs/${JOB_NAME}/builds/${BUILD_NUMBER}
+    local scp_output=$(scp ${log_archive} amp-jenkins-master:${jenkins_build_dir}/${log_archive})
--- End diff --

It's not good to hardcode the Jenkins master hostname here. We should inject an 
extra environment variable `$MASTER_NODE_NAME` in the Jenkins configuration.
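
A minimal sketch of that suggestion, assuming `MASTER_NODE_NAME` is injected by the Jenkins job configuration as proposed above. The helper name `resolve_log_destination` is hypothetical and not part of `dev/run-tests-jenkins`; it only builds the `scp` target and refuses to guess a hostname when the variable is missing.

```shell
#!/usr/bin/env bash

# Hypothetical helper: prints the scp destination for a log archive.
# MASTER_NODE_NAME is assumed to be provided by the Jenkins configuration;
# JENKINS_HOME, JOB_NAME and BUILD_NUMBER are the variables already used in the diff.
resolve_log_destination () {
  local log_archive="$1"

  # Fail instead of falling back to a hardcoded hostname.
  if [ -z "${MASTER_NODE_NAME:-}" ]; then
    echo "> MASTER_NODE_NAME is not set; skipping log upload." >&2
    return 1
  fi

  local jenkins_build_dir="${JENKINS_HOME}/jobs/${JOB_NAME}/builds/${BUILD_NUMBER}"
  echo "${MASTER_NODE_NAME}:${jenkins_build_dir}/${log_archive}"
}
```

With this in place, the upload step could read `scp "${log_archive}" "$(resolve_log_destination "${log_archive}")"`, so the hostname lives only in the Jenkins configuration.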





[GitHub] spark pull request: [SPARK-4000][BUILD] Sends archived unit tests ...

2014-10-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2845#issuecomment-59647627
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21889/consoleFull)
 for   PR 2845 at commit 
[`68c7010`](https://github.com/apache/spark/commit/68c7010748fd275cd4e10ac09d994dc0e61a4e24).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-3562]Periodic cleanup event logs

2014-10-19 Thread viper-kun
Github user viper-kun commented on the pull request:

https://github.com/apache/spark/pull/2471#issuecomment-59648648
  
@vanzin. is it ok to go?





[GitHub] spark pull request: [SPARK-4000][BUILD] Sends archived unit tests ...

2014-10-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2845#issuecomment-59649466
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21889/consoleFull)
 for   PR 2845 at commit 
[`68c7010`](https://github.com/apache/spark/commit/68c7010748fd275cd4e10ac09d994dc0e61a4e24).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-4000][BUILD] Sends archived unit tests ...

2014-10-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2845#issuecomment-59649470
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21889/
Test PASSed.





[GitHub] spark pull request: [SPARK-3959][SPARK-3960][SQL] SqlParser fails ...

2014-10-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2816#issuecomment-59649582
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21890/consoleFull)
 for   PR 2816 at commit 
[`5c847ac`](https://github.com/apache/spark/commit/5c847aca4e7d618dee7b8c647bdca6f845d328e3).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-4000][BUILD] Sends archived unit tests ...

2014-10-19 Thread liancheng
Github user liancheng commented on the pull request:

https://github.com/apache/spark/pull/2845#issuecomment-59649597
  
Verified that the log archive was uploaded to the correct location on the 
Jenkins master node.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3677] [BUILD] [YARN] pom.xml and SparkB...

2014-10-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2520#issuecomment-59649721
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21891/consoleFull)
 for   PR 2520 at commit 
[`553d9e9`](https://github.com/apache/spark/commit/553d9e9536e2e939278d238a0a34a3b9024590b5).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-3959][SPARK-3960][SQL] SqlParser fails ...

2014-10-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2816#issuecomment-59650495
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21890/consoleFull)
 for   PR 2816 at commit 
[`5c847ac`](https://github.com/apache/spark/commit/5c847aca4e7d618dee7b8c647bdca6f845d328e3).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-3959][SPARK-3960][SQL] SqlParser fails ...

2014-10-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2816#issuecomment-59650498
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21890/
Test FAILed.





[GitHub] spark pull request: [SPARK-3791][SQL][WIP] Provides Spark version ...

2014-10-19 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/2843#discussion_r19060362
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveQuerySuite.scala
 ---
@@ -753,44 +753,19 @@ class HiveQuerySuite extends HiveComparisonTest {
 }
 
 assert(hiveconf.get(testKey, "") == testVal)
-assertResult(Set(testKey -> testVal)) {
-  collectResults(sql("SET"))
-}
+assertResult(Set(testKey -> testVal))(collectResults(sql("SET")))
+assertResult(Set(testKey -> testVal))(collectResults(sql("SET -v")))
 
 sql(s"SET ${testKey + testKey}=${testVal + testVal}")
 assert(hiveconf.get(testKey + testKey, "") == testVal + testVal)
 assertResult(Set(testKey -> testVal, (testKey + testKey) -> (testVal + testVal))) {
   collectResults(sql("SET"))
 }
-
-// "set key"
-assertResult(Set(testKey -> testVal)) {
-  collectResults(sql(s"SET $testKey"))
-}
-
-assertResult(Set(nonexistentKey -> "<undefined>")) {
-  collectResults(sql(s"SET $nonexistentKey"))
-}
-
-// Assert that sql() should have the same effects as sql() by repeating the above using sql().
-clear()
-assert(sql("SET").collect().size == 0)
-
-assertResult(Set(testKey -> testVal)) {
-  collectResults(sql(s"SET $testKey=$testVal"))
-}
-
-assert(hiveconf.get(testKey, "") == testVal)
-assertResult(Set(testKey -> testVal)) {
-  collectResults(sql("SET"))
-}
-
-sql(s"SET ${testKey + testKey}=${testVal + testVal}")
-assert(hiveconf.get(testKey + testKey, "") == testVal + testVal)
 assertResult(Set(testKey -> testVal, (testKey + testKey) -> (testVal + testVal))) {
--- End diff --

These lines are removed because they were originally written to test the 
deprecated `hql` call, back when `sql` and `hql` had different code paths. 
Later, those `hql` calls were changed to `sql` to avoid compile-time 
deprecation warnings, which turned them into exact duplicates of the code above.





[GitHub] spark pull request: [SPARK-3959][SPARK-3960][SQL] SqlParser fails ...

2014-10-19 Thread sarutak
Github user sarutak commented on the pull request:

https://github.com/apache/spark/pull/2816#issuecomment-59650806
  
retest this please.





[GitHub] spark pull request: [SPARK-3959][SPARK-3960][SQL] SqlParser fails ...

2014-10-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2816#issuecomment-59650907
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21892/consoleFull)
 for   PR 2816 at commit 
[`5c847ac`](https://github.com/apache/spark/commit/5c847aca4e7d618dee7b8c647bdca6f845d328e3).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-3677] [BUILD] [YARN] pom.xml and SparkB...

2014-10-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2520#issuecomment-59651589
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21891/consoleFull)
 for   PR 2520 at commit 
[`553d9e9`](https://github.com/apache/spark/commit/553d9e9536e2e939278d238a0a34a3b9024590b5).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-3677] [BUILD] [YARN] pom.xml and SparkB...

2014-10-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2520#issuecomment-59651592
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21891/
Test PASSed.





[GitHub] spark pull request: [SPARK-3959][SPARK-3960][SQL] SqlParser fails ...

2014-10-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2816#issuecomment-59652376
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21892/consoleFull)
 for   PR 2816 at commit 
[`5c847ac`](https://github.com/apache/spark/commit/5c847aca4e7d618dee7b8c647bdca6f845d328e3).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-3959][SPARK-3960][SQL] SqlParser fails ...

2014-10-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2816#issuecomment-59652379
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21892/
Test PASSed.





[GitHub] spark pull request: [SPARK-3958] TorrentBroadcast cleanup / debugg...

2014-10-19 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/2844#issuecomment-59653126
  
This most recent test-failure is another side-effect of removing 
TorrentBroadcast's optimizations for local mode:

```
[info] - Unpersisting TorrentBroadcast on executors only in local mode *** FAILED ***
[info]   1 did not equal 0 (BroadcastSuite.scala:219)
[info] - Unpersisting TorrentBroadcast on executors and driver in local mode *** FAILED ***
[info]   1 did not equal 0 (BroadcastSuite.scala:219)
```

This time, the error is because there's a check that asserts that broadcast 
pieces are not stored into the driver's block manager when running in local 
mode.  I don't think that this optimization necessarily makes sense, since 
we'll have to store those blocks anyways when running in distributed mode.  
Therefore, I'm going to change these tests to remove this local-mode 
special-casing.







[GitHub] spark pull request: [SPARK-3958] TorrentBroadcast cleanup / debugg...

2014-10-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2844#issuecomment-59653640
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21893/consoleFull)
 for   PR 2844 at commit 
[`c3b08f9`](https://github.com/apache/spark/commit/c3b08f93b61f0748b7c42fc32314bd92150e5b88).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-3958] TorrentBroadcast cleanup / debugg...

2014-10-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2844#issuecomment-59655954
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21893/consoleFull)
 for   PR 2844 at commit 
[`c3b08f9`](https://github.com/apache/spark/commit/c3b08f93b61f0748b7c42fc32314bd92150e5b88).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-3958] TorrentBroadcast cleanup / debugg...

2014-10-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2844#issuecomment-59655958
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21893/
Test PASSed.





[GitHub] spark pull request: [SPARK-3720][SQL]initial support ORC in spark ...

2014-10-19 Thread scwf
Github user scwf commented on a diff in the pull request:

https://github.com/apache/spark/pull/2576#discussion_r19061946
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcTableOperations.scala 
---
@@ -0,0 +1,351 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+
+package org.apache.spark.sql.hive.orc
+
+import java.io.IOException
+import java.text.SimpleDateFormat
+import java.util.{Locale, Date}
+import scala.collection.JavaConversions._
+
+import org.apache.hadoop.fs.Path
+import org.apache.hadoop.conf.Configuration
+import org.apache.hadoop.mapreduce.lib.output.{FileOutputFormat, FileOutputCommitter}
+import org.apache.hadoop.mapreduce.lib.input.FileInputFormat
+import org.apache.hadoop.io.{Writable, NullWritable}
+import org.apache.hadoop.mapreduce.{TaskID, TaskAttemptContext, Job}
+import org.apache.hadoop.hive.ql.io.orc.{OrcSerde, OrcInputFormat, OrcOutputFormat}
+import org.apache.hadoop.hive.serde2.objectinspector._
+import org.apache.hadoop.hive.serde2.ColumnProjectionUtils
+import org.apache.hadoop.hive.common.`type`.{HiveDecimal, HiveVarchar}
+import org.apache.hadoop.mapred.{SparkHadoopMapRedUtil, Reporter, JobConf}
+
+import org.apache.spark.sql.execution._
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.parquet.FileSystemHelper
+import org.apache.spark.{TaskContext, SerializableWritable}
+import org.apache.spark.rdd.RDD
+import org.apache.spark.util.Utils._
+
+/**
+ * orc table scan operator. Imports the file that backs the given
+ * [[org.apache.spark.sql.hive.orc.OrcRelation]] as a ``RDD[Row]``.
+ */
+case class OrcTableScan(
+output: Seq[Attribute],
+relation: OrcRelation,
+columnPruningPred: Option[Expression])
+  extends LeafNode {
+
+  @transient
+  lazy val serde: OrcSerde = initSerde
+
+  @transient
+  lazy val getFieldValue: Seq[Product => Any] = {
+val inspector = serde.getObjectInspector.asInstanceOf[StructObjectInspector]
+output.map(attr => {
+  val ref = inspector.getStructFieldRef(attr.name.toLowerCase(Locale.ENGLISH))
+  row: Product => {
+val fieldData = row.productElement(1)
+val data = inspector.getStructFieldData(fieldData, ref)
+unwrapData(data, ref.getFieldObjectInspector)
+  }
+})
+  }
+
+  private def initSerde(): OrcSerde = {
+val serde = new OrcSerde
+serde.initialize(null, relation.prop)
+serde
+  }
+
+  def unwrapData(data: Any, oi: ObjectInspector): Any = oi match {
+case pi: PrimitiveObjectInspector => pi.getPrimitiveJavaObject(data)
+case li: ListObjectInspector =>
+  Option(li.getList(data))
+.map(_.map(unwrapData(_, li.getListElementObjectInspector)).toSeq)
+.orNull
+case mi: MapObjectInspector =>
+  Option(mi.getMap(data)).map(
+_.map {
+  case (k, v) =>
+(unwrapData(k, mi.getMapKeyObjectInspector),
+  unwrapData(v, mi.getMapValueObjectInspector))
+}.toMap).orNull
+case si: StructObjectInspector =>
+  val allRefs = si.getAllStructFieldRefs
+  new GenericRow(
+allRefs.map(r =>
+  unwrapData(si.getStructFieldData(data, r), r.getFieldObjectInspector)).toArray)
+  }
+
+  override def execute(): RDD[Row] = {
+val sc = sqlContext.sparkContext
+val job = new Job(sc.hadoopConfiguration)
+
+val conf: Configuration = job.getConfiguration
+val fileList = FileSystemHelper.listFiles(relation.path, conf)
+
+// add all paths in the directory but skip hidden ones such
+// as _SUCCESS
+for (path <- fileList if !path.getName.startsWith("_")) {
+  FileInputFormat.addInputPath(job, path)
+}
+
+setColumnIds(output, relation, conf)
+val inputClass = classOf[OrcInputFormat].asInstanceOf[
+  Class[_ : 

[GitHub] spark pull request: [SPARK-3720][SQL]initial support ORC in spark ...

2014-10-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2576#issuecomment-59660023
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21894/consoleFull)
 for   PR 2576 at commit 
[`f680da0`](https://github.com/apache/spark/commit/f680da07742605e6a38bf4132477e063b2b22548).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-3207][MLLIB]Choose splits for continuou...

2014-10-19 Thread jkbradley
Github user jkbradley commented on the pull request:

https://github.com/apache/spark/pull/2780#issuecomment-59661579
  
@chouqin Thanks for the updates!  The updates look good.

One more small comment: Could you please add explicit checks in the unit 
tests to make sure the returned splits are distinct?  I should have thought of 
that earlier.

I'll try some timing tests to make sure the sampling does not take too long.
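
For illustration, the distinctness check requested above could look roughly like the following sketch. `SplitChecks` and its methods are hypothetical names, not MLlib API, and the splits are modeled as plain `Double` thresholds rather than MLlib `Split` objects.

```scala
// Hypothetical helper, not part of MLlib: validates the candidate split
// thresholds chosen for a single continuous feature.
object SplitChecks {
  /** True if no threshold value appears more than once. */
  def areDistinct(thresholds: Seq[Double]): Boolean =
    thresholds.distinct.length == thresholds.length

  /** Stronger check: thresholds are sorted and strictly increasing. */
  def areStrictlyIncreasing(thresholds: Seq[Double]): Boolean =
    thresholds.zip(thresholds.drop(1)).forall { case (a, b) => a < b }
}
```

In a ScalaTest suite this would be used as `assert(SplitChecks.areDistinct(thresholds))`, where `thresholds` are the values extracted from the splits returned for one feature.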





[GitHub] spark pull request: [SPARK-3720][SQL]initial support ORC in spark ...

2014-10-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2576#issuecomment-59663743
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21894/
Test PASSed.





[GitHub] spark pull request: [SPARK-3958] TorrentBroadcast cleanup / debugg...

2014-10-19 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/2844#discussion_r19063222
  
--- Diff: 
core/src/main/scala/org/apache/spark/broadcast/TorrentBroadcast.scala ---
@@ -76,23 +87,20 @@ private[spark] class TorrentBroadcast[T: ClassTag](
* @return number of blocks this broadcast variable is divided into
*/
   private def writeBlocks(): Int = {
-// For local mode, just put the object in the BlockManager so we can find it later.
-SparkEnv.get.blockManager.putSingle(
-  broadcastId, _value, StorageLevel.MEMORY_AND_DISK, tellMaster = false)
-
-if (!isLocal) {
-  val blocks = TorrentBroadcast.blockifyObject(_value)
-  blocks.zipWithIndex.foreach { case (block, i) =>
-SparkEnv.get.blockManager.putBytes(
-  BroadcastBlockId(id, "piece" + i),
-  block,
-  StorageLevel.MEMORY_AND_DISK_SER,
-  tellMaster = true)
-  }
-  blocks.length
-} else {
-  0
+// Store a copy of the broadcast variable in the driver so that tasks run on the driver
+// do not create a duplicate copy of the broadcast variable's value.
+SparkEnv.get.blockManager.putSingle(broadcastId, _value, StorageLevel.MEMORY_AND_DISK,
+  tellMaster = false)
--- End diff --

I suspect that storing a serialized copy in local mode does not help. 
If fetching the original copy of the value from the blockManager fails, 
fetching the serialized copy will fail as well.





[GitHub] spark pull request: [SPARK-3958] TorrentBroadcast cleanup / debugg...

2014-10-19 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/2844#discussion_r19063253
  
--- Diff: 
core/src/main/scala/org/apache/spark/broadcast/TorrentBroadcast.scala ---
@@ -62,6 +59,20 @@ private[spark] class TorrentBroadcast[T: ClassTag](
* blocks from the driver and/or other executors.
*/
   @transient private var _value: T = obj
+  /** The compression codec to use, or None if compression is disabled */
+  @transient private var compressionCodec: Option[CompressionCodec] = _
+  /** Size of each block. Default value is 4MB.  This value is only read by the broadcaster. */
+  @transient private var blockSize: Int = _
--- End diff --

How about making these two constructor parameters and reading the conf in 
TorrentBroadcastFactory?





[GitHub] spark pull request: [SPARK-3207][MLLIB]Choose splits for continuou...

2014-10-19 Thread manishamde
Github user manishamde commented on the pull request:

https://github.com/apache/spark/pull/2780#issuecomment-59664449
  
@chouqin LGTM. :+1: 





[GitHub] spark pull request: [SPARK-3958] TorrentBroadcast cleanup / debugg...

2014-10-19 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/2844#discussion_r19063271
  
--- Diff: 
core/src/main/scala/org/apache/spark/broadcast/TorrentBroadcast.scala ---
@@ -156,6 +158,7 @@ private[spark] class TorrentBroadcast[T: ClassTag](
   private def readObject(in: ObjectInputStream) {
 in.defaultReadObject()
 TorrentBroadcast.synchronized {
+  setConf(SparkEnv.get.conf)
--- End diff --

This looks weird. How can we make sure that this conf is equal to the one 
used when the Broadcast was created?





[GitHub] spark pull request: [SPARK-3958] TorrentBroadcast cleanup / debugg...

2014-10-19 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/2844#discussion_r19063287
  
--- Diff: 
core/src/main/scala/org/apache/spark/broadcast/TorrentBroadcast.scala ---
@@ -156,6 +158,7 @@ private[spark] class TorrentBroadcast[T: ClassTag](
   private def readObject(in: ObjectInputStream) {
 in.defaultReadObject()
 TorrentBroadcast.synchronized {
+  setConf(SparkEnv.get.conf)
--- End diff --

The conf is application-scoped.  The same conf should be present on this 
application's executors, where this task will be deserialized.  This assumption 
is used elsewhere, too.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3958] TorrentBroadcast cleanup / debugg...

2014-10-19 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/2844#discussion_r19063336
  
--- Diff: 
core/src/main/scala/org/apache/spark/broadcast/TorrentBroadcast.scala ---
@@ -76,23 +87,20 @@ private[spark] class TorrentBroadcast[T: ClassTag](
* @return number of blocks this broadcast variable is divided into
*/
   private def writeBlocks(): Int = {
-// For local mode, just put the object in the BlockManager so we can find it later.
-SparkEnv.get.blockManager.putSingle(
-  broadcastId, _value, StorageLevel.MEMORY_AND_DISK, tellMaster = false)
-
-if (!isLocal) {
-  val blocks = TorrentBroadcast.blockifyObject(_value)
-  blocks.zipWithIndex.foreach { case (block, i) =>
-SparkEnv.get.blockManager.putBytes(
-  BroadcastBlockId(id, "piece" + i),
-  block,
-  StorageLevel.MEMORY_AND_DISK_SER,
-  tellMaster = true)
-  }
-  blocks.length
-} else {
-  0
+// Store a copy of the broadcast variable in the driver so that tasks run on the driver
+// do not create a duplicate copy of the broadcast variable's value.
+SparkEnv.get.blockManager.putSingle(broadcastId, _value, StorageLevel.MEMORY_AND_DISK,
+  tellMaster = false)
--- End diff --

The reason for this store is to avoid creating two copies of `_value` in 
the driver.  If we serialize and deserialize a broadcast variable on the driver 
and then attempt to access its value, then without this code we will end up 
going through the regular de-chunking code path, which will cause us to 
deserialize the serialized copy of `_value` and waste memory. 

I believe that this serialization and deserialization can take place when 
tasks are run in local mode, since we still serialize tasks in order to help 
users be aware of serialization issues that would impact them if they moved to 
a cluster.  This complexity is another reason why I'm in favor of just 
scrapping all local-mode special-casing and configuring Spark to use a dummy 
LocalBroadcastFactory for local mode instead of whichever setting the user 
specified.  That would be a larger, more-invasive change, which is why I opted 
for the simpler fix here.





[GitHub] spark pull request: [SPARK-3207][MLLIB]Choose splits for continuou...

2014-10-19 Thread manishamde
Github user manishamde commented on the pull request:

https://github.com/apache/spark/pull/2780#issuecomment-59664847
  
@jkbradley I read the paper by Sanku et al. and other papers, but they 
required a custom implementation. The sort method has worked OK so far, but I 
was hoping somebody would implement a generic quantile approximation algorithm 
for Spark that is O(n) and requires limited memory. I think such methods exist 
in other libraries such as 
[Algebird](http://twitter.github.io/algebird/com/twitter/algebird/QTree$.html) 
and [Tdigest](https://github.com/tdunning/t-digest). We should also check 
whether BlinkDB has attempted to tackle this problem.
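
As a point of comparison for the single-pass, bounded-memory requirement, here is a minimal sketch that estimates quantiles from a fixed-size uniform sample drawn with reservoir sampling. It is neither Algebird's QTree nor t-digest, it has weaker accuracy guarantees than either, and `SampledQuantiles` and its methods are hypothetical names.

```scala
import scala.util.Random

// Hypothetical sketch: one pass over the data, O(k) memory for the sample,
// plus O(k log k) to sort the sample when quantiles are read off.
object SampledQuantiles {
  /** Uniform sample of at most k values in a single pass (reservoir sampling). */
  def reservoir(values: Iterator[Double], k: Int, rng: Random): Array[Double] = {
    require(k > 0, "k must be positive")
    val sample = new Array[Double](k)
    var seen = 0
    values.foreach { v =>
      if (seen < k) {
        sample(seen) = v
      } else {
        // Replace a random slot with probability k / (seen + 1).
        val j = rng.nextInt(seen + 1)
        if (j < k) sample(j) = v
      }
      seen += 1
    }
    if (seen < k) sample.take(seen) else sample
  }

  /** Reads approximate quantiles off the sorted sample for each probability in [0, 1]. */
  def estimate(sample: Array[Double], probs: Seq[Double]): Seq[Double] = {
    require(sample.nonEmpty, "sample must not be empty")
    val sorted = sample.sorted
    probs.map { p =>
      val idx = math.min(sorted.length - 1, math.max(0, (p * sorted.length).toInt))
      sorted(idx)
    }
  }
}
```

When the input has no more than `k` values the sample is the whole input, so the estimates are exact order statistics; for larger inputs the error depends on `k`, which is the knob that trades memory for accuracy.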





[GitHub] spark pull request: [SPARK-3958] TorrentBroadcast cleanup / debugg...

2014-10-19 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/2844#discussion_r19063363
  
--- Diff: 
core/src/main/scala/org/apache/spark/broadcast/TorrentBroadcast.scala ---
@@ -62,6 +59,20 @@ private[spark] class TorrentBroadcast[T: ClassTag](
* blocks from the driver and/or other executors.
*/
   @transient private var _value: T = obj
+  /** The compression codec to use, or None if compression is disabled */
+  @transient private var compressionCodec: Option[CompressionCodec] = _
+  /** Size of each block. Default value is 4MB.  This value is only read by the broadcaster. */
+  @transient private var blockSize: Int = _
--- End diff --

I thought about this and agree that it might be cleaner, but this will 
require more refactoring of other code.  One design goal here was to minimize 
the serialized size of TorrentBroadcast objects, so we can't serialize the 
SparkConf or CompressionCodec instances (which contain SparkConfs).  
SparkEnv.conf determines these values anyways.
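
The trade-off described above (keep the settings out of the serialized form and re-derive them wherever the object is deserialized) can be sketched with plain Java serialization. `AppSettings` is a hypothetical stand-in for an application-scoped configuration such as `SparkEnv.get.conf`, and `Chunked` is an illustration, not the real TorrentBroadcast.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Hypothetical stand-in for an application-scoped configuration.
object AppSettings {
  @volatile var blockSize: Int = 4096 * 1024
}

class Chunked(val id: Long) extends Serializable {
  // Transient: never written to the serialized form, so it adds no bytes.
  @transient private var blockSize: Int = 0
  initSettings()

  // Derives the setting from the configuration of the current JVM.
  private def initSettings(): Unit = { blockSize = AppSettings.blockSize }

  def currentBlockSize: Int = blockSize

  // Called by Java serialization on the receiving side.
  private def readObject(in: ObjectInputStream): Unit = {
    in.defaultReadObject()
    initSettings()
  }
}

object RoundTrip {
  /** Serializes and deserializes a value in memory. */
  def roundTrip[A <: java.io.Serializable](a: A): A = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(a)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    try in.readObject().asInstanceOf[A] finally in.close()
  }
}
```

The deserialized copy picks up whatever `AppSettings` holds at the time `readObject` runs, which is why this pattern relies on the configuration being the same on the driver and on every executor of the application.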


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3958] TorrentBroadcast cleanup / debugg...

2014-10-19 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/2844#discussion_r19063455
  
--- Diff: 
core/src/main/scala/org/apache/spark/broadcast/TorrentBroadcast.scala ---
@@ -179,43 +183,29 @@ private[spark] class TorrentBroadcast[T: ClassTag](
 
 
 private object TorrentBroadcast extends Logging {
-  /** Size of each block. Default value is 4MB. */
-  private lazy val BLOCK_SIZE = conf.getInt("spark.broadcast.blockSize", 4096) * 1024
-  private var initialized = false
-  private var conf: SparkConf = null
-  private var compress: Boolean = false
-  private var compressionCodec: CompressionCodec = null
-
-  def initialize(_isDriver: Boolean, conf: SparkConf) {
-TorrentBroadcast.conf = conf // TODO: we might have to fix it in tests
-synchronized {
-  if (!initialized) {
-compress = conf.getBoolean("spark.broadcast.compress", true)
-compressionCodec = CompressionCodec.createCodec(conf)
-initialized = true
-  }
-}
-  }
 
-  def stop() {
-initialized = false
-  }
-
-  def blockifyObject[T: ClassTag](obj: T): Array[ByteBuffer] = {
-val bos = new ByteArrayChunkOutputStream(BLOCK_SIZE)
-val out: OutputStream = if (compress) compressionCodec.compressedOutputStream(bos) else bos
-val ser = SparkEnv.get.serializer.newInstance()
+  def blockifyObject[T: ClassTag](
--- End diff --

The conf has been moved into `class Broadcast`; maybe blockifyObject and 
unblockify should be moved as well.





[GitHub] spark pull request: Minor change in the comment of spark-defaults....

2014-10-19 Thread dbtsai
Github user dbtsai commented on the pull request:

https://github.com/apache/spark/pull/2709#issuecomment-59667207
  
@andrewor14 Sorry for the late reply; I was on vacation in Europe last 
week. I can continue working on this after I finish my talk at the IOTA conf tomorrow.





[GitHub] spark pull request: SPARK-1209 [CORE] SparkHadoop{MapRed,MapReduce...

2014-10-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2814#issuecomment-59669342
  
[QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21895/consoleFull) for PR 2814 at commit [`11e7d5d`](https://github.com/apache/spark/commit/11e7d5d6edf48fc386f8cf58c91fe2c4bdadc45e).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-3967] don’t redundantly overwrite exe...

2014-10-19 Thread ryan-williams
GitHub user ryan-williams opened a pull request:

https://github.com/apache/spark/pull/2848

[SPARK-3967] don’t redundantly overwrite executor JAR deps



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ryan-williams/spark fetch-file

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/2848.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2848


commit f3c80ae80be474ed9928b319f2b0d7808b028915
Author: Ryan Williams ryan.blake.willi...@gmail.com
Date:   2014-10-17T22:21:23Z

don’t redundantly overwrite executor JAR deps

see SPARK-3967







[GitHub] spark pull request: [SPARK-3967] don’t redundantly overwrite exe...

2014-10-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2848#issuecomment-59669916
  
Can one of the admins verify this patch?





[GitHub] spark pull request: [SPARK-3888] [PySpark] limit the memory used b...

2014-10-19 Thread pwendell
Github user pwendell commented on a diff in the pull request:

https://github.com/apache/spark/pull/2743#discussion_r19064542
  
--- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala ---
@@ -63,9 +64,12 @@ private[spark] class PythonRDD(
     val localdir = env.blockManager.diskBlockManager.localDirs.map(
       f => f.getPath()).mkString(",")
     envVars += ("SPARK_LOCAL_DIRS" -> localdir) // it's also used in monitor thread
-    if (reuse_worker) {
+    if (reuseWorker) {
       envVars += ("SPARK_REUSE_WORKER" -> "1")
     }
+    if (!memoryLimit.isEmpty) {
+      envVars += ("PYSPARK_WORKER_MEMORY_LIMIT" -> memoryLimit)
--- End diff --

@davies - the environment variable is only for internal use, correct? One 
thing we could do is rename it to make it clearer that it is only for internal 
use:

```
_PYSPARK_WORKER_MEMORY_LIMIT
```





[GitHub] spark pull request: [SPARK-4000][BUILD] Sends archived unit tests ...

2014-10-19 Thread pwendell
Github user pwendell commented on a diff in the pull request:

https://github.com/apache/spark/pull/2845#discussion_r19064621
  
--- Diff: dev/run-tests-jenkins ---
@@ -92,12 +92,39 @@ function post_message () {
     echo " > api_response: ${api_response}" >&2
     echo " > data: ${data}" >&2
   fi
-  
+
   if [ "$curl_status" -eq 0 ] && [ "$http_code" -eq "201" ]; then
     echo " > Post successful."
   fi
 }
 
+function send_archived_logs () {
+  echo "Archiving unit tests logs..."
+
+  local log_files=$(find . -name "unit-tests.log")
+
+  if [ -z "$log_files" ]; then
+    echo "> No log files found." >&2
+  else
+    local log_archive="unit-tests-logs.tar.gz"
+    echo "$log_files" | xargs tar czf ${log_archive}
+
+    local jenkins_build_dir=${JENKINS_HOME}/jobs/${JOB_NAME}/builds/${BUILD_NUMBER}
+    local scp_output=$(scp ${log_archive} amp-jenkins-master:${jenkins_build_dir}/${log_archive})
--- End diff --

I'm confused actually - is amp-jenkins-master the current hostname of the 
master machine?





[GitHub] spark pull request: [SPARK-4000][BUILD] Sends archived unit tests ...

2014-10-19 Thread pwendell
Github user pwendell commented on a diff in the pull request:

https://github.com/apache/spark/pull/2845#discussion_r19064633
  
--- Diff: dev/run-tests-jenkins ---
@@ -92,12 +92,39 @@ function post_message () {
     echo " > api_response: ${api_response}" >&2
     echo " > data: ${data}" >&2
   fi
-  
+
   if [ "$curl_status" -eq 0 ] && [ "$http_code" -eq "201" ]; then
     echo " > Post successful."
   fi
 }
 
+function send_archived_logs () {
+  echo "Archiving unit tests logs..."
+
+  local log_files=$(find . -name "unit-tests.log")
+
+  if [ -z "$log_files" ]; then
+    echo "> No log files found." >&2
+  else
+    local log_archive="unit-tests-logs.tar.gz"
+    echo "$log_files" | xargs tar czf ${log_archive}
--- End diff --

Just wondering, will these appear in the tarfile under the full path (e.g. 
streaming/target/unit-tests.log)? That would be ideal.
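
One way to check this locally is sketched below. It assumes that `tar` stores member names as they are passed on the command line, so paths printed by `find .` stay relative to the working directory. The helper name `archive_logs` and the two-module directory layout are hypothetical, and the sketch uses `-print0`/`-0` rather than the `echo | xargs` form in the diff so that paths containing spaces survive.

```shell
#!/usr/bin/env bash
# Sketch: archive every unit-tests.log below the current directory, then
# list the archive to see which paths the members were stored under.
# `archive_logs` is a hypothetical helper, not part of run-tests-jenkins.

archive_logs () {
  local log_archive="$1"
  # NUL-separated names keep paths with spaces intact (GNU and BSD tools).
  find . -name "unit-tests.log" -print0 | xargs -0 tar czf "${log_archive}"
}

# Throwaway tree that mimics two modules (made-up layout for illustration).
workdir="$(mktemp -d)"
mkdir -p "${workdir}/streaming/target" "${workdir}/core/target"
echo "streaming log" > "${workdir}/streaming/target/unit-tests.log"
echo "core log" > "${workdir}/core/target/unit-tests.log"

(
  cd "${workdir}" || exit 1
  archive_logs "unit-tests-logs.tar.gz"
  # Each member is listed with its module path, not just the file name.
  tar tzf "unit-tests-logs.tar.gz"
)
```

Because every module produces a file with the same name, keeping the module path in the member name is what stops the logs from overwriting each other when the archive is extracted.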





[GitHub] spark pull request: [SPARK-4000][BUILD] Sends archived unit tests ...

2014-10-19 Thread pwendell
Github user pwendell commented on a diff in the pull request:

https://github.com/apache/spark/pull/2845#discussion_r19064648
  
--- Diff: dev/run-tests-jenkins ---
@@ -92,12 +92,39 @@ function post_message () {
     echo " > api_response: ${api_response}" >&2
     echo " > data: ${data}" >&2
   fi
-  
+
   if [ "$curl_status" -eq 0 ] && [ "$http_code" -eq "201" ]; then
     echo " > Post successful."
   fi
 }
 
+function send_archived_logs () {
+  echo "Archiving unit tests logs..."
+
+  local log_files=$(find . -name "unit-tests.log")
+
+  if [ -z "$log_files" ]; then
+    echo "> No log files found." >&2
+  else
+    local log_archive="unit-tests-logs.tar.gz"
+    echo "$log_files" | xargs tar czf ${log_archive}
+
+    local jenkins_build_dir=${JENKINS_HOME}/jobs/${JOB_NAME}/builds/${BUILD_NUMBER}
--- End diff --

Should we add BUILD_NUMBER in the message that we post? Something like this:

```
[Test build #XXX has started/finished] for PR 2845 at commit 4b912f7 (build $XXX).
```
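
A minimal sketch of how such a message could be assembled is shown below. It assumes the script runs under Jenkins, which exports `BUILD_NUMBER`. The helper name `format_status_message`, its argument order, and the build number 123 are made up for illustration and are not taken from the script under review.

```shell
#!/usr/bin/env bash
# Sketch: include the Jenkins build number in the posted status message.
# `format_status_message` is a hypothetical helper; BUILD_NUMBER is the
# variable Jenkins sets for each build.

format_status_message () {
  local state="$1"         # "started" or "finished"
  local pr_number="$2"
  local short_commit="$3"
  # Fall back to a visible placeholder when run outside Jenkins.
  local build_number="${BUILD_NUMBER:-unknown}"
  echo "[Test build #${build_number} has ${state}] for PR ${pr_number} at commit ${short_commit}."
}

BUILD_NUMBER=123   # made-up value; Jenkins would normally provide this
format_status_message "started" 2845 4b912f7
```

Falling back to `unknown` keeps the message readable when the script is run by hand outside Jenkins.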





[GitHub] spark pull request: [SPARK-4000][BUILD] Sends archived unit tests ...

2014-10-19 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/2845#issuecomment-59671283
  
Overall this looks good; I had a few minor questions. As a next step, we 
could scp the logs to a web server that we control (e.g. something under 
people.apache.org) and clean up the old ones every time we copy something 
over.





[GitHub] spark pull request: [SPARK-4000][BUILD] Sends archived unit tests ...

2014-10-19 Thread nchammas
Github user nchammas commented on a diff in the pull request:

https://github.com/apache/spark/pull/2845#discussion_r19064710
  
--- Diff: dev/run-tests-jenkins ---
@@ -92,12 +92,39 @@ function post_message () {
     echo " > api_response: ${api_response}" >&2
     echo " > data: ${data}" >&2
   fi
-  
+
   if [ "$curl_status" -eq 0 ] && [ "$http_code" -eq "201" ]; then
     echo " > Post successful."
   fi
 }
 
+function send_archived_logs () {
+  echo "Archiving unit tests logs..."
+
+  local log_files=$(find . -name "unit-tests.log")
+
+  if [ -z "$log_files" ]; then
+    echo "> No log files found." >&2
+  else
+    local log_archive="unit-tests-logs.tar.gz"
+    echo "$log_files" | xargs tar czf ${log_archive}
+
+    local jenkins_build_dir=${JENKINS_HOME}/jobs/${JOB_NAME}/builds/${BUILD_NUMBER}
--- End diff --

That could be useful. We may also be able to do away with the "for PR" 
part since that's kinda redundant.

Note that you can currently get the build number from the build URL in the 
existing messages posted to GitHub.
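
As a rough illustration of that point, the build number can be pulled out of a build URL like the ones SparkQA already posts. The helper name `build_number_from_url` is hypothetical, and the pattern assumes the `/job/<name>/<number>/` layout seen in those links.

```shell
#!/usr/bin/env bash
# Sketch: extract the Jenkins build number from a build URL.
# `build_number_from_url` is a hypothetical helper; it assumes URLs of the
# form .../job/<job name>/<build number>/... and prints nothing otherwise.

build_number_from_url () {
  local url="$1"
  # Capture the run of digits that directly follows the job name.
  echo "${url}" | sed -n -E 's#^.*/job/[^/]+/([0-9]+)(/.*)?$#\1#p'
}

# URL taken from an existing SparkQA message; prints 21895
build_number_from_url "https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21895/consoleFull"
```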





[GitHub] spark pull request: [SPARK-3902] [SPARK-3590] Stabilize AsynRDDAct...

2014-10-19 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/2760#issuecomment-59672002
  
LGTM - we discussed some details of this offline last week.





[GitHub] spark pull request: SPARK-1209 [CORE] SparkHadoop{MapRed,MapReduce...

2014-10-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2814#issuecomment-59672188
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21895/





[GitHub] spark pull request: [SPARK-4000][BUILD] Sends archived unit tests ...

2014-10-19 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/2845#discussion_r19064925
  
--- Diff: dev/run-tests-jenkins ---
@@ -92,12 +92,39 @@ function post_message () {
     echo " > api_response: ${api_response}" >&2
     echo " > data: ${data}" >&2
   fi
-  
+
   if [ "$curl_status" -eq 0 ] && [ "$http_code" -eq "201" ]; then
     echo " > Post successful."
   fi
 }
 
+function send_archived_logs () {
+  echo "Archiving unit tests logs..."
+
+  local log_files=$(find . -name "unit-tests.log")
+
+  if [ -z "$log_files" ]; then
+    echo "> No log files found." >&2
+  else
+    local log_archive="unit-tests-logs.tar.gz"
+    echo "$log_files" | xargs tar czf ${log_archive}
+
+    local jenkins_build_dir=${JENKINS_HOME}/jobs/${JOB_NAME}/builds/${BUILD_NUMBER}
+    local scp_output=$(scp ${log_archive} amp-jenkins-master:${jenkins_build_dir}/${log_archive})
--- End diff --

This hostname is accessible from Jenkins slave nodes.





[GitHub] spark pull request: SPARK-1209 [CORE] SparkHadoop{MapRed,MapReduce...

2014-10-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2814#issuecomment-59672186
  
[QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21895/consoleFull) for PR 2814 at commit [`11e7d5d`](https://github.com/apache/spark/commit/11e7d5d6edf48fc386f8cf58c91fe2c4bdadc45e).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-4000][BUILD] Sends archived unit tests ...

2014-10-19 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/2845#discussion_r19064927
  
--- Diff: dev/run-tests-jenkins ---
@@ -92,12 +92,39 @@ function post_message () {
     echo " > api_response: ${api_response}" >&2
     echo " > data: ${data}" >&2
   fi
-  
+
   if [ "$curl_status" -eq 0 ] && [ "$http_code" -eq "201" ]; then
     echo " > Post successful."
   fi
 }
 
+function send_archived_logs () {
+  echo "Archiving unit tests logs..."
+
+  local log_files=$(find . -name "unit-tests.log")
+
+  if [ -z "$log_files" ]; then
+    echo "> No log files found." >&2
+  else
+    local log_archive="unit-tests-logs.tar.gz"
+    echo "$log_files" | xargs tar czf ${log_archive}
--- End diff --

Yes.





[GitHub] spark pull request: [spark-3907][sql] add truncate table support

2014-10-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2770#issuecomment-59672228
  
[QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/402/consoleFull) for PR 2770 at commit [`f6e710e`](https://github.com/apache/spark/commit/f6e710e7d2c455d57065bd712789b7dd0bf357fb).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-3904] [SQL] add constant objectinspecto...

2014-10-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2762#issuecomment-59672239
  
[QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/403/consoleFull) for PR 2762 at commit [`49d442b`](https://github.com/apache/spark/commit/49d442bb97259b3a3a07456d65b27e9c2696b916).
 * This patch merges cleanly.





[GitHub] spark pull request: [MLLIB] [WIP] SPARK-1547: Adding Gradient Boos...

2014-10-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2607#issuecomment-59672538
  
[QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21896/consoleFull) for PR 2607 at commit [`6a11c02`](https://github.com/apache/spark/commit/6a11c0249268378b3319644f467daefa8807a899).
 * This patch merges cleanly.




