[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3249#issuecomment-65758205 [Test build #24177 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24177/consoleFull) for PR 3249 at commit [`ea4e121`](https://github.com/apache/spark/commit/ea4e121bf7adc52a26583c795a748f794b838dfb). * This patch merges cleanly. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3249#issuecomment-65758275 [Test build #24177 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24177/consoleFull) for PR 3249 at commit [`ea4e121`](https://github.com/apache/spark/commit/ea4e121bf7adc52a26583c795a748f794b838dfb). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class SubqueryExpression(value: Expression, subquery: LogicalPlan) extends Expression `
[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3249#issuecomment-65758277 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24177/ Test FAILed.
[GitHub] spark pull request: [SPARK-4714][CORE]: Add checking info is null ...
Github user suyanNone commented on a diff in the pull request: https://github.com/apache/spark/pull/3574#discussion_r21359660
--- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala ---
@@ -1089,15 +1089,17 @@ private[spark] class BlockManager(
     val info = blockInfo.get(blockId).orNull
     if (info != null) {
       info.synchronized {
-        // Removals are idempotent in disk store and memory store. At worst, we get a warning.
-        val removedFromMemory = memoryStore.remove(blockId)
-        val removedFromDisk = diskStore.remove(blockId)
-        val removedFromTachyon = if (tachyonInitialized) tachyonStore.remove(blockId) else false
-        if (!removedFromMemory && !removedFromDisk && !removedFromTachyon) {
-          logWarning(s"Block $blockId could not be removed as it was not found in either " +
-            "the disk, memory, or tachyon store")
+        if (blockInfo.get(blockId).isEmpty) {
--- End diff --
1. Thread A enters removeBlock() and gets the info for blockId1.
2. Thread B enters dropFromMemory() and gets the info for the same blockId1.
Now both A and B try to enter info.synchronized. B wins, drops the block from memory, and since the block is memory-only it is also removed from blockInfo; then B releases the lock. A then acquires the lock, but the info is no longer in blockInfo. Have I missed or misunderstood something? Or is it the case that dropFromMemory and removeBlock can never happen at the same time?
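The interleaving described above can be sketched in plain Python (illustrative names only, not the actual BlockManager API): the thread that loses the race must re-check the registry after acquiring the per-block lock, which is exactly what the added `blockInfo.get(blockId).isEmpty` guard does.

```python
import threading

# Sketch of the race described above (hypothetical names): two threads look
# up the same block's info object, then serialize on its lock. The loser must
# re-check that the block is still registered, because the winner may have
# removed it while holding the lock.
block_info = {"block1": {"lock": threading.Lock()}}

def remove_block(block_id):
    info = block_info.get(block_id)
    if info is None:
        return "not found"
    with info["lock"]:
        # Re-check after acquiring the lock: another thread (e.g. a
        # dropFromMemory-style path) may have deleted the entry between
        # our lookup above and this point.
        if block_id not in block_info:
            return "removed concurrently"
        del block_info[block_id]
        return "removed"
```

The first caller wins and deletes the entry; any later caller observes the missing entry instead of operating on a stale `info` object.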
[GitHub] spark pull request: [SPARK-4005][CORE] handle message replies in r...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2853#issuecomment-65759456 [Test build #541 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/541/consoleFull) for PR 2853 at commit [`9b06f0a`](https://github.com/apache/spark/commit/9b06f0ae7e69fba6a87f56ee97ffad6fd0a20e4b). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3249#issuecomment-65760473 [Test build #24178 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24178/consoleFull) for PR 3249 at commit [`d62887e`](https://github.com/apache/spark/commit/d62887e7ed323b4de0bb7ff61b7bece21fd85d17). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4528][SQL] add comment support for Spar...
Github user tsingfu commented on the pull request: https://github.com/apache/spark/pull/3501#issuecomment-65760992 @marmbrus Do we need to do anything more? If this implementation is too complex, we could reduce the number of supported comment styles (supporting only -- as the comment marker) to keep the code simpler. What do you think?
[GitHub] spark pull request: [SPARK-4688] Have a single shared network time...
Github user varunsaxena commented on the pull request: https://github.com/apache/spark/pull/3562#issuecomment-65761296 @rxin , I will summarize the configuration defaults I have used. I put a value of 100 in the initial pull request with the intention of having a further discussion on appropriate defaults. There are two possible approaches. We can continue using the same defaults as earlier, meaning spark.network.timeout would effectively have different default values depending on context; or we can decide on a single fixed default. I think the latter should be done, but an appropriate value has to be chosen. 1. spark.core.connection.ack.wait.timeout - a default of 60s was used earlier. 2. spark.shuffle.io.connectionTimeout - a default of 120s was used earlier. 3. spark.akka.timeout - a default of 100s was used earlier. 4. spark.storage.blockManagerSlaveTimeoutMs - here the default was 3 times spark.executor.heartbeatInterval or 45s, whichever is higher. Based on these cases I think we can fix a default of 120s for spark.network.timeout. The only issue I can see is in case 4, but 120s should be a good enough upper cap.
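The precedence being discussed can be sketched as follows (a hypothetical helper, not the actual SparkConf API): each specific timeout wins if explicitly set, otherwise it falls back to the shared spark.network.timeout, which itself defaults to the proposed 120 seconds.

```python
# Sketch of the proposed fallback chain (names are illustrative, not Spark's
# actual API): specific key -> spark.network.timeout -> hard default of 120 s.
def resolve_timeout(conf, specific_key, shared_default=120):
    if specific_key in conf:
        return conf[specific_key]
    return conf.get("spark.network.timeout", shared_default)

conf = {"spark.network.timeout": 100}
# An unset specific timeout inherits the shared value of 100 s.
resolve_timeout(conf, "spark.shuffle.io.connectionTimeout")
```

With nothing set at all, every timeout would resolve to the single shared default, which is the "fixed default value" option described above.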
[GitHub] spark pull request: [spark-4691][shuffle]Code improvement for aggr...
Github user maji2014 commented on the pull request: https://github.com/apache/spark/pull/3553#issuecomment-65761712 @pwendell any idea about this title?
[GitHub] spark pull request: [SPARK-4688] Have a single shared network time...
Github user varunsaxena commented on a diff in the pull request: https://github.com/apache/spark/pull/3562#discussion_r21361036
--- Diff: docs/configuration.md ---
@@ -777,6 +777,16 @@ Apart from these, the following properties are also available, and may be useful
   </td>
 </tr>
 <tr>
+  <td><code>spark.network.timeout</code></td>
+  <td>100</td>
+  <td>
+    Default timeout for all network interactions, in seconds. This config will be used in
+    place of spark.core.connection.ack.wait.timeout, spark.akka.timeout,
--- End diff --
Ok...
[GitHub] spark pull request: SPARK-4762: Add support for tuples in 'where i...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3618#issuecomment-65764171 Can one of the admins verify this patch?
[GitHub] spark pull request: SPARK-4762: Add support for tuples in 'where i...
GitHub user saucam opened a pull request: https://github.com/apache/spark/pull/3618 SPARK-4762: Add support for tuples in 'where in' clause query Currently, in the WHERE ... IN clause the filter is applied only on a single column. We can enhance it to accept a filter on multiple columns. Current support is for queries like: Select * from table where c1 in (value1, value2, ... value n); This adds support for queries like: Select * from table where (c1, c2, ... cn) in ((value1, value2, ... value n), (value1', value2', ... value n')); Also added an optimized version of the tuple WHERE ... IN clause, where we create a hash set of the filter tuples for matching rows. This also requires a change in the Hive parser, since currently there is no support for multiple columns in the IN clause. You can merge this pull request into a Git repository by running: $ git pull https://github.com/saucam/spark tuple_where_clause Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3618.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3618 commit c877926c64c7c6f2048d31759f35446c9cec1cdc Author: Yash Datta yash.da...@guavus.com Date: 2014-12-05T08:55:29Z SPARK-4762: 1. Add support for tuples in 'where in' clause query 2. Also adds an optimized version of the same, which uses a hash set to filter rows
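The hash-set optimization described in the PR can be sketched in plain Python (an illustrative model, not the actual Catalyst implementation): build a set of the literal tuples once, then test each row's projected column tuple for membership in O(1).

```python
# Sketch of a tuple-IN filter (hypothetical helper, not Spark's code):
# build a hash set of the literal tuples once, then keep each row whose
# (c1, c2, ...) projection is a member of that set.
def tuple_in_filter(rows, columns, literal_tuples):
    allowed = set(literal_tuples)  # O(1) membership test per row
    return [row for row in rows
            if tuple(row[c] for c in columns) in allowed]

rows = [{"c1": 1, "c2": "a"}, {"c1": 2, "c2": "b"}]
# Models: SELECT * FROM table WHERE (c1, c2) IN ((1, 'a'), (3, 'c'))
tuple_in_filter(rows, ("c1", "c2"), [(1, "a"), (3, "c")])
```

This is why the hash-set variant matters: a naive implementation compares each row against every literal tuple, while the set lookup is independent of the number of tuples in the IN list.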
[GitHub] spark pull request: SPARK-4762: Add support for tuples in 'where i...
Github user saucam commented on the pull request: https://github.com/apache/spark/pull/3618#issuecomment-65764552 @pwendell this PR requires a change in the Hive parser, for which I created a PR against the Hive trunk here: https://github.com/apache/hive/pull/25. Can you please suggest whether I need to open this request against some other branch which is used for the Spark build?
[GitHub] spark pull request: [SPARK-2554][SQL] Supporting SumDistinct parti...
Github user ravipesala commented on the pull request: https://github.com/apache/spark/pull/3348#issuecomment-65766741 I have rebased with master. Please review.
[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3249#issuecomment-65767090 [Test build #24178 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24178/consoleFull) for PR 3249 at commit [`d62887e`](https://github.com/apache/spark/commit/d62887e7ed323b4de0bb7ff61b7bece21fd85d17). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class SubqueryExpression(value: Expression, subquery: LogicalPlan) extends Expression `
[GitHub] spark pull request: SPARK-4338. Ditch yarn-alpha.
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/3215#issuecomment-65767093 @tgravescs @andrewor14 do you feel comfortable merging this now that 1.2 is out the door?
[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3249#issuecomment-65767098 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24178/ Test PASSed.
[GitHub] spark pull request: SPARK-4762: Add support for tuples in 'where i...
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/3618#issuecomment-65767357 @saucam mind tagging this PR as [SQL]?
[GitHub] spark pull request: SPARK-4338. Ditch yarn-alpha.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3215#issuecomment-65767819 [Test build #24180 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24180/consoleFull) for PR 3215 at commit [`1c5ac08`](https://github.com/apache/spark/commit/1c5ac0889f387a27650b7f4dd37bb315f96dd201). * This patch merges cleanly.
[GitHub] spark pull request: SPARK-4338. Ditch yarn-alpha.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3215#issuecomment-65768623 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24179/ Test FAILed.
[GitHub] spark pull request: [SPARK-4763] All-pairs shortest paths algorith...
GitHub user ankurdave opened a pull request: https://github.com/apache/spark/pull/3619 [SPARK-4763] All-pairs shortest paths algorithm for GraphX Computes unweighted all-pairs shortest paths, returning an RDD containing the shortest-path distance between all pairs of reachable vertices. The algorithm is similar to distance-vector routing and runs in `O(|V|)` iterations. You can merge this pull request into a Git repository by running: $ git pull https://github.com/ankurdave/spark SPARK-4763-all-pairs-shortest-paths Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3619.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3619 commit b69cffbc831ae79d1f5b56713ff839f80ba0caaa Author: Ankur Dave ankurd...@gmail.com Date: 2014-12-05T10:01:34Z Checkpoint every 25 iterations in Pregel commit 0c0e18465b871d97ddf0b11d86c3462147d45fb7 Author: Ankur Dave ankurd...@gmail.com Date: 2014-12-05T10:00:49Z Add AllPairsShortestPaths and unit test
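The result the PR computes (unweighted shortest-path distances between all pairs of reachable vertices) can be sketched with one BFS per source vertex in plain Python; the GraphX version instead propagates distance vectors in O(|V|) Pregel-style iterations, but the output mapping is the same.

```python
from collections import deque

# Reference sketch (not the GraphX implementation): unweighted all-pairs
# shortest paths via one breadth-first search per source vertex. Returns
# dist[src][dst] = hop count for every dst reachable from src.
def all_pairs_shortest_paths(adj):
    dist = {}
    for src in adj:
        dist[src] = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist[src]:  # first visit is the shortest path
                    dist[src][v] = dist[src][u] + 1
                    queue.append(v)
    return dist

# Chain 1 -> 2 -> 3: vertex 3 is two hops from vertex 1.
all_pairs_shortest_paths({1: [2], 2: [3], 3: []})
```

Unreachable pairs are simply absent from the mapping, matching the "all pairs of reachable vertices" wording above.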
[GitHub] spark pull request: [SPARK-4763] All-pairs shortest paths algorith...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3619#issuecomment-65769987 [Test build #24181 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24181/consoleFull) for PR 3619 at commit [`0c0e184`](https://github.com/apache/spark/commit/0c0e18465b871d97ddf0b11d86c3462147d45fb7). * This patch merges cleanly.
[GitHub] spark pull request: SPARK-4338. Ditch yarn-alpha.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3215#issuecomment-65776868 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24180/ Test PASSed.
[GitHub] spark pull request: SPARK-4338. Ditch yarn-alpha.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3215#issuecomment-65776860 [Test build #24180 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24180/consoleFull) for PR 3215 at commit [`1c5ac08`](https://github.com/apache/spark/commit/1c5ac0889f387a27650b7f4dd37bb315f96dd201). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4763] All-pairs shortest paths algorith...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3619#issuecomment-65778564 [Test build #24181 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24181/consoleFull) for PR 3619 at commit [`0c0e184`](https://github.com/apache/spark/commit/0c0e18465b871d97ddf0b11d86c3462147d45fb7). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4763] All-pairs shortest paths algorith...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3619#issuecomment-65778569 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24181/ Test PASSed.
[GitHub] spark pull request: [SPARK-4714][CORE]: Add checking info is null ...
Github user suyanNone commented on the pull request: https://github.com/apache/spark/pull/3574#issuecomment-65779235 @JoshRosen Thread A enters removeBlock() and gets the info for blockId1. Thread B enters dropFromMemory() and gets the info for the same blockId1. Now both A and B try to enter info.synchronized. B wins, drops the block from memory, and since the block is memory-only it is also removed from blockInfo; then B releases the lock. A then acquires the lock, but the info is no longer in blockInfo. Have I missed or misunderstood something? Or is it the case that dropFromMemory and removeBlock can never happen at the same time?
[GitHub] spark pull request: [SPARK-4006] In long running contexts, we enco...
Github user tsliwowicz commented on the pull request: https://github.com/apache/spark/pull/2914#issuecomment-65783784 Seems like an issue with Jenkins
[GitHub] spark pull request: [SPARK-4006] Block Manager - Double Register C...
Github user tsliwowicz commented on the pull request: https://github.com/apache/spark/pull/2854#issuecomment-65783822 Seems like an issue with Jenkins
[GitHub] spark pull request: do you mean inadvertently?
GitHub user CrazyJvm opened a pull request: https://github.com/apache/spark/pull/3620 do you mean inadvertently? You can merge this pull request into a Git repository by running: $ git pull https://github.com/CrazyJvm/spark streaming-foreachRDD Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3620.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3620 commit b72886b6570be62ca4bcf1964c489a5f51d41394 Author: CrazyJvm crazy...@gmail.com Date: 2014-12-05T13:39:13Z do you mean inadvertently?
[GitHub] spark pull request: Streaming doc : do you mean inadvertently?
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3620#issuecomment-65791786 [Test build #24182 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24182/consoleFull) for PR 3620 at commit [`b72886b`](https://github.com/apache/spark/commit/b72886b6570be62ca4bcf1964c489a5f51d41394). * This patch merges cleanly.
[GitHub] spark pull request: Streaming doc : do you mean inadvertently?
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/3620#issuecomment-65791847 Correct, but this is really trivial.
[GitHub] spark pull request: [SPARK-4734][Streaming]limit the file Dstream ...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/3597#issuecomment-65792502 See my comments in https://issues.apache.org/jira/browse/SPARK-4734 as to why I don't think this is a good idea. In particular, this solution clearly has the potential to lose data and not process files, as well as changing the Spark Streaming semantics.
[GitHub] spark pull request: SPARK-3926 [CORE] Reopened: result of JavaRDD ...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/3587#discussion_r21372796 --- Diff: core/src/main/scala/org/apache/spark/api/java/JavaUtils.scala --- @@ -32,7 +33,65 @@ private[spark] object JavaUtils { def mapAsSerializableJavaMap[A, B](underlying: collection.Map[A, B]) = new SerializableMapWrapper(underlying) + // Implementation is copied from scala.collection.convert.Wrappers.MapWrapper, --- End diff -- Good question. It appears to be licensed just like the rest of the Scala code (http://www.scala-lang.org/license.html). Spark already integrates some Scala code and has the proper entries in `LICENSE` as a result. I can modify the text to clearly call out that part of `MapWrapper` was copied, for good measure.
[GitHub] spark pull request: [SPARK-3405] add subnet-id and vpc-id options ...
Github user brdw commented on the pull request: https://github.com/apache/spark/pull/2872#issuecomment-65801064 I'd love to see this as well. We have a strict VPC policy.
[GitHub] spark pull request: Streaming doc : do you mean inadvertently?
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3620#issuecomment-65802460 [Test build #24182 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24182/consoleFull) for PR 3620 at commit [`b72886b`](https://github.com/apache/spark/commit/b72886b6570be62ca4bcf1964c489a5f51d41394). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: Streaming doc : do you mean inadvertently?
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3620#issuecomment-65802466 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24182/ Test PASSed.
[GitHub] spark pull request: [SPARK-4761][SQL] Enables Kryo by default in S...
GitHub user liancheng opened a pull request: https://github.com/apache/spark/pull/3621 [SPARK-4761][SQL] Enables Kryo by default in Spark SQL Thrift server Enables Kryo and disables reference tracking by default in Spark SQL Thrift server. Configurations explicitly defined by users in `spark-defaults.conf` are respected. You can merge this pull request into a Git repository by running: $ git pull https://github.com/liancheng/spark kryo-by-default Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3621.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3621 commit 70c27756159d43bc5907965c3b7cfeff8e703c79 Author: Cheng Lian l...@databricks.com Date: 2014-12-05T15:02:54Z Enables Kryo by default in Spark SQL Thrift server
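The precedence described in the PR text (defaults applied only when the user has not set the key explicitly in `spark-defaults.conf`) can be sketched as follows. This is an illustrative Python sketch, not Spark's actual implementation; the two configuration key names are real Spark keys, but the helper function is hypothetical:

```python
# Apply Kryo-related defaults only for keys the user has not set.
# Real Spark config keys; hypothetical helper for illustration.
DEFAULTS = {
    "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
    "spark.kryo.referenceTracking": "false",
}

def with_kryo_defaults(user_conf):
    """Return a config where defaults fill in only the missing keys."""
    conf = dict(DEFAULTS)
    conf.update(user_conf)  # user-defined settings always win
    return conf

# A user override in spark-defaults.conf is respected:
conf = with_kryo_defaults({"spark.kryo.referenceTracking": "true"})
```

The key design point under review is that `update` runs last, so an explicit user setting is never clobbered by the new defaults.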
[GitHub] spark pull request: [SPARK-4763] All-pairs shortest paths algorith...
Github user witgo commented on a diff in the pull request: https://github.com/apache/spark/pull/3619#discussion_r21377222 --- Diff: graphx/src/main/scala/org/apache/spark/graphx/Pregel.scala --- @@ -139,6 +146,14 @@ object Pregel extends Logging { // get to send messages. We must cache messages so it can be materialized on the next line, // allowing us to uncache the previous iteration. messages = g.mapReduceTriplets(sendMsg, mergeMsg, Some((newVerts, activeDirection))).cache() + + if (checkpoint && i % checkpointFrequency == checkpointFrequency - 1) { +logInfo("Checkpointing in iteration " + i) +g.vertices.checkpoint() --- End diff -- Will this checkpoint method actually work? See: https://issues.apache.org/jira/browse/SPARK-3625
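The periodic-checkpoint condition in the diff above can be sketched in Python for illustration (the function name is hypothetical): with frequency `k`, a checkpoint fires on 0-indexed iterations `k-1`, `2k-1`, `3k-1`, ..., i.e. every k-th iteration.

```python
# Sketch of the condition `checkpoint && i % frequency == frequency - 1`
# from the Pregel diff; returns the iterations on which a checkpoint
# would be taken.
def checkpoint_iterations(num_iterations, frequency, enabled=True):
    return [i for i in range(num_iterations)
            if enabled and i % frequency == frequency - 1]
```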
[GitHub] spark pull request: [SPARK-4761][SQL] Enables Kryo by default in S...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3621#issuecomment-65804482 [Test build #24183 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24183/consoleFull) for PR 3621 at commit [`70c2775`](https://github.com/apache/spark/commit/70c27756159d43bc5907965c3b7cfeff8e703c79). * This patch merges cleanly.
[GitHub] spark pull request: SPARK-3926 [CORE] Reopened: result of JavaRDD ...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/3587#discussion_r21377699 --- Diff: core/src/main/scala/org/apache/spark/api/java/JavaUtils.scala --- @@ -32,7 +33,65 @@ private[spark] object JavaUtils { def mapAsSerializableJavaMap[A, B](underlying: collection.Map[A, B]) = new SerializableMapWrapper(underlying) + // Implementation is copied from scala.collection.convert.Wrappers.MapWrapper, + // but implements java.io.Serializable and adds a no-arg constructor class SerializableMapWrapper[A, B](underlying: collection.Map[A, B]) -extends MapWrapper(underlying) with java.io.Serializable +extends ju.AbstractMap[A, B] with java.io.Serializable { self => +// Add no-arg constructor just for serialization +def this() = this(null) --- End diff -- Hm, so it does. Maybe I misunderstood the original error. It's complaining about the _superclass_ (`MapWrapper`) not having a no-arg constructor? So copying the class works, since we no longer subclass `MapWrapper`, but the copy in `SerializableMapWrapper` need not define a no-arg constructor. OK, that line can be removed.
[GitHub] spark pull request: SPARK-3926 [CORE] Reopened: result of JavaRDD ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3587#issuecomment-65805992 [Test build #24184 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24184/consoleFull) for PR 3587 at commit [`8586bb9`](https://github.com/apache/spark/commit/8586bb9c72047e378366fa429d4e1e75a44c0d63). * This patch merges cleanly.
[GitHub] spark pull request: SPARK-3926 [CORE] Reopened: result of JavaRDD ...
Github user zsxwing commented on the pull request: https://github.com/apache/spark/pull/3587#issuecomment-65806925 LGTM
[GitHub] spark pull request: [SPARK-4461][YARN] pass extra java options to ...
Github user tgravescs commented on the pull request: https://github.com/apache/spark/pull/3409#issuecomment-65812731 I'm personally not a fan of executorLauncher. Cluster mode also launches executors, and users shouldn't really have to know that executorLauncher = client mode. If you want to change the name, then we should be explicit that this is for YARN client mode only (spark.yarn.clientmode.am.extraJavaOptions or similar). Otherwise I don't see how you get around some sort of confusion. You can make it apply to both, but then you have to define precedence between it and driver.extraJavaOptions, and if users have both set and don't realize the precedence, they get unexpected behavior.
[GitHub] spark pull request: [SPARK-4761][SQL] Enables Kryo by default in S...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3621#issuecomment-65814668 [Test build #24183 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24183/consoleFull) for PR 3621 at commit [`70c2775`](https://github.com/apache/spark/commit/70c27756159d43bc5907965c3b7cfeff8e703c79). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4761][SQL] Enables Kryo by default in S...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3621#issuecomment-65814677 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24183/ Test PASSed.
[GitHub] spark pull request: [SPARK-3778] newAPIHadoopRDD doesn't properly ...
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/2676#discussion_r21382056 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -641,6 +641,7 @@ class SparkContext(config: SparkConf) extends Logging { kClass: Class[K], vClass: Class[V], conf: Configuration = hadoopConfiguration): RDD[(K, V)] = { +// mapreduce.Job (NewHadoopJob) merges any credentials for you. --- End diff -- sure, I can change the comment
[GitHub] spark pull request: [SPARK-3405] add subnet-id and vpc-id options ...
Github user changetip commented on the pull request: https://github.com/apache/spark/pull/2872#issuecomment-65814859 Hi mvj101, dreid93 sent you a Bitcoin tip worth 1 lunch (21,255 bits/$8.00), and I'm here to deliver it ➔ **[collect your tip at ChangeTip.com](https://www.changetip.com/collect/213620)**. **[Learn more about ChangeTip](https://www.changetip.com/tip-online/github)**
[GitHub] spark pull request: [SPARK-3778] newAPIHadoopRDD doesn't properly ...
Github user tgravescs commented on the pull request: https://github.com/apache/spark/pull/2676#issuecomment-65814785 I was waiting for clarification from @pwendell on my question about his comment.
[GitHub] spark pull request: [SPARK-3405] add subnet-id and vpc-id options ...
Github user dreid93 commented on the pull request: https://github.com/apache/spark/pull/2872#issuecomment-65814791 I'll buy anyone willing to take care of this merge lunch via @ChangeTip :)
[GitHub] spark pull request: SPARK-3926 [CORE] Reopened: result of JavaRDD ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3587#issuecomment-65819459 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24184/ Test PASSed.
[GitHub] spark pull request: SPARK-3926 [CORE] Reopened: result of JavaRDD ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3587#issuecomment-65819448 [Test build #24184 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24184/consoleFull) for PR 3587 at commit [`8586bb9`](https://github.com/apache/spark/commit/8586bb9c72047e378366fa429d4e1e75a44c0d63). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3967] don’t redundantly overwrite exe...
Github user ryan-williams commented on a diff in the pull request: https://github.com/apache/spark/pull/2848#discussion_r21386174 --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala --- @@ -412,6 +403,48 @@ private[spark] object Utils extends Logging { } /** + * Download a file from @in to @tempFile, then move it to @destFile, checking whether @destFile --- End diff -- sure, will do.
[GitHub] spark pull request: [SPARK-3967] don’t redundantly overwrite exe...
Github user ryan-williams commented on a diff in the pull request: https://github.com/apache/spark/pull/2848#discussion_r21386186 --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala --- @@ -412,6 +403,48 @@ private[spark] object Utils extends Logging { } /** + * Download a file from @in to @tempFile, then move it to @destFile, checking whether @destFile + * already exists, is equal to the downloaded file, and can be overwritten. --- End diff -- ditto
[GitHub] spark pull request: [SPARK-3967] don’t redundantly overwrite exe...
Github user ryan-williams commented on a diff in the pull request: https://github.com/apache/spark/pull/2848#discussion_r21386332 --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala --- @@ -412,6 +403,48 @@ private[spark] object Utils extends Logging { } /** + * Download a file from @in to @tempFile, then move it to @destFile, checking whether @destFile + * already exists, is equal to the downloaded file, and can be overwritten. + */ + private def maybeMoveFile( +url: String, --- End diff -- Yes. The idea was to abstract out this duplicated code from ~4 other places in this file, all of which have a log message that includes the url, without changing any functionality.
[GitHub] spark pull request: [SPARK-3967] don’t redundantly overwrite exe...
Github user ryan-williams commented on a diff in the pull request: https://github.com/apache/spark/pull/2848#discussion_r21386398 --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala --- @@ -412,6 +403,48 @@ private[spark] object Utils extends Logging { } /** + * Download a file from @in to @tempFile, then move it to @destFile, checking whether @destFile + * already exists, is equal to the downloaded file, and can be overwritten. + */ + private def maybeMoveFile( +url: String, +sourceFile: File, +destFile: File, +fileOverwrite: Boolean): Unit = { + +var shouldCopy = true --- End diff -- geez, seems like I had a bad merge at some point and dropped an important code path in this function; e.g. the `Files.move` call is gone. will audit and fix
[GitHub] spark pull request: [SPARK-3967] don’t redundantly overwrite exe...
Github user ryan-williams commented on a diff in the pull request: https://github.com/apache/spark/pull/2848#discussion_r21386447 --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala --- @@ -412,6 +403,48 @@ private[spark] object Utils extends Logging { } /** + * Download a file from @in to @tempFile, then move it to @destFile, checking whether @destFile + * already exists, is equal to the downloaded file, and can be overwritten. + */ + private def maybeMoveFile( +url: String, +sourceFile: File, +destFile: File, +fileOverwrite: Boolean): Unit = { + +var shouldCopy = true +if (destFile.exists) { --- End diff -- yea, merge bug I think; looking
[GitHub] spark pull request: [SPARK-3967] don’t redundantly overwrite exe...
Github user ryan-williams commented on a diff in the pull request: https://github.com/apache/spark/pull/2848#discussion_r21386501 --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala --- @@ -412,6 +403,48 @@ private[spark] object Utils extends Logging { } /** + * Download a file from @in to @tempFile, then move it to @destFile, checking whether @destFile + * already exists, is equal to the downloaded file, and can be overwritten. + */ + private def maybeMoveFile( +url: String, +sourceFile: File, +destFile: File, +fileOverwrite: Boolean): Unit = { + +var shouldCopy = true +if (destFile.exists) { + if (!Files.equal(sourceFile, destFile)) { +if (fileOverwrite) { + destFile.delete() + logInfo(("File %s exists and does not match contents of %s, " + --- End diff -- sure, done
[GitHub] spark pull request: [SPARK-3967] don’t redundantly overwrite exe...
Github user ryan-williams commented on a diff in the pull request: https://github.com/apache/spark/pull/2848#discussion_r21386652 --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala --- @@ -412,6 +403,48 @@ private[spark] object Utils extends Logging { } /** + * Download a file from @in to @tempFile, then move it to @destFile, checking whether @destFile + * already exists, is equal to the downloaded file, and can be overwritten. + */ + private def maybeMoveFile( +url: String, +sourceFile: File, +destFile: File, +fileOverwrite: Boolean): Unit = { + +var shouldCopy = true +if (destFile.exists) { + if (!Files.equal(sourceFile, destFile)) { +if (fileOverwrite) { + destFile.delete() + logInfo(("File %s exists and does not match contents of %s, " + +"replacing it with %s").format(destFile, url, url)) --- End diff -- one would think so; this is how it was at [L459](https://github.com/apache/spark/pull/2848/files#diff-d239aee594001f8391676e1047a0381eL459), [L477](https://github.com/apache/spark/pull/2848/files#diff-d239aee594001f8391676e1047a0381eL477), and [L506](https://github.com/apache/spark/pull/2848/files#diff-d239aee594001f8391676e1047a0381eL506). Not obvious to me what the original intent was, but I'll try to figure it out and do something more sensible.
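The control flow being reviewed in `maybeMoveFile` can be sketched in Python for illustration (names mirror the Scala diff; this is not Spark's code): if the destination exists and differs, overwrite only when allowed; if it exists and matches, do nothing; otherwise move the downloaded file into place.

```python
# Illustrative sketch of the download/compare/overwrite flow.
import filecmp
import os
import shutil

def maybe_move_file(url, source_file, dest_file, file_overwrite):
    if os.path.exists(dest_file):
        if not filecmp.cmp(source_file, dest_file, shallow=False):
            if file_overwrite:
                # Existing file differs and we may replace it.
                os.remove(dest_file)
            else:
                raise RuntimeError("File %s exists and does not match "
                                   "contents of %s" % (dest_file, url))
        else:
            # Same contents: this file was copied previously, nothing to do.
            return
    # Destination is now absent: move the downloaded file into place.
    shutil.move(source_file, dest_file)
```

The point the reviewers caught in the PR is that the final move step had been dropped in a bad merge; the sketch keeps it explicit at the end.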
[GitHub] spark pull request: [SPARK-4461][YARN] pass extra java options to ...
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/3409#issuecomment-65826773 @tgravescs that makes sense. clientmode.am sounds good to me.
[GitHub] spark pull request: [SPARK-3967] don’t redundantly overwrite exe...
Github user ryan-williams commented on a diff in the pull request: https://github.com/apache/spark/pull/2848#discussion_r21386945 --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala --- @@ -412,6 +403,48 @@ private[spark] object Utils extends Logging { } /** + * Download a file from @in to @tempFile, then move it to @destFile, checking whether @destFile + * already exists, is equal to the downloaded file, and can be overwritten. + */ + private def maybeMoveFile( +url: String, +sourceFile: File, +destFile: File, +fileOverwrite: Boolean): Unit = { + +var shouldCopy = true +if (destFile.exists) { + if (!Files.equal(sourceFile, destFile)) { +if (fileOverwrite) { + destFile.delete() + logInfo(("File %s exists and does not match contents of %s, " + +"replacing it with %s").format(destFile, url, url)) +} else { + throw new SparkException( +"File " + destFile + " exists and does not match contents of " + url) +} + } else { +// Do nothing if the file contents are the same, i.e. this file has been copied +// previously. +logInfo(sourceFile.getAbsolutePath + " has been previously copied to " --- End diff -- done, though it went to 101 chars and I had to wrap it, which is not the clearest, but whatevs.
[GitHub] spark pull request: [SPARK-3967] don’t redundantly overwrite exe...
Github user ryan-williams commented on the pull request: https://github.com/apache/spark/pull/2848#issuecomment-65827581 Thanks for the review pass, @JoshRosen. As I mentioned in some of the comments, this was attempting to shoehorn 4 basically-identical blocks of code from elsewhere in this file into one common code path (it started with 1 or 2 and then grew to encompass the others as previous reviewers suggested that that was desirable). The PR has been mostly sitting around for many weeks and apparently one of the maintenance merges I did removed some key bits; I'll fix those and bump this again.
[GitHub] spark pull request: add foldLeftByKey to PairRDDFunctions for redu...
Github user koertkuipers commented on a diff in the pull request: https://github.com/apache/spark/pull/2963#discussion_r21387829 --- Diff: core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala --- @@ -460,6 +461,63 @@ class PairRDDFunctions[K, V](self: RDD[(K, V)]) } /** + * Group the values for each key in the RDD and apply a binary operator to a start value and all + * ordered values for a key, going left to right. + * + * Note: this operation may be expensive, since there is no map-side combine, so all values are + * sent through the shuffle. + */ + def foldLeftByKey[U: ClassTag](valueOrdering: Ordering[V], zeroValue: U, --- End diff -- Does this make it harder for the user to provide an ordering other than the natural ordering?
[GitHub] spark pull request: add foldLeftByKey to PairRDDFunctions for redu...
Github user koertkuipers commented on the pull request: https://github.com/apache/spark/pull/2963#issuecomment-65828969 Hey @zsxwing, In a Scala Seq the order in which the values get processed in foldLeft is well defined. But can we make any assumptions at all about the ordering of the values in Spark if you do not sort them? And if not, is foldLeft without sorting still useful? If so, I guess we can make the sorting optional. Or rename this function to make it clear it sorts.
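The determinism question above can be illustrated with a plain-Python model (no Spark; `fold_left_by_key` is a hypothetical name for sketching the proposed semantics): a left fold over unsorted shuffle output can depend on arrival order, while sorting the values first makes the result order-independent.

```python
from collections import defaultdict

def fold_left_by_key(pairs, zero, op, sort_values=True):
    """Plain-Python model of the proposed foldLeftByKey semantics.

    With sort_values=True the result is deterministic regardless of the
    order in which (key, value) pairs arrive -- the point of requiring an
    Ordering. With sort_values=False, a non-commutative op can give a
    different answer for each arrival order.
    """
    grouped = defaultdict(list)
    for k, v in pairs:
        grouped[k].append(v)
    out = {}
    for k, values in grouped.items():
        if sort_values:
            values = sorted(values)
        acc = zero
        for v in values:
            acc = op(acc, v)  # left fold over the (possibly sorted) values
        out[k] = acc
    return out

concat = lambda acc, v: acc + str(v)   # non-commutative: order matters
a = fold_left_by_key([("a", 3), ("a", 1), ("a", 2)], "", concat)
b = fold_left_by_key([("a", 2), ("a", 3), ("a", 1)], "", concat)
# a == b == {"a": "123"}: sorting makes the left fold order-independent
```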
[GitHub] spark pull request: [SPARK-3967] don’t redundantly overwrite exe...
Github user ryan-williams commented on a diff in the pull request: https://github.com/apache/spark/pull/2848#discussion_r21389032 --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala --- @@ -412,6 +403,48 @@ private[spark] object Utils extends Logging { } /** + * Download a file from @in to @tempFile, then move it to @destFile, checking whether @destFile + * already exists, is equal to the downloaded file, and can be overwritten. + */ + private def maybeMoveFile( + url: String, + sourceFile: File, + destFile: File, + fileOverwrite: Boolean): Unit = { + + var shouldCopy = true + if (destFile.exists) { + if (!Files.equal(sourceFile, destFile)) { + if (fileOverwrite) { + destFile.delete() + logInfo(("File %s exists and does not match contents of %s, " + + "replacing it with %s").format(destFile, url, url)) --- End diff -- on reading it again, I think the original code (which predates me) made sense: it asserts that the file exists and does not match `url`, so we are replacing the thing that already exists with what is at `url`, because that is what we want it to match.
[GitHub] spark pull request: [SPARK-4761][SQL] Enables Kryo by default in S...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/3621#issuecomment-65832481 Awesome, thanks Cheng. This is great. I forgot we can still modify the SparkConf before we pass it to the SparkContext constructor.
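The point about modifying the conf before construction can be sketched as a config fragment (PySpark flavor; the app name is a placeholder): settings applied to a `SparkConf` take effect only if they are set before the `SparkContext` is constructed.

```python
from pyspark import SparkConf, SparkContext

# Config sketch: the conf is read when the SparkContext is built, so
# serializer settings must be applied before the constructor call.
conf = SparkConf().setAppName("kryo-demo")
conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")

sc = SparkContext(conf=conf)  # conf is snapshotted here; later conf.set() calls are not picked up
```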
[GitHub] spark pull request: [SPARK-4761][SQL] Enables Kryo by default in S...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/3621
[GitHub] spark pull request: [SPARK-4421] Wrong link in spark-standalone.ht...
Github user tsudukim closed the pull request at: https://github.com/apache/spark/pull/3280
[GitHub] spark pull request: [SPARK-4421] Wrong link in spark-standalone.ht...
Github user tsudukim commented on the pull request: https://github.com/apache/spark/pull/3280#issuecomment-65834528 Thank you! @JoshRosen
[GitHub] spark pull request: [SPARK-4756][SQL] FIX: sessionToActivePool gro...
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/3617#issuecomment-65834619 ok to test
[GitHub] spark pull request: [SPARK-3967] don’t redundantly overwrite exe...
Github user ryan-williams commented on the pull request: https://github.com/apache/spark/pull/2848#issuecomment-65834877 OK @JoshRosen, I fixed and cleaned things up. * the two overloaded `maybeMoveFile` signatures are more distinctly named (`downloadStreamAndMove` and `copyFile`) * each is called twice, and the former actually uses the latter, so I think there are some overall code-reuse wins, in addition to the "don't move/copy if the thing is already there" bug fix happening in one place. I'm open to writing some tests, but it's not obvious how to do so in a way that is meaningful and doesn't just duplicate the logic in the file itself. The fetch-file code path is not tested in the existing `UtilsSuite` afaict, and it doesn't feel worth adding an integration test around, imho. lmk what else you'd like. Thanks.
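The decision being consolidated here can be sketched in plain Python (function and variable names are mine, not the PR's): replace the destination only when it exists, differs from the source, and overwriting is allowed; skip the move entirely when an identical file is already in place.

```python
import filecmp
import os
import shutil
import tempfile

def maybe_move_file(source, dest, overwrite):
    """Sketch of the consolidated fetch-file decision: move `source` to
    `dest` unless an identical file is already there; replace a differing
    file only when `overwrite` is set."""
    if os.path.exists(dest):
        if filecmp.cmp(source, dest, shallow=False):
            return "kept"  # identical file already present: skip the move
        if not overwrite:
            raise IOError("%s exists and does not match; not overwriting" % dest)
        os.remove(dest)
    shutil.move(source, dest)
    return "moved"

d = tempfile.mkdtemp()
src, dst = os.path.join(d, "src"), os.path.join(d, "dst")
with open(src, "w") as f:
    f.write("payload")
with open(dst, "w") as f:
    f.write("payload")
result = maybe_move_file(src, dst, overwrite=False)
# result == "kept": an identical file was already there, nothing rewritten
```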
[GitHub] spark pull request: [SPARK-4756][SQL] FIX: sessionToActivePool gro...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3617#issuecomment-65835734 [Test build #24185 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24185/consoleFull) for PR 3617 at commit [`e070998`](https://github.com/apache/spark/commit/e070998b5d048334f9e059b485ec87c7d86fbac4). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4668] Fix some documentation typos.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3523#issuecomment-65836128 [Test build #24186 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24186/consoleFull) for PR 3523 at commit [`a4409be`](https://github.com/apache/spark/commit/a4409bed00c86cd33df298a6d116ed980c6b78a1). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4756][SQL] FIX: sessionToActivePool gro...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3617#issuecomment-65836379 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24185/ Test FAILed.
[GitHub] spark pull request: [SPARK-4756][SQL] FIX: sessionToActivePool gro...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3617#issuecomment-65836374 [Test build #24185 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24185/consoleFull) for PR 3617 at commit [`e070998`](https://github.com/apache/spark/commit/e070998b5d048334f9e059b485ec87c7d86fbac4). * This patch **fails to build**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4668] Fix some documentation typos.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3523#issuecomment-65837247 [Test build #24187 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24187/consoleFull) for PR 3523 at commit [`05a3113`](https://github.com/apache/spark/commit/05a31132bc15d98b9a831917007b34afd87e2c55). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-2188] Support sbt/sbt for Windows
Github user tsudukim commented on the pull request: https://github.com/apache/spark/pull/3591#issuecomment-65837669 I'm not sure which is better, but I lean toward not submitting this upstream. It would be a good idea if this were derived from the latest sbt script, but unfortunately it is derived from our sbt script, which is an old version. If we were to submit this upstream, we would have to update it to match the latest sbt, which means the updated Windows script would diverge from our current Linux sbt script and become harder to maintain.
[GitHub] spark pull request: [SPARK-4765] Make GC time always shown in UI.
GitHub user kayousterhout opened a pull request: https://github.com/apache/spark/pull/3622 [SPARK-4765] Make GC time always shown in UI. This commit removes the GC time for each task from the set of optional, additional metrics, and instead always shows it for each task. cc @pwendell You can merge this pull request into a Git repository by running: $ git pull https://github.com/kayousterhout/spark-1 gc_time Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3622.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3622 commit 4d1cf4b31689d7f60c9104516a36cea6ca4e69d3 Author: Kay Ousterhout kayousterh...@gmail.com Date: 2014-12-05T19:08:13Z [SPARK-4765] Make GC time always shown in UI. This commit removes the GC time for each task from the set of optional, additional metrics, and instead always shows it for each task.
[GitHub] spark pull request: [SPARK-4765] Make GC time always shown in UI.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3622#issuecomment-65840120 [Test build #24188 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24188/consoleFull) for PR 3622 at commit [`4d1cf4b`](https://github.com/apache/spark/commit/4d1cf4b31689d7f60c9104516a36cea6ca4e69d3). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4763] All-pairs shortest paths algorith...
Github user ankurdave commented on a diff in the pull request: https://github.com/apache/spark/pull/3619#discussion_r21393556 --- Diff: graphx/src/main/scala/org/apache/spark/graphx/Pregel.scala --- @@ -139,6 +146,14 @@ object Pregel extends Logging { // get to send messages. We must cache messages so it can be materialized on the next line, // allowing us to uncache the previous iteration. messages = g.mapReduceTriplets(sendMsg, mergeMsg, Some((newVerts, activeDirection))).cache() + + if (checkpoint && i % checkpointFrequency == checkpointFrequency - 1) { + logInfo("Checkpointing in iteration " + i) + g.vertices.checkpoint() --- End diff -- We merged a simpler fix for graph checkpointing in https://issues.apache.org/jira/browse/SPARK-4672. I think this fixes SPARK-3623 -- are the extra changes to Spark core in your PR necessary?
[GitHub] spark pull request: [SPARK-3625][SPARK-3623][GraphX] GraphX should...
Github user ankurdave commented on the pull request: https://github.com/apache/spark/pull/2631#issuecomment-65841397 Due to https://issues.apache.org/jira/browse/SPARK-4672, we now support checkpointing graphs (by checkpointing their constituent vertices and edges) with the same semantics as RDDs, meaning that you have to checkpoint an RDD before calling an action on it. This PR additionally adds support for checkpointing after calling actions, but that is not strictly necessary and involves a behavior change to Spark core. So I think we should close this PR.
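The every-N-iterations trigger in the diff above can be modeled in plain Python (no Spark; the function name is mine): with a frequency of 3, the guard fires on the last iteration of each window, i.e. iterations 2, 5, 8, and so on.

```python
def should_checkpoint(enabled, i, frequency):
    # Mirrors the guard in the quoted diff: checkpoint on the last
    # iteration of every window of `frequency` iterations, and only
    # when checkpointing is enabled at all.
    return enabled and i % frequency == frequency - 1

hits = [i for i in range(10) if should_checkpoint(True, i, 3)]
# hits == [2, 5, 8]: one checkpoint per window of three iterations
```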
[GitHub] spark pull request: Clear local copies of accumulators as soon as ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3570#issuecomment-65844299 [Test build #24189 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24189/consoleFull) for PR 3570 at commit [`537baad`](https://github.com/apache/spark/commit/537baad0379644537f21385f0cc1150b4af0b237). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4005][CORE] handle message replies in r...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/2853#issuecomment-65845790 LGTM. Since this is code-cleanup and not a bugfix, I'm only going to merge this into `master` (1.3.0). Thanks!
[GitHub] spark pull request: [SPARK-4005][CORE] handle message replies in r...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/2853
[GitHub] spark pull request: Add a Note on jsonFile having separate JSON ob...
Github user petervandenabeele commented on the pull request: https://github.com/apache/spark/pull/3517#issuecomment-65846515 More problematic (and sorry I had not seen that before) ... there already _is_ an example file named `people.txt` with a different format: ``` $ spark git:(pv-docs-note-on-jsonFile-format/01) cat examples/src/main/resources/people.txt Michael, 29 Andy, 30 Justin, 19 ``` In that case, I could rename the example jsonFile to `people.jsons`. It is a weird name, but it's _reasonably_ accurate (following the `xs` pattern from Scala, as it is like a list of JSON objects). I would then indeed also need to change the name in all other locations where a reference to `people.json` is made (confirming the list mentioned by @marmbrus): ``` spark git:(pv-docs-note-on-jsonFile-format/01) grep -r 'people\.json' * | grep -v Binary | grep -v _site examples/src/main/java/org/apache/spark/examples/sql/JavaSparkSQL.java: String path = "examples/src/main/resources/people.json"; examples/src/main/python/sql.py: path = os.path.join(os.environ['SPARK_HOME'], "examples/src/main/resources/people.json") ``` On a more fundamental note, from the outside, following the principle of least astonishment (POLA) I would have expected this function to require a standard, valid JSON file formatted as an array of hashes with an identical schema, e.g. ``` [ {"name": "Tom", "character": "cat"}, {"name": "Jerry", "character": "mouse"} ] ``` This would have allowed us to simply import data generated from any other language with `array.to_json`. I hear the proposal from @marmbrus to also improve the error message (that would also have helped us understand the issue more quickly), but I would suggest putting that in a separate JIRA issue (it needs some real programming and testing work). I look forward to directions on how to best fix at least the documentation to avoid this confusion for others. Thanks.
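The formatting pitfall discussed above can be shown in plain Python (no Spark): `jsonFile` expects one self-contained JSON object per line, so parsing line by line works, while a standard pretty-printed JSON array is a single document parsed as a whole. The Tom/Jerry records below reuse the example from the comment.

```python
import json

# One JSON object per line -- the layout jsonFile expects.
json_lines = (
    '{"name": "Tom", "character": "cat"}\n'
    '{"name": "Jerry", "character": "mouse"}\n'
)
records = [json.loads(line) for line in json_lines.splitlines()]

# A standard JSON array (the POLA expectation from the comment) is
# instead one document parsed in a single call.
json_array = (
    '[{"name": "Tom", "character": "cat"},'
    ' {"name": "Jerry", "character": "mouse"}]'
)
as_array = json.loads(json_array)

# Both layouts carry the same records; only the on-disk framing differs.
```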
[GitHub] spark pull request: SPARK-4767: Add support for launching in a spe...
GitHub user holdenk opened a pull request: https://github.com/apache/spark/pull/3623 SPARK-4767: Add support for launching in a specified placement group to spark_ec2 Placement groups are cool and all the cool kids are using them. Lets add support for them to spark_ec2.py because I'm lazy You can merge this pull request into a Git repository by running: $ git pull https://github.com/holdenk/spark SPARK-4767-add-support-for-launching-in-a-specified-placement-group-to-spark-ec2-scripts Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3623.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3623 commit 70ace25cf260d1b968f631a2adc0cfb8aeeffe08 Author: Holden Karau hol...@pigscanfly.ca Date: 2014-12-05T19:58:35Z Placement groups are cool and all the cool kids are using them. Lets add support for them to spark_ec2.py because I'm lazy
[GitHub] spark pull request: [SPARK-4668] Fix some documentation typos.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3523#issuecomment-65847289 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24186/ Test PASSed.
[GitHub] spark pull request: [SPARK-4668] Fix some documentation typos.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3523#issuecomment-65847280 [Test build #24186 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24186/consoleFull) for PR 3523 at commit [`a4409be`](https://github.com/apache/spark/commit/a4409bed00c86cd33df298a6d116ed980c6b78a1). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: SPARK-4767: Add support for launching in a spe...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3623#issuecomment-65847864 [Test build #24190 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24190/consoleFull) for PR 3623 at commit [`70ace25`](https://github.com/apache/spark/commit/70ace25cf260d1b968f631a2adc0cfb8aeeffe08). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4668] Fix some documentation typos.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3523#issuecomment-65848971 [Test build #24187 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24187/consoleFull) for PR 3523 at commit [`05a3113`](https://github.com/apache/spark/commit/05a31132bc15d98b9a831917007b34afd87e2c55). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4668] Fix some documentation typos.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3523#issuecomment-65848977 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24187/ Test PASSed.
[GitHub] spark pull request: [SPARK-4765] Make GC time always shown in UI.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3622#issuecomment-65851401 [Test build #24188 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24188/consoleFull) for PR 3622 at commit [`4d1cf4b`](https://github.com/apache/spark/commit/4d1cf4b31689d7f60c9104516a36cea6ca4e69d3). * This patch **fails MiMa tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4765] Make GC time always shown in UI.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3622#issuecomment-65851408 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24188/ Test FAILed.
[GitHub] spark pull request: [SPARK-4765] Make GC time always shown in UI.
Github user kayousterhout commented on the pull request: https://github.com/apache/spark/pull/3622#issuecomment-65852060 MiMa tests pass locally; I rebased this on master to see if that makes the tests pass.
[GitHub] spark pull request: [SPARK-4765] Make GC time always shown in UI.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3622#issuecomment-65852692 [Test build #24191 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24191/consoleFull) for PR 3622 at commit [`e71d893`](https://github.com/apache/spark/commit/e71d89356e66f8532498652c80da9f1a43b98f39). * This patch merges cleanly.
[GitHub] spark pull request: SPARK-4770. [DOC] [YARN] spark.scheduler.minRe...
GitHub user sryza opened a pull request: https://github.com/apache/spark/pull/3624 SPARK-4770. [DOC] [YARN] spark.scheduler.minRegisteredResourcesRatio documented default is incorrect for YARN You can merge this pull request into a Git repository by running: $ git pull https://github.com/sryza/spark sandy-spark-4770 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3624.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3624 commit bd81a3a9081b61e41ebdaba2c657ff66f80d946f Author: Sandy Ryza sa...@cloudera.com Date: 2014-12-05T20:54:37Z SPARK-4770. [DOC] [YARN] spark.scheduler.minRegisteredResourcesRatio documented default is incorrect for YARN
[GitHub] spark pull request: SPARK-4770. [DOC] [YARN] spark.scheduler.minRe...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3624#issuecomment-65854478

[Test build #24192 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24192/consoleFull) for PR 3624 at commit [`bd81a3a`](https://github.com/apache/spark/commit/bd81a3a9081b61e41ebdaba2c657ff66f80d946f).

* This patch merges cleanly.
[GitHub] spark pull request: Clear local copies of accumulators as soon as ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3570#issuecomment-65855809

Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24189/
[GitHub] spark pull request: Clear local copies of accumulators as soon as ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3570#issuecomment-65855799

[Test build #24189 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24189/consoleFull) for PR 3570 at commit [`537baad`](https://github.com/apache/spark/commit/537baad0379644537f21385f0cc1150b4af0b237).

* This patch **fails MiMa tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4740] [WIP] Create multiple concurrent ...
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/3625

[SPARK-4740] [WIP] Create multiple concurrent connections between two peer nodes in Netty. Still needs test cases added.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/rxin/spark SPARK-4740

Alternatively, you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3625.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3625

commit 4f216736024f48ed9fa3fca6ea5953495def8e51
Author: Reynold Xin r...@databricks.com
Date: 2014-12-05T21:20:58Z

[SPARK-4740] Create multiple concurrent connections between two peer nodes in Netty.