[GitHub] spark pull request: [SPARK-6097][MLLIB] Support tree model save/lo...

2015-03-02 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/4854#discussion_r25637981
  
--- Diff: docs/mllib-decision-tree.md ---
@@ -317,6 +315,10 @@ testErr = labelsAndPredictions.filter(lambda (v, p): v != p).count() / float(tes
 print('Test Error = ' + str(testErr))
 print('Learned classification tree model:')
 print(model.toDebugString())
+
+# Save and load model
+model.save(sc, myModelPath)
+sameModel = DecisionTreeModel.load(sc, myModelPath)
--- End diff --

Missing quotation mark (here and in other examples)
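A minimal plain-Python illustration of the reviewer's point: without quotation marks, `myModelPath` is parsed as an (undefined) identifier rather than a string literal, so the call fails before it ever reaches save/load. The name `myModelPath` is just the placeholder used in the docs example.

```python
quoted = "myModelPath"          # a string literal: what the docs example should pass
assert isinstance(quoted, str)

try:
    unquoted = myModelPath      # an undefined name: raises NameError at runtime
except NameError:
    unquoted = None
assert unquoted is None
```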


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-6040][SQL] Fix the percent bug in table...

2015-03-02 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/4789#discussion_r25639164
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveQuerySuite.scala ---
@@ -467,6 +467,7 @@ class HiveQuerySuite extends HiveComparisonTest with BeforeAndAfter {

  test("sampling") {
 sql("SELECT * FROM src TABLESAMPLE(0.1 PERCENT) s")
+sql("SELECT * FROM src TABLESAMPLE(100 PERCENT) s")
--- End diff --

I'm going to go ahead and merge this since it changes semantics and we are 
close to the release where we remove the alpha tag, but it would be great if 
you could add a test that actually checks to make sure sampling is happening 
and we are getting something close to the expected number of results.
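The kind of statistical check being asked for can be sketched outside of Spark. Here `table_sample` is a hypothetical stand-in for `TABLESAMPLE(p PERCENT)` that keeps each row independently with probability p/100; the tolerance is a loose bound on the expected count.

```python
import random

def table_sample(rows, percent, seed=None):
    """Hypothetical stand-in for TABLESAMPLE(p PERCENT): keep each row
    independently with probability percent/100."""
    rng = random.Random(seed)
    return [r for r in rows if rng.random() < percent / 100.0]

rows = list(range(100_000))
sampled = table_sample(rows, 10, seed=42)
expected = len(rows) * 0.10

# Assert the sample size lands within a loose tolerance of the expectation,
# which is the sort of test the review comment asks for.
assert abs(len(sampled) - expected) < 0.02 * len(rows)
# 100 PERCENT should keep every row; 0 PERCENT should keep none.
assert len(table_sample(rows, 100, seed=42)) == len(rows)
assert len(table_sample(rows, 0, seed=1)) == 0
```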





[GitHub] spark pull request: [SPARK-6118] making package name of deploy.wor...

2015-03-02 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4856#issuecomment-76828944
  
  [Test build #28188 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28188/consoleFull)
 for   PR 4856 at commit 
[`cb93700`](https://github.com/apache/spark/commit/cb937009fcef8aa6ddb632e8cc2ef1cc16069a31).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-6066] Make event log format easier to p...

2015-03-02 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/4821#discussion_r25641491
  
--- Diff: core/src/test/scala/org/apache/spark/deploy/history/FsHistoryProviderSuite.scala ---
@@ -89,16 +105,18 @@ class FsHistoryProviderSuite extends FunSuite with BeforeAndAfter with Matchers

 val list = provider.getListing().toSeq
 list should not be (null)
-list.size should be (4)
-list.count(e => e.completed) should be (2)
+list.size should be (5)
+list.count(_.completed) should be (3)

 list(0) should be (ApplicationHistoryInfo(newAppComplete.getName(), "new-app-complete", 1L, 4L,
   newAppComplete.lastModified(), "test", true))
-list(1) should be (ApplicationHistoryInfo(oldAppComplete.getName(), "old-app-complete", 2L, 3L,
+list(1) should be (ApplicationHistoryInfo(newAppCompressedComplete.getName(),
--- End diff --

I think your problem is that entries `1` and `2` have the same start and end times, so their sort order is non-deterministic. Also, `newAppCompressedComplete` has end time `4` in the code a few lines above. So it seems your test code has a couple of issues; this is not really flakiness.
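The determinism issue can be reproduced in miniature: when two records share the same sort key, their relative order is unspecified in general unless a tie-breaking field is added. Field names below are illustrative, not Spark's.

```python
apps = [
    {"name": "app-c", "start": 1, "end": 4},
    {"name": "app-b", "start": 2, "end": 3},
    {"name": "app-a", "start": 2, "end": 3},  # same start/end as app-b: a tie
]

# Sorting only by (start, end) leaves the app-a/app-b order up to the sort
# implementation; adding the name as a final key fully determines the order.
deterministic = sorted(apps, key=lambda a: (a["start"], a["end"], a["name"]))
assert [a["name"] for a in deterministic] == ["app-c", "app-a", "app-b"]
```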





[GitHub] spark pull request: [wip][SPARK-1391][SPARK-3151] 2g partition lim...

2015-03-02 Thread squito
Github user squito commented on a diff in the pull request:

https://github.com/apache/spark/pull/4857#discussion_r25642017
  
--- Diff: core/src/main/scala/org/apache/spark/network/netty/NettyBlockTransferService.scala ---
@@ -106,40 +109,54 @@ class NettyBlockTransferService(conf: SparkConf, securityManager: SecurityManage
   blockId: BlockId,
   blockData: ManagedBuffer,
   level: StorageLevel): Future[Unit] = {
-val result = Promise[Unit]()
 val client = clientFactory.createClient(hostname, port)

 // StorageLevel is serialized as bytes using our JavaSerializer. Everything else is encoded
 // using our binary protocol.
 val levelBytes = serializer.newInstance().serialize(level).array()

-// Convert or copy nio buffer into array in order to serialize it.
-val nioBuffer = blockData.nioByteBuffer()
-val array = if (nioBuffer.hasArray) {
-  nioBuffer.array()
-} else {
-  val data = new Array[Byte](nioBuffer.remaining())
-  nioBuffer.get(data)
-  data
-}
-
-client.sendRpc(new UploadBlock(appId, execId, blockId.toString, levelBytes, array).toByteArray,
-  new RpcResponseCallback {
-override def onSuccess(response: Array[Byte]): Unit = {
-  logTrace(s"Successfully uploaded block $blockId")
-  result.success()
-}
-override def onFailure(e: Throwable): Unit = {
-  logError(s"Error while uploading block $blockId", e)
-  result.failure(e)
+val largeByteBuffer = blockData.nioByteBuffer()
+val bufferParts = largeByteBuffer.nioBuffers().asScala
+val chunkOffsets: Seq[Long] = bufferParts.scanLeft(0L) { case (offset, buf) => offset + buf.limit() }
+
+import scala.concurrent.ExecutionContext.Implicits.global
+bufferParts.zipWithIndex.foldLeft(Future.successful(())) { case (prevFuture, (buf, idx)) =>
--- End diff --

This is the sending side for replication of blocks. Pretty simple on this end -- just break it up into messages < 2 GB.
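The offset computation in the snippet (`scanLeft` over the chunk sizes) has a direct Python analogue as a running prefix sum. The chunk sizes below are illustrative parts of a >2 GB buffer, each kept under the 2 GB message limit.

```python
from itertools import accumulate

chunk_sizes = [2**31 - 1, 2**31 - 1, 512]  # each part stays below 2 GB

# scanLeft(0L) { (offset, buf) => offset + buf.limit() } in the Scala snippet
# is a prefix sum starting at 0: the byte offset where each chunk begins,
# plus a final entry equal to the total size.
offsets = [0] + list(accumulate(chunk_sizes))
assert offsets == [0, 2147483647, 4294967294, 4294967806]
assert all(size < 2**31 for size in chunk_sizes)  # every message under 2 GB
```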





[GitHub] spark pull request: [SPARK-6077] update listener for the existing ...

2015-03-02 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/4828#issuecomment-76834346
  
@tdas do you think we should add a Selenium test for this?





[GitHub] spark pull request: [wip][SPARK-1391][SPARK-3151] 2g partition lim...

2015-03-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4857#issuecomment-76839123
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28190/
Test FAILed.





[GitHub] spark pull request: [SPARK-6118] making package name of deploy.wor...

2015-03-02 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/4856#issuecomment-76839939
  
LGTM. I'll wait until tomorrow morning to merge just in case there are 
other comments.





[GitHub] spark pull request: [wip][SPARK-1391][SPARK-3151] 2g partition lim...

2015-03-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4857#issuecomment-76839949
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28191/
Test FAILed.





[GitHub] spark pull request: [wip][SPARK-1391][SPARK-3151] 2g partition lim...

2015-03-02 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4857#issuecomment-76841496
  
  [Test build #28192 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28192/consoleFull)
 for   PR 4857 at commit 
[`17d0c1a`](https://github.com/apache/spark/commit/17d0c1a122e082e7263398fadaa75f2200544f04).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-5950][SQL]Insert array into a metastore...

2015-03-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4826#issuecomment-76841684
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28187/
Test FAILed.





[GitHub] spark pull request: SPARK-5390 [DOCS] Encourage users to post on S...

2015-03-02 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/4843#discussion_r25646670
  
--- Diff: docs/index.md ---
@@ -115,6 +115,8 @@ options for deployment:
 
 * [Spark Homepage](http://spark.apache.org)
 * [Spark Wiki](https://cwiki.apache.org/confluence/display/SPARK)
+* [Spark Community](http://spark.apache.org/community.html) resources, 
including local meetups
--- End diff --

I didn't reproduce that because the mailing lists are already mentioned two bullets down.





[GitHub] spark pull request: [SPARK-6066] Make event log format easier to p...

2015-03-02 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/4821#discussion_r25647732
  
--- Diff: core/src/test/scala/org/apache/spark/deploy/history/FsHistoryProviderSuite.scala ---
@@ -89,16 +105,18 @@ class FsHistoryProviderSuite extends FunSuite with BeforeAndAfter with Matchers

 val list = provider.getListing().toSeq
 list should not be (null)
-list.size should be (4)
-list.count(e => e.completed) should be (2)
+list.size should be (5)
+list.count(_.completed) should be (3)

 list(0) should be (ApplicationHistoryInfo(newAppComplete.getName(), "new-app-complete", 1L, 4L,
   newAppComplete.lastModified(), "test", true))
-list(1) should be (ApplicationHistoryInfo(oldAppComplete.getName(), "old-app-complete", 2L, 3L,
+list(1) should be (ApplicationHistoryInfo(newAppCompressedComplete.getName(),
--- End diff --

However, that still doesn't explain why it passes locally but fails remotely.





[GitHub] spark pull request: [SPARK-6066] Make event log format easier to p...

2015-03-02 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/4821#discussion_r25647694
  
--- Diff: core/src/test/scala/org/apache/spark/deploy/history/FsHistoryProviderSuite.scala ---
@@ -89,16 +105,18 @@ class FsHistoryProviderSuite extends FunSuite with BeforeAndAfter with Matchers

 val list = provider.getListing().toSeq
 list should not be (null)
-list.size should be (4)
-list.count(e => e.completed) should be (2)
+list.size should be (5)
+list.count(_.completed) should be (3)

 list(0) should be (ApplicationHistoryInfo(newAppComplete.getName(), "new-app-complete", 1L, 4L,
   newAppComplete.lastModified(), "test", true))
-list(1) should be (ApplicationHistoryInfo(oldAppComplete.getName(), "old-app-complete", 2L, 3L,
+list(1) should be (ApplicationHistoryInfo(newAppCompressedComplete.getName(),
--- End diff --

yeah, I just realized this independently. I fixed this in my latest commit.





[GitHub] spark pull request: [SPARK-6066] Make event log format easier to p...

2015-03-02 Thread vanzin
Github user vanzin commented on the pull request:

https://github.com/apache/spark/pull/4821#issuecomment-76848440
  
LGTM.





[GitHub] spark pull request: [SPARK-4011] tighten the visibility of the mem...

2015-03-02 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/4844#issuecomment-76849828
  
Oh my, I didn't realize this would expand to change 50 files. In many cases 
you're removing visibility restrictions. I understand that the class-level 
visibility still constrains it but it's part of why the change is large now.

I don't feel qualified to judge whether this is too much change or not. It 
seems like the original much smaller commit was certainly a good change, at the 
least. I agree that it does feel right to lock down visibility to improve 
reasoning about the code.





[GitHub] spark pull request: [SPARK-5522] Accelerate the Histroty Server st...

2015-03-02 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/4525#discussion_r25649771
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala ---
@@ -189,48 +201,69 @@ private[history] class FsHistoryProvider(conf: SparkConf) extends ApplicationHis
   false
   }
 }
-.flatMap { entry =>
-  try {
-Some(replay(entry, new ReplayListenerBus()))
-  } catch {
-case e: Exception =>
-  logError(s"Failed to load application log data from $entry.", e)
-  None
-  }
-}
-.sortWith(compareAppInfo)
+.flatMap { entry => Some(entry) }
+.sortWith { case (entry1, entry2) =>
+  val mod1 = getModificationTime(entry1).getOrElse(-1L)
+  val mod2 = getModificationTime(entry2).getOrElse(-1L)
+  mod1 >= mod2
+  }
--- End diff --

indent. I will fix when I merge





[GitHub] spark pull request: [Minor] Fix doc typo for describing primitiveT...

2015-03-02 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/4762





[GitHub] spark pull request: [Minor] Fix doc typo for describing primitiveT...

2015-03-02 Thread marmbrus
Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/4762#issuecomment-76824082
  
Merged to master and 1.3





[GitHub] spark pull request: [SPARK-6077] update listener for the existing ...

2015-03-02 Thread tdas
Github user tdas commented on the pull request:

https://github.com/apache/spark/pull/4828#issuecomment-76836876
  
Will that be stable? Not flaky? In the past we had simple web UI tests that
used Scala's Source class to fetch a URL to see whether a tab had been
loaded or unloaded. Those were disabled because of flakiness. I wonder
whether Selenium tests will be more stable.

On Mon, Mar 2, 2015 at 1:55 PM, Josh Rosen notificati...@github.com wrote:

 @tdas https://github.com/tdas do you think we should add a Selenium
 test for this?

 —
 Reply to this email directly or view it on GitHub
 https://github.com/apache/spark/pull/4828#issuecomment-76834346.







[GitHub] spark pull request: [SPARK-6048] SparkConf should not translate de...

2015-03-02 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/4799#discussion_r25643250
  
--- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala ---
@@ -324,14 +330,13 @@ private[spark] class Executor(
 val classUri = conf.get("spark.repl.class.uri", null)
 if (classUri != null) {
   logInfo("Using REPL class URI: " + classUri)
-  val userClassPathFirst: java.lang.Boolean =
-conf.getBoolean("spark.executor.userClassPathFirst", false)
   try {
+val _userClassPathFirst: java.lang.Boolean = userClassPathFirst
--- End diff --

Do you need this? Feels like just referencing `userClassPathFirst` should 
work?





[GitHub] spark pull request: [SPARK-6050] [yarn] Relax matching of vcore co...

2015-03-02 Thread tgravescs
Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/4818#issuecomment-76842747
  
this looks good to me. +1. 





[GitHub] spark pull request: [SPARK-6114][SQL] Avoid metastore conversions ...

2015-03-02 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4855#issuecomment-76821852
  
  [Test build #28184 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28184/consoleFull)
 for   PR 4855 at commit 
[`a712249`](https://github.com/apache/spark/commit/a712249d3617f0a4a4eba6eb759dcd2770aeec12).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-6097][MLLIB] Support tree model save/lo...

2015-03-02 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/4854#discussion_r25638795
  
--- Diff: python/pyspark/mllib/tests.py ---
@@ -19,7 +19,9 @@
 """
 Fuller unit tests for Python MLlib.
 """

+import os
 import sys
+impprt tempfile
--- End diff --

typo





[GitHub] spark pull request: SPARK-5390 [DOCS] Encourage users to post on S...

2015-03-02 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/4843





[GitHub] spark pull request: Refactored Dataframe join comment to use corre...

2015-03-02 Thread marmbrus
Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/4847#issuecomment-76823596
  
Thanks!  Merged to master and 1.3 with a small change for java.





[GitHub] spark pull request: [SPARK-5950][SQL]Insert array into a metastore...

2015-03-02 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/4826#discussion_r25639687
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/types/DataTypeSuite.scala ---
@@ -115,4 +115,84 @@ class DataTypeSuite extends FunSuite {
   checkDefaultSize(MapType(IntegerType, StringType, true), 41)
   checkDefaultSize(MapType(IntegerType, ArrayType(DoubleType), false), 
80400)
   checkDefaultSize(structType, 812)
+
+  def checkEqualsIgnoreCompatibleNullability(
+  from: DataType,
+  to: DataType,
+  expected: Boolean): Unit = {
+val testName =
+  s"equalsIgnoreCompatibleNullability: (from: ${from}, to: ${to})"
+test(testName) {
+  assert(DataType.equalsIgnoreCompatibleNullability(from, to) === expected)
+}
+  }
+
+  checkEqualsIgnoreCompatibleNullability(
+from = ArrayType(DoubleType, containsNull = true),
+to = ArrayType(DoubleType, containsNull = true),
+expected = true)
+  checkEqualsIgnoreCompatibleNullability(
+from = ArrayType(DoubleType, containsNull = false),
+to = ArrayType(DoubleType, containsNull = false),
+expected = true)
+  checkEqualsIgnoreCompatibleNullability(
+from = ArrayType(DoubleType, containsNull = false),
+to = ArrayType(DoubleType, containsNull = true),
+expected = true)
+  checkEqualsIgnoreCompatibleNullability(
+from = ArrayType(DoubleType, containsNull = true),
+to = ArrayType(DoubleType, containsNull = false),
+expected = false)
+  checkEqualsIgnoreCompatibleNullability(
+from = ArrayType(DoubleType, containsNull = false),
+to = ArrayType(StringType, containsNull = false),
+expected = false)
+
+  checkEqualsIgnoreCompatibleNullability(
+from = MapType(StringType, DoubleType, valueContainsNull = true),
+to = MapType(StringType, DoubleType, valueContainsNull = true),
+expected = true)
+  checkEqualsIgnoreCompatibleNullability(
+from = MapType(StringType, DoubleType, valueContainsNull = false),
+to = MapType(StringType, DoubleType, valueContainsNull = false),
+expected = true)
+  checkEqualsIgnoreCompatibleNullability(
+from = MapType(StringType, DoubleType, valueContainsNull = false),
+to = MapType(StringType, DoubleType, valueContainsNull = true),
+expected = true)
+  checkEqualsIgnoreCompatibleNullability(
+from = MapType(StringType, DoubleType, valueContainsNull = true),
+to = MapType(StringType, DoubleType, valueContainsNull = false),
+expected = false)
+  checkEqualsIgnoreCompatibleNullability(
+from = MapType(StringType, ArrayType(IntegerType, true), 
valueContainsNull = true),
+to = MapType(StringType,  ArrayType(IntegerType, false), 
valueContainsNull = true),
+expected = false)
--- End diff --

Done
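The rule those test cases encode (a non-nullable element type may be written where a nullable one is expected, but never the reverse) can be modelled in a few lines; this is a simplified sketch, not Spark's actual `DataType` code.

```python
def equals_ignore_compatible_nullability(from_type, to_type):
    """from_type/to_type are (element_type, contains_null) pairs standing in
    for ArrayType; nullability may widen (False -> True) but never narrow."""
    from_elem, from_null = from_type
    to_elem, to_null = to_type
    return from_elem == to_elem and (to_null or not from_null)

# Mirrors the ArrayType cases in the quoted diff above.
assert equals_ignore_compatible_nullability(("double", True), ("double", True))
assert equals_ignore_compatible_nullability(("double", False), ("double", True))
assert not equals_ignore_compatible_nullability(("double", True), ("double", False))
assert not equals_ignore_compatible_nullability(("double", False), ("string", False))
```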





[GitHub] spark pull request: [SPARK-6097][MLLIB] Support tree model save/lo...

2015-03-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4854#issuecomment-76838144
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28185/
Test FAILed.





[GitHub] spark pull request: [SPARK-6097][MLLIB] Support tree model save/lo...

2015-03-02 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4854#issuecomment-76838133
  
  [Test build #28185 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28185/consoleFull)
 for   PR 4854 at commit 
[`8ebcac2`](https://github.com/apache/spark/commit/8ebcac284133a62b985473d3bce23616dfdede95).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class MatrixFactorizationModel(JavaModelWrapper, JavaSaveable, 
JavaLoader):`
  * `class TreeEnsembleModel(JavaModelWrapper, JavaSaveable):`
  * `class DecisionTreeModel(JavaModelWrapper, JavaSaveable, JavaLoader):`
  * `class RandomForestModel(TreeEnsembleModel, JavaLoader):`
  * `class GradientBoostedTreesModel(TreeEnsembleModel, JavaLoader):`
  * `class Saveable(object):`
  * `class JavaSaveable(Saveable):`
  * `class Loader(object):`
  * `class JavaLoader(Loader):`
  * `java_class = cls._java_loader_class()`






[GitHub] spark pull request: [wip][SPARK-1391][SPARK-3151] 2g partition lim...

2015-03-02 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4857#issuecomment-76837934
  
  [Test build #28190 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28190/consoleFull)
 for   PR 4857 at commit 
[`9de8866`](https://github.com/apache/spark/commit/9de88665fca9462f6ef6a395f0ad60a0275d1bd0).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-4011] tighten the visibility of the mem...

2015-03-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4844#issuecomment-76840849
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28186/
Test PASSed.





[GitHub] spark pull request: [SPARK-6077] update listener for the existing ...

2015-03-02 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/4828#issuecomment-76840648
  
Poorly-written Selenium tests can be flaky if they don't account for things 
like asynchrony (e.g. when testing Javascript interactions), but I don't think 
that will be a problem here.  We now have a bunch of Selenium tests for the 
Spark core UI and I don't think I've ever seen them fail in Jenkins: 
https://github.com/apache/spark/blob/master/core/src/test/scala/org/apache/spark/ui/UISeleniumSuite.scala





[GitHub] spark pull request: [wip][SPARK-1391][SPARK-3151] 2g partition lim...

2015-03-02 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4857#issuecomment-76843225
  
  [Test build #28194 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28194/consoleFull)
 for   PR 4857 at commit 
[`cd84c69`](https://github.com/apache/spark/commit/cd84c6996c5fd48f2f9f2855e7f3e7a75ba5865a).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-6050] [yarn] Relax matching of vcore co...

2015-03-02 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/4818





[GitHub] spark pull request: SPARK-3357 [CORE] Internal log messages should...

2015-03-02 Thread pwendell
Github user pwendell commented on a diff in the pull request:

https://github.com/apache/spark/pull/4838#discussion_r25647202
  
--- Diff: core/src/main/scala/org/apache/spark/storage/MemoryStore.scala ---
@@ -184,7 +184,7 @@ private[spark] class MemoryStore(blockManager: 
BlockManager, maxMemory: Long)
   val entry = entries.remove(blockId)
   if (entry != null) {
 currentMemory -= entry.size
-logInfo(s"Block $blockId of size ${entry.size} dropped from memory (free $freeMemory)")
+logDebug(s"Block $blockId of size ${entry.size} dropped from memory (free $freeMemory)")
--- End diff --

I believe we do have INFO level logging for this up the call chain when 
blocks are dropped due to cache contention:


https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/storage/BlockManager.scala#L1004

Might be nice to augment that logging to have information on the size and 
limit (like this does).





[GitHub] spark pull request: [SPARK-6066] Make event log format easier to p...

2015-03-02 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4821#issuecomment-76846594
  
  [Test build #28198 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28198/consoleFull)
 for   PR 4821 at commit 
[`fdae14c`](https://github.com/apache/spark/commit/fdae14c86f660047b50020b40a782b842182336c).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-6066] Make event log format easier to p...

2015-03-02 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/4821#discussion_r25647830
  
--- Diff: 
core/src/test/scala/org/apache/spark/deploy/history/FsHistoryProviderSuite.scala
 ---
@@ -89,16 +105,18 @@ class FsHistoryProviderSuite extends FunSuite with 
BeforeAndAfter with Matchers
 
 val list = provider.getListing().toSeq
 list should not be (null)
-list.size should be (4)
-list.count(e => e.completed) should be (2)
+list.size should be (5)
+list.count(_.completed) should be (3)
 
 list(0) should be (ApplicationHistoryInfo(newAppComplete.getName(), "new-app-complete", 1L, 4L,
   newAppComplete.lastModified(), "test", true))
-list(1) should be (ApplicationHistoryInfo(oldAppComplete.getName(), "old-app-complete", 2L, 3L,
+list(1) should be (ApplicationHistoryInfo(newAppCompressedComplete.getName(),
--- End diff --

It could be that you're using a different filesystem than the Jenkins 
machine, and the two return entries in a different order.
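Since directory listing order is filesystem-dependent, one way to keep such a test stable is to sort the listing before asserting on individual positions. A minimal self-contained sketch (the `AppEntry` and `StableListing` names are illustrative, not the suite's actual code):

```scala
// A test that indexes into a listing (list(0), list(1), ...) can pass on one
// filesystem and fail on another, because listing order is not guaranteed.
// Sorting by a stable key first makes the assertions order-independent.
case class AppEntry(name: String, startTime: Long)

object StableListing {
  def sorted(listing: Seq[AppEntry]): Seq[AppEntry] =
    listing.sortBy(e => (e.startTime, e.name))

  def main(args: Array[String]): Unit = {
    val onLocalFs = Seq(AppEntry("new-app-complete", 1L), AppEntry("old-app-complete", 2L))
    val onJenkins = onLocalFs.reverse               // same entries, different order
    assert(sorted(onLocalFs) == sorted(onJenkins))  // order no longer matters
    println(sorted(onJenkins).map(_.name).mkString(","))
  }
}
```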





[GitHub] spark pull request: [SPARK-1655][MLLIB] WIP Add option for distrib...

2015-03-02 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/2491#issuecomment-76848066
  
@staple is this still an active PR? Just trying to figure out whether it's stale 
and can be closed.





[GitHub] spark pull request: [SPARK-6097][MLLIB] Support tree model save/lo...

2015-03-02 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4854#issuecomment-76822795
  
  [Test build #28185 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28185/consoleFull)
 for   PR 4854 at commit 
[`8ebcac2`](https://github.com/apache/spark/commit/8ebcac284133a62b985473d3bce23616dfdede95).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-4011] tighten the visibility of the mem...

2015-03-02 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4844#issuecomment-76822798
  
  [Test build #28186 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28186/consoleFull)
 for   PR 4844 at commit 
[`d5d0e1c`](https://github.com/apache/spark/commit/d5d0e1cbd4d632f0bab444f4ef64df5a63bf4b41).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-6040][SQL] Fix the percent bug in table...

2015-03-02 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/4789





[GitHub] spark pull request: [SPARK-6029] Stop excluding fastutil package

2015-03-02 Thread jkleckner
Github user jkleckner commented on the pull request:

https://github.com/apache/spark/pull/4780#issuecomment-76831730
  
> I wonder if we're hitting some bugs with the serializer's classloader not
> seeing the right classloaders. Some issues like that are fixed in 1.3.0. It may
> still be that this is OK in 1.3, but that's just a guess right now.
>
> Remind me where the error occurs? Is it within a stack trace that
> includes the serializer?

Perhaps.  Here is a bit of test code that I run before really setting up 
the RDDs to provoke the failure.
The dump suggests it is in the constructor though.
```scala
  val qDigest = new QDigest(256.0)
  val s2 = "qDigest: " + qDigest.toString()
  println(s2)

Exception in thread "main" java.lang.NoClassDefFoundError: it/unimi/dsi/fastutil/longs/Long2LongOpenHashMap
at com.clearspring.analytics.stream.quantile.QDigest.<init>(QDigest.java:79)
at com.*
at com.*
at com.*
at com.*
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:358)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

Caused by: java.lang.ClassNotFoundException: 
it.unimi.dsi.fastutil.longs.Long2LongOpenHashMap
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
... 12 more
```

> Spark doesn't build with `minimizeJar`. I was referring to the
> parquet-column package. Spark should not, philosophically, be depended upon to
> provide anything but Spark (well, and anything third party that's necessary to
> invoke it). Indeed a lot of issues here descend from the fact that things
> aren't shaded and conflict.

Ah yes, parquet-column was the one with minimizeJar, not Spark.

> classpath-first is supposed to be a mechanism to work around this no
> matter if the conflict came from elsewhere. And if it isn't, that needs to be
> fixed ideally, as a first priority.

Makes sense.





[GitHub] spark pull request: [SPARK-6066] Make event log format easier to p...

2015-03-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4821#issuecomment-76831455
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28182/
Test FAILed.





[GitHub] spark pull request: [wip][SPARK-1391][SPARK-3151] 2g partition lim...

2015-03-02 Thread squito
Github user squito commented on a diff in the pull request:

https://github.com/apache/spark/pull/4857#discussion_r25641946
  
--- Diff: 
core/src/main/scala/org/apache/spark/network/netty/NettyBlockRpcServer.scala ---
@@ -63,11 +83,83 @@ class NettyBlockRpcServer(
 // StorageLevel is serialized as bytes using our JavaSerializer.
 val level: StorageLevel =
   
serializer.newInstance().deserialize(ByteBuffer.wrap(uploadBlock.metadata))
-val data = new 
NioManagedBuffer(ByteBuffer.wrap(uploadBlock.blockData))
+val data = new 
NioManagedBuffer(LargeByteBufferHelper.asLargeByteBuffer(uploadBlock.blockData))
+logTrace("putting block into our block manager: " + blockManager)
 blockManager.putBlockData(BlockId(uploadBlock.blockId), data, 
level)
 responseContext.onSuccess(new Array[Byte](0))
+
+  case uploadPartialBock: UploadPartialBlock =>
+    logTrace("received upload partial block: " + uploadPartialBock)
--- End diff --

This is a key component of block replication for blocks > 2GB. Definitely 
need feedback on how robust this approach is, if the handling of dropped 
messages is sane, etc.
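For context, the chunking idea behind partial-block upload can be sketched in a few lines. This is an illustrative sketch under assumed semantics (split into numbered chunks, reassemble by index), not the PR's actual `UploadPartialBlock` wire protocol:

```scala
// A block too large for a single message is split into numbered chunks, sent
// individually, and reassembled by chunk index on the receiver, so out-of-order
// delivery is tolerated.
object ChunkedUpload {
  def split(data: Array[Byte], chunkSize: Int): Seq[(Int, Array[Byte])] =
    data.grouped(chunkSize).zipWithIndex.map { case (chunk, i) => (i, chunk) }.toSeq

  def reassemble(chunks: Seq[(Int, Array[Byte])]): Array[Byte] =
    chunks.sortBy(_._1).flatMap(_._2).toArray  // order by index, then concatenate

  def main(args: Array[String]): Unit = {
    val block = Array.tabulate[Byte](10)(_.toByte)
    val outOfOrder = split(block, 3).reverse   // simulate reordered arrival
    assert(reassemble(outOfOrder).sameElements(block))
    println(reassemble(outOfOrder).length)
  }
}
```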






[GitHub] spark pull request: [wip][SPARK-1391][SPARK-3151] 2g partition lim...

2015-03-02 Thread squito
Github user squito commented on a diff in the pull request:

https://github.com/apache/spark/pull/4857#discussion_r25642290
  
--- Diff: 
core/src/main/scala/org/apache/spark/network/nio/NioBlockTransferService.scala 
---
@@ -143,7 +143,7 @@ final class NioBlockTransferService(conf: SparkConf, 
securityManager: SecurityMa
   level: StorageLevel)
 : Future[Unit] = {
 checkInit()
-val msg = PutBlock(blockId, blockData.nioByteBuffer(), level)
+val msg = PutBlock(blockId, 
blockData.nioByteBuffer().firstByteBuffer(), level)
--- End diff --

`NioBlockTransferService` is totally broken with these changes, this is 
just to make it compile.





[GitHub] spark pull request: [SPARK-6048] SparkConf should not translate de...

2015-03-02 Thread vanzin
Github user vanzin commented on the pull request:

https://github.com/apache/spark/pull/4799#issuecomment-76836507
  
LGTM, left just minor comments.





[GitHub] spark pull request: [wip][SPARK-1391][SPARK-3151] 2g partition lim...

2015-03-02 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4857#issuecomment-76838816
  
  [Test build #28191 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28191/consoleFull)
 for   PR 4857 at commit 
[`ef88085`](https://github.com/apache/spark/commit/ef8808555c2caddefbe5809e4c194bcc0675f40b).
 * This patch merges cleanly.





[GitHub] spark pull request: SPARK-3357 [CORE] Internal log messages should...

2015-03-02 Thread pwendell
Github user pwendell commented on a diff in the pull request:

https://github.com/apache/spark/pull/4838#discussion_r25646752
  
--- Diff: core/src/main/scala/org/apache/spark/storage/MemoryStore.scala ---
@@ -184,7 +184,7 @@ private[spark] class MemoryStore(blockManager: 
BlockManager, maxMemory: Long)
   val entry = entries.remove(blockId)
   if (entry != null) {
 currentMemory -= entry.size
-logInfo(s"Block $blockId of size ${entry.size} dropped from memory (free $freeMemory)")
+logDebug(s"Block $blockId of size ${entry.size} dropped from memory (free $freeMemory)")
--- End diff --

On this one - do you know if this already gets logged somewhere else if a 
block is dropped from memory due to contention? It would be good to make sure 
there is some INFO level logging when a block is dropped due to memory being 
exceeded.





[GitHub] spark pull request: [SPARK-5950][SQL]Insert array into a metastore...

2015-03-02 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4826#issuecomment-76847366
  
  [Test build #28199 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28199/consoleFull)
 for   PR 4826 at commit 
[`80e487e`](https://github.com/apache/spark/commit/80e487ea629c56864bf023c0e7431e8bc7b9f0b1).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-5950][SQL]Insert array into a metastore...

2015-03-02 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/4826#discussion_r25639644
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/types/dataTypes.scala ---
@@ -198,6 +198,43 @@ object DataType {
  case (left, right) => left == right
 }
   }
+
+  /**
+   * Compares two types, ignoring compatible nullability of ArrayType, 
MapType, StructType.
+   *
+   * Compatible nullability is defined as follows:
+   *   - If `from` and `to` are ArrayTypes, `from` has a compatible 
nullability with `to`
+   *   if and only if `to.containsNull` is true, or both of 
`from.containsNull` and
+   *   `to.containsNull` are false.
+   *   - If `from` and `to` are MapTypes, `from` has a compatible 
nullability with `to`
+   *   if and only if `to.valueContainsNull` is true, or both of 
`from.valueContainsNull` and
+   *   `to.valueContainsNull` are false.
+   *   - If `from` and `to` are StructTypes, `from` has a compatible nullability with `to`
+   *   if and only if, for every pair of fields, `toField.nullable` is true, or both
+   *   of `fromField.nullable` and `toField.nullable` are false.
+   */
+  private[sql] def equalsIgnoreCompatibleNullability(from: DataType, to: 
DataType): Boolean = {
--- End diff --

We can introduce a method on the `DataType` class based on this one 
later (I am not sure what a good name would be. I thought about 
`compatibleWith`, but I feel it is not very accurate). 
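The rules above read naturally as a recursive structural check. A minimal self-contained sketch using simplified stand-in types (`DT`, `ArrayT`, `MapT` are illustrative, not Spark's actual `DataType` hierarchy; the StructType rule follows the same pattern and is omitted):

```scala
// Stand-in type hierarchy for the sketch.
sealed trait DT
case object IntT extends DT
case class ArrayT(elementType: DT, containsNull: Boolean) extends DT
case class MapT(keyType: DT, valueType: DT, valueContainsNull: Boolean) extends DT

object NullabilityCheck {
  // `from` is compatible with `to` when `to` permits nulls, or neither side does,
  // checked recursively on element/key/value types.
  def compatible(from: DT, to: DT): Boolean = (from, to) match {
    case (ArrayT(fe, fn), ArrayT(te, tn)) =>
      (tn || !fn) && compatible(fe, te)
    case (MapT(fk, fv, fn), MapT(tk, tv, tn)) =>
      (tn || !fn) && compatible(fk, tk) && compatible(fv, tv)
    case (l, r) => l == r
  }

  def main(args: Array[String]): Unit = {
    // Writing non-null data into a nullable slot is allowed...
    assert(compatible(ArrayT(IntT, false), ArrayT(IntT, true)))
    // ...but a nullable array cannot go where nulls are forbidden, even when
    // nested inside a map value.
    assert(!compatible(
      MapT(IntT, ArrayT(IntT, true), valueContainsNull = true),
      MapT(IntT, ArrayT(IntT, false), valueContainsNull = true)))
    println("ok")
  }
}
```

The asymmetry is the point: nullability may be widened on write but never silently narrowed.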





[GitHub] spark pull request: [SPARK-4011] tighten the visibility of the mem...

2015-03-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4844#issuecomment-76833095
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28181/
Test PASSed.





[GitHub] spark pull request: [wip][SPARK-1391][SPARK-3151] 2g partition lim...

2015-03-02 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4857#issuecomment-76833223
  
  [Test build #28189 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28189/consoleFull)
 for   PR 4857 at commit 
[`6f6a8d7`](https://github.com/apache/spark/commit/6f6a8d7c512ab66ee8f03fa725d97533d0672c8e).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class LargeByteBufferOutputStream(chunkSize: Int = 65536)`
  * `public class LargeByteBufferHelper `
  * `public class WrappedLargeByteBuffer implements LargeByteBuffer `
  * `public class UploadPartialBlock extends BlockTransferMessage `






[GitHub] spark pull request: [wip][SPARK-1391][SPARK-3151] 2g partition lim...

2015-03-02 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4857#issuecomment-76833028
  
  [Test build #28189 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28189/consoleFull)
 for   PR 4857 at commit 
[`6f6a8d7`](https://github.com/apache/spark/commit/6f6a8d7c512ab66ee8f03fa725d97533d0672c8e).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-4011] tighten the visibility of the mem...

2015-03-02 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4844#issuecomment-76840838
  
  [Test build #28186 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28186/consoleFull)
 for   PR 4844 at commit 
[`d5d0e1c`](https://github.com/apache/spark/commit/d5d0e1cbd4d632f0bab444f4ef64df5a63bf4b41).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `  case class MavenCoordinate(groupId: String, artifactId: String, 
version: String)`






[GitHub] spark pull request: [SPARK-6118] making package name of deploy.wor...

2015-03-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4856#issuecomment-76842877
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28188/
Test PASSed.





[GitHub] spark pull request: [SPARK-6118] making package name of deploy.wor...

2015-03-02 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4856#issuecomment-76842862
  
  [Test build #28188 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28188/consoleFull)
 for   PR 4856 at commit 
[`cb93700`](https://github.com/apache/spark/commit/cb937009fcef8aa6ddb632e8cc2ef1cc16069a31).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: SPARK-5390 [DOCS] Encourage users to post on S...

2015-03-02 Thread pwendell
Github user pwendell commented on a diff in the pull request:

https://github.com/apache/spark/pull/4843#discussion_r25646936
  
--- Diff: docs/index.md ---
@@ -115,6 +115,8 @@ options for deployment:
 
 * [Spark Homepage](http://spark.apache.org)
 * [Spark Wiki](https://cwiki.apache.org/confluence/display/SPARK)
+* [Spark Community](http://spark.apache.org/community.html) resources, 
including local meetups
--- End diff --

Ah I see - sounds good!





[GitHub] spark pull request: [SPARK-6097][MLLIB] Support tree model save/lo...

2015-03-02 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4854#issuecomment-76846592
  
  [Test build #28196 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28196/consoleFull)
 for   PR 4854 at commit 
[`4586a4d`](https://github.com/apache/spark/commit/4586a4d02aee4acb8af73315093ceae009325a38).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-6121][SQL][MLLIB] simpleString for UDT

2015-03-02 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4858#issuecomment-76846620
  
  [Test build #28197 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28197/consoleFull)
 for   PR 4858 at commit 
[`34f0a77`](https://github.com/apache/spark/commit/34f0a77b76e9828a1d6356525ca3969d37f28ffa).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-6048] SparkConf should not translate de...

2015-03-02 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/4799#discussion_r25649473
  
--- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala ---
@@ -324,14 +330,13 @@ private[spark] class Executor(
 val classUri = conf.get("spark.repl.class.uri", null)
 if (classUri != null) {
   logInfo("Using REPL class URI: " + classUri)
-  val userClassPathFirst: java.lang.Boolean =
-conf.getBoolean("spark.executor.userClassPathFirst", false)
   try {
+val _userClassPathFirst: java.lang.Boolean = userClassPathFirst
--- End diff --

Maybe not, I'm just preserving what was already there before. This is later 
used in some reflection code so I think it would be safest to leave it as is.





[GitHub] spark pull request: [SPARK-6066] Make event log format easier to p...

2015-03-02 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/4821#discussion_r25649521
  
--- Diff: core/src/main/scala/org/apache/spark/util/JsonProtocol.scala ---
@@ -574,6 +583,11 @@ private[spark] object JsonProtocol {
 SparkListenerExecutorRemoved(time, executorId, reason)
   }
 
+  def logStartFromJson(json: JValue): SparkListenerLogStart = {
+val version = (json \ "Spark Version").extract[String]
--- End diff --

(edit): misunderstood what you meant. I'll rename the variable NBD





[GitHub] spark pull request: [SPARK-5522] Accelerate the Histroty Server st...

2015-03-02 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/4525





[GitHub] spark pull request: Refactored Dataframe join comment to use corre...

2015-03-02 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/4847





[GitHub] spark pull request: [SPARK-5950][SQL]Insert array into a metastore...

2015-03-02 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4826#issuecomment-76827466
  
  [Test build #28187 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28187/consoleFull)
 for   PR 4826 at commit 
[`0cb7ea2`](https://github.com/apache/spark/commit/0cb7ea27185db716ae5deddff00649064ddde860).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-6066] Make event log format easier to p...

2015-03-02 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4821#issuecomment-76831439
  
  [Test build #28182 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28182/consoleFull)
 for   PR 4821 at commit 
[`654883d`](https://github.com/apache/spark/commit/654883dfbd65de162455601c69108dab4e354f7d).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [wip][SPARK-1391][SPARK-3151] 2g partition lim...

2015-03-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4857#issuecomment-76833228
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28189/
Test FAILed.





[GitHub] spark pull request: [SPARK-6048] SparkConf should not translate de...

2015-03-02 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/4799#discussion_r25643128
  
--- Diff: core/src/main/scala/org/apache/spark/SparkConf.scala ---
@@ -343,6 +343,13 @@ class SparkConf(loadDefaults: Boolean) extends 
Cloneable with Logging {
 }
   }
 }
+
+// Warn against the use of deprecated configs
+deprecatedConfigs.values.foreach { dc =>
--- End diff --

Since you're doing this only once during the app's lifetime (this method is 
only called from SparkContext AFAICT), you could simplify the code that tracks 
whether warns have been printed in `DeprecatedConfig`. But ok to not do it if 
you want to limit the scope of the change.
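Since the warning only needs to fire once per process, the tracking could be as small as a single compare-and-set flag. A minimal sketch, assuming a hypothetical simplified `DeprecatedConfig` (the real one in `SparkConf` carries more fields and logic):

```scala
import java.util.concurrent.atomic.AtomicBoolean

// Hypothetical, simplified stand-in for SparkConf's DeprecatedConfig.
case class DeprecatedConfig(key: String, version: String, deprecationMessage: String) {
  private val warned = new AtomicBoolean(false)

  // True only on the first call, so callers log the warning exactly once
  // even if several threads race here.
  def shouldWarn(): Boolean = warned.compareAndSet(false, true)
}
```

A caller would then do `if (dc.shouldWarn()) logWarning(dc.deprecationMessage)` (with `logWarning` standing in for whatever logger is in scope).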





[GitHub] spark pull request: [SPARK-5494][SQL] SparkSqlSerializer Ignores K...

2015-03-02 Thread mccheah
Github user mccheah commented on the pull request:

https://github.com/apache/spark/pull/4693#issuecomment-76840419
  
Talked with @pwoody offline and he's working on a test case, but I'll 
summarize what I think will break. If you use a SQLContext wrapping a 
SparkContext where you set spark.serializer, the operations computing the SQL 
result will be fine, but when you collect the RDD the results are serialized 
with the SQL serializer while the driver kryo-deserializes them using 
spark.serializer.

This was the issue that originally prompted this change.





[GitHub] spark pull request: [SPARK-5494][SQL] SparkSqlSerializer Ignores K...

2015-03-02 Thread marmbrus
Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/4693#issuecomment-76842442
  
@mccheah we [set the serializer in SQL on a per-shuffle 
basis](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/Exchange.scala#L72),
 so that would surprise me.  However, if you can show it happening we should 
certainly fix it.





[GitHub] spark pull request: [SPARK-5950][SQL]Insert array into a metastore...

2015-03-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4826#issuecomment-76842454
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28193/
Test FAILed.





[GitHub] spark pull request: [SPARK-5950][SQL]Insert array into a metastore...

2015-03-02 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4826#issuecomment-76842451
  
  [Test build #28193 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28193/consoleFull)
 for   PR 4826 at commit 
[`587d88b`](https://github.com/apache/spark/commit/587d88b4ec438a38c6e4db55c9de4c654fd78210).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-6121][SQL][MLLIB] simpleString for UDT

2015-03-02 Thread mengxr
GitHub user mengxr opened a pull request:

https://github.com/apache/spark/pull/4858

[SPARK-6121][SQL][MLLIB] simpleString for UDT

`df.dtypes` shows `null` for UDTs. This PR uses `udt` by default and 
`VectorUDT` overwrites it with `vector`.

@jkbradley @davies
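The change described above boils down to a `simpleString` default on the UDT base class that concrete UDTs override. A stand-alone sketch with hypothetical simplified types (the real ones are Spark's `DataType`, `UserDefinedType[T]`, and `VectorUDT`):

```scala
// Hypothetical, simplified type hierarchy mirroring the PR's change.
abstract class DataType {
  def simpleString: String
}

abstract class UserDefinedType extends DataType {
  // New default: every UDT reports "udt" instead of null in df.dtypes.
  override def simpleString: String = "udt"
}

class VectorUDT extends UserDefinedType {
  // Concrete UDTs may override with a friendlier name.
  override def simpleString: String = "vector"
}
```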

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mengxr/spark SPARK-6121

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/4858.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4858


commit 34f0a77b76e9828a1d6356525ca3969d37f28ffa
Author: Xiangrui Meng m...@databricks.com
Date:   2015-03-02T22:57:00Z

simpleString for UDT







[GitHub] spark pull request: [SPARK-6066] Make event log format easier to p...

2015-03-02 Thread pwendell
Github user pwendell commented on a diff in the pull request:

https://github.com/apache/spark/pull/4821#discussion_r25648384
  
--- Diff: core/src/main/scala/org/apache/spark/util/JsonProtocol.scala ---
@@ -574,6 +583,11 @@ private[spark] object JsonProtocol {
 SparkListenerExecutorRemoved(time, executorId, reason)
   }
 
+  def logStartFromJson(json: JValue): SparkListenerLogStart = {
+val version = (json \ "Spark Version").extract[String]
--- End diff --

minor - but maybe call this `sparkVersion` so it's clear that this isn't a 
version for the logging format (since that doesn't have its own versioning).





[GitHub] spark pull request: [SPARK-6066] Make event log format easier to p...

2015-03-02 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4821#issuecomment-76850938
  
  [Test build #28201 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28201/consoleFull)
 for   PR 4821 at commit 
[`8511141`](https://github.com/apache/spark/commit/8511141f6b769655acc2b0c1de33a8401c8f7133).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-5522] Accelerate the Histroty Server st...

2015-03-02 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/4525#issuecomment-76850970
  
Ok, I'm merging this into master. Thanks.





[GitHub] spark pull request: [SPARK-6097][MLLIB] Support tree model save/lo...

2015-03-02 Thread jkbradley
Github user jkbradley commented on the pull request:

https://github.com/apache/spark/pull/4854#issuecomment-76824296
  
LGTM other than that typo.  I'm trying to compile & test now.





[GitHub] spark pull request: [SPARK-5950][SQL]Insert array into a metastore...

2015-03-02 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4826#issuecomment-76842310
  
  [Test build #28193 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28193/consoleFull)
 for   PR 4826 at commit 
[`587d88b`](https://github.com/apache/spark/commit/587d88b4ec438a38c6e4db55c9de4c654fd78210).
 * This patch merges cleanly.





[GitHub] spark pull request: Tighten up field/method visibility in Executor...

2015-03-02 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4850#issuecomment-76845031
  
  [Test build #28195 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28195/consoleFull)
 for   PR 4850 at commit 
[`dcba289`](https://github.com/apache/spark/commit/dcba289c0e95d93e79b3af27aebef8fa604078db).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-6114][SQL] Avoid metastore conversions ...

2015-03-02 Thread yhuai
Github user yhuai commented on the pull request:

https://github.com/apache/spark/pull/4855#issuecomment-76845758
  
LGTM





[GitHub] spark pull request: [SPARK-6121][SQL][MLLIB] simpleString for UDT

2015-03-02 Thread davies
Github user davies commented on the pull request:

https://github.com/apache/spark/pull/4858#issuecomment-76848785
  
LGTM





[GitHub] spark pull request: [wip][SPARK-1391][SPARK-3151] 2g partition lim...

2015-03-02 Thread squito
GitHub user squito opened a pull request:

https://github.com/apache/spark/pull/4857

[wip][SPARK-1391][SPARK-3151] 2g partition limit

https://issues.apache.org/jira/browse/SPARK-1391

This is still really rough; I'm looking for some feedback on overall 
design, and it's not ready to merge.  I put a design doc on JIRA; I think the major 
issues that I'd like feedback on are:

1. How to test this?  I added tests, but I disabled some of them just so 
that I don't destroy jenkins.  Some of these tests need ~16GB of memory to run. 
 I have smaller test cases for some things, but I really think we need some 
tests that actually transfer a block that is > 2GB.  
[SPARK-4767](https://github.com/apache/spark/pull/4048) would help with this.  
Also looking for suggestions on more tests.

2. I could really use some advice on how to make `NettyBlockRpcServer` 
robust in the way it handles `UploadPartialBlock`.  (a) is the use of timeouts 
sensible?  (b) how do I come up with reasonable timeouts?  (c) other cases I'm 
not thinking about for how it might fail?

3. How to test performance?  I haven't tested performance at all so far.  
Again, my goal is only maintaining performance on < 2GB blocks; we can figure 
out how to improve the performance of > 2GB blocks later.  (though of course easy 
fixes now are welcome.)  I'll try to do some performance testing myself but 
could use advice.

thanks to @sryza , @ryanlecompte and @harishreedharan for helping bounce 
ideas around, and @mridulm for work on an earlier implementation that served as 
inspiration.  mistakes are all mine though :)
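For question 2 above, one common way to make partial-upload handling robust to abandoned transfers is to timestamp each block's last received chunk and periodically reap stale entries. A rough, self-contained sketch with hypothetical names (not the actual `NettyBlockRpcServer` code; the injectable clock exists only to make the timeout testable):

```scala
import java.util.concurrent.ConcurrentHashMap
import scala.collection.mutable.ArrayBuffer

// Hypothetical tracker for in-flight multi-chunk block uploads.
class PartialUploadTracker(timeoutMs: Long,
                           now: () => Long = () => System.currentTimeMillis()) {

  private class Upload {
    @volatile var lastTouched: Long = now()
    val chunks = ArrayBuffer.empty[Array[Byte]]
  }

  private val uploads = new ConcurrentHashMap[String, Upload]()

  // Record one chunk and refresh the block's liveness timestamp.
  def addChunk(blockId: String, chunk: Array[Byte]): Unit = {
    val u = uploads.computeIfAbsent(blockId, _ => new Upload)
    u.synchronized {
      u.lastTouched = now()
      u.chunks += chunk
    }
  }

  // Drop uploads whose last chunk arrived more than timeoutMs ago;
  // returns how many were discarded.
  def reapAbandoned(): Int = {
    var reaped = 0
    val it = uploads.entrySet().iterator()
    while (it.hasNext) {
      if (now() - it.next().getValue.lastTouched > timeoutMs) {
        it.remove()
        reaped += 1
      }
    }
    reaped
  }
}
```

Picking the timeout itself is still a judgment call (it has to exceed the worst expected inter-chunk gap under load), which is exactly the part the author is asking for advice on.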


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/squito/spark SPARK-1391_2g_partition_limit

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/4857.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4857


commit 5cdcd4246e586346a8e1ac2242dd795fdb1ae068
Author: Imran Rashid iras...@cloudera.com
Date:   2015-02-20T22:35:12Z

add some failing tests, though these probably shouldnt actually get merged

commit 03db862833f3c4feef2d72620bc5c9a893dab2f5
Author: Imran Rashid iras...@cloudera.com
Date:   2015-02-23T20:28:22Z

steal some code from earlier work of @mridulm

commit d6337f03a4ac2971a004ef821281723e857f9008
Author: Imran Rashid iras...@cloudera.com
Date:   2015-02-24T18:18:38Z

wip -- changed a bunch of types to LargeByteBuffer;  discovered problem on 
replicate()

commit a139e97fe1aeac279b9c47119745c0f45eb7d8c5
Author: Imran Rashid iras...@cloudera.com
Date:   2015-02-25T19:27:13Z

compiling but all sorts of bad casting etc.

commit 4965bad00574a05e133e7caeba56cd6115fe35b6
Author: Imran Rashid iras...@cloudera.com
Date:   2015-02-25T20:28:16Z

move LargeByteBuffer to network-common, since we need it there for the 
shuffles

commit 149d4fa3fa55403df90109b440a3523d3f4ab92b
Author: Imran Rashid iras...@cloudera.com
Date:   2015-02-25T21:50:33Z

move large byte buffer to network/common ... still lots of crud

commit 01cafbf15026fdcbfd58566335802082493a491c
Author: Imran Rashid iras...@cloudera.com
Date:   2015-02-25T21:53:22Z

tests compile too

commit ce391a0dbbba3d169d4013d2e387b7808065b3f8
Author: Imran Rashid iras...@cloudera.com
Date:   2015-02-25T22:00:00Z

failing test case (though its crappy)

commit 29f0a8a10c685ea2742d239a748bc6c5d7798380
Author: Imran Rashid iras...@cloudera.com
Date:   2015-02-27T19:19:12Z

fix use of LargeByteBuffer in some tests, create UploadPartialBlock

commit dcb46697d59fa77ac643e438b346eb28972d9e8f
Author: Imran Rashid iras...@cloudera.com
Date:   2015-02-27T20:59:04Z

add real test case for uploading large blocks (failing now)

commit 660f5e362439d79d8dfd000a805be0ad5181106c
Author: Imran Rashid iras...@cloudera.com
Date:   2015-02-28T02:57:32Z

flesh out NettyBlockTransfer#uploadBlock

commit 4c228a07173e06f8da449db17b878d220e14dea0
Author: Imran Rashid iras...@cloudera.com
Date:   2015-02-28T03:14:13Z

minor cleanup

commit cf7c3a7067aaa61732782995984f17fa94a6cff7
Author: Imran Rashid iras...@cloudera.com
Date:   2015-02-28T04:45:50Z

cleanup abandonded block uploads

commit fe90fd682d71ce4c156dde9e6a016e7923e65aad
Author: Imran Rashid iras...@cloudera.com
Date:   2015-03-02T15:46:19Z

crank up memory for tests

commit 857f3dfae649d56dbd37a022ddbe594c2a9bd0ac
Author: Imran Rashid iras...@cloudera.com
Date:   2015-03-02T15:47:16Z

fix LargeByteBuffer dispose()

commit b700723033d49094a4c0ee6f43a23592b29ae01f
Author: Imran Rashid iras...@cloudera.com
Date:   2015-03-02T16:55:19Z

maven needs you to request lots of extra memory for some reason

commit 6b102a028df0fae8d60b5cb2174fc4f380a81627
Author: Imran Rashid iras...@cloudera.com
Date:   2015-03-02T19:17:16Z

passing 

[GitHub] spark pull request: [SPARK-4011] tighten the visibility of the mem...

2015-03-02 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4844#issuecomment-76833078
  
  [Test build #28181 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28181/consoleFull)
 for   PR 4844 at commit 
[`535065d`](https://github.com/apache/spark/commit/535065deb7760cb9ecbe3e8b33cf5a28cedcbb21).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `  case class MavenCoordinate(groupId: String, artifactId: String, 
version: String)`






[GitHub] spark pull request: [wip][SPARK-1391][SPARK-3151] 2g partition lim...

2015-03-02 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4857#issuecomment-76839944
  
  [Test build #28191 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28191/consoleFull)
 for   PR 4857 at commit 
[`ef88085`](https://github.com/apache/spark/commit/ef8808555c2caddefbe5809e4c194bcc0675f40b).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class LargeByteBufferOutputStream(chunkSize: Int = 65536)`
  * `public class LargeByteBufferHelper `
  * `public class WrappedLargeByteBuffer implements LargeByteBuffer `
  * `public class UploadPartialBlock extends BlockTransferMessage `






[GitHub] spark pull request: [wip][SPARK-1391][SPARK-3151] 2g partition lim...

2015-03-02 Thread squito
Github user squito commented on a diff in the pull request:

https://github.com/apache/spark/pull/4857#discussion_r25644794
  
--- Diff: 
core/src/test/scala/org/apache/spark/network/netty/NettyBlockTransferSuite.scala
 ---
@@ -0,0 +1,154 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.network.netty
+
+import java.nio.ByteBuffer
+import java.util.concurrent.TimeUnit
+
+import org.apache.commons.io.IOUtils
+import org.apache.spark.network.BlockDataManager
+import org.apache.spark.network.buffer._
+import org.apache.spark.network.shuffle.BlockFetchingListener
+import org.apache.spark.storage.{BlockId, StorageLevel, RDDBlockId, 
ShuffleBlockId}
+import org.apache.spark.{Logging, SecurityManager, SparkConf}
+import org.mockito.ArgumentCaptor
+import org.mockito.{Matchers => MockitoMatchers}
+import org.mockito.Mockito._
+import org.scalatest.mock.MockitoSugar
+import org.scalatest.{Matchers, FunSuite}
+
+import scala.concurrent.duration.{Duration, FiniteDuration}
+import scala.concurrent.{Await, Promise}
+
class NettyBlockTransferSuite extends FunSuite with Matchers with MockitoSugar with Logging {
+
+  val conf = new SparkConf()
+.set("spark.app.id", "app-id")
+  val securityManager = new SecurityManager(conf)
+
+
+  def fetchBlock(buf: LargeByteBuffer): ManagedBuffer = {
+val blockManager = mock[BlockDataManager]
+val blockId = ShuffleBlockId(0, 1, 2)
+val blockBuffer = new NioManagedBuffer(buf)
+when(blockManager.getBlockData(blockId)).thenReturn(blockBuffer)
+
+val from = new NettyBlockTransferService(conf, securityManager, numCores = 1)
+from.init(blockManager)
+val to = new NettyBlockTransferService(conf, securityManager, numCores = 1)
+to.init(blockManager)
+
+try {
+  val promise = Promise[ManagedBuffer]()
+
+  to.fetchBlocks(from.hostName, from.port, "1", Array(blockId.toString),
+new BlockFetchingListener {
+  override def onBlockFetchFailure(blockId: String, exception: Throwable): Unit = {
+promise.failure(exception)
+  }
+
+  override def onBlockFetchSuccess(blockId: String, data: ManagedBuffer): Unit = {
+promise.success(data.retain())
+  }
+})
+
+  Await.ready(promise.future, FiniteDuration(100, TimeUnit.SECONDS))
+  promise.future.value.get.get
+} finally {
+  from.close()
+  to.close()
+}
+
+  }
+
+  ignore("simple fetch") {
+val blockString = "Hello, world!"
+val blockBuffer = LargeByteBufferHelper.asLargeByteBuffer(blockString.getBytes)
+val fetched = fetchBlock(blockBuffer)
+
+IOUtils.toString(fetched.createInputStream()) should equal(blockString)
+  }
+
+
+  def uploadBlock(buf: LargeByteBuffer, rddId: Int, timeout: Long) {
+
+val fromBlockManager = mock[BlockDataManager]
+val toBlockManager = mock[BlockDataManager]
+val blockId = RDDBlockId(rddId, rddId + 1)
+val blockBuffer = new NioManagedBuffer(buf)
+val level = StorageLevel.DISK_ONLY //doesn't matter
+
+val from = new NettyBlockTransferService(conf, securityManager, numCores = 1)
+from.init(fromBlockManager)
+val to = new NettyBlockTransferService(conf, securityManager, numCores = 1)
+to.init(toBlockManager)
+
+try {
+  val uploadFuture = from.uploadBlock(to.hostName, to.port, "exec-1", blockId, blockBuffer, level)
+  Await.result(uploadFuture, Duration.apply(timeout, TimeUnit.MILLISECONDS))
+  val bufferCaptor = ArgumentCaptor.forClass(classOf[ManagedBuffer])
+  verify(toBlockManager).putBlockData(MockitoMatchers.eq(blockId), bufferCaptor.capture(),
+MockitoMatchers.eq(level))
+  val putBuffer = bufferCaptor.getValue()
+  logTrace("begin checking buffer equivalence")
+  

[GitHub] spark pull request: SPARK-5390 [DOCS] Encourage users to post on S...

2015-03-02 Thread pwendell
Github user pwendell commented on a diff in the pull request:

https://github.com/apache/spark/pull/4843#discussion_r25646565
  
--- Diff: docs/index.md ---
@@ -115,6 +115,8 @@ options for deployment:
 
 * [Spark Homepage](http://spark.apache.org)
 * [Spark Wiki](https://cwiki.apache.org/confluence/display/SPARK)
+* [Spark Community](http://spark.apache.org/community.html) resources, including local meetups
--- End diff --

minor - but could this say "including mailing lists and local meetups"? I think it would be good to still mention the mailing list here, but moving this dropped that text.





[GitHub] spark pull request: [SPARK-4011] tighten the visibility of the mem...

2015-03-02 Thread CodingCat
Github user CodingCat commented on the pull request:

https://github.com/apache/spark/pull/4844#issuecomment-76847322
  
Hi, @srowen and @JoshRosen 

I made further changes. Basically, I used two strategies to tighten the accessibility:

1. modify individual variables/methods

2. change the accessibility of the classes directly

One more thing to mention:

I didn't apply the most restrictive permission everywhere, because I guess in some places the original authors intentionally left looser accessibility for easier extension in the future, e.g. in 
https://github.com/apache/spark/pull/4844/files#diff-829a8674171f92acd61007bedb1bfa4fR40
 (DriverRunner), some of the public variables declared in the constructor are not necessary (e.g. workerUrl), but some of the public variables declared alongside them are read in other classes (e.g. driverId is read in WorkerPage). In such cases, I didn't make workerUrl private, since in the future we might add richer content to WorkerPage.
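The two strategies described above can be sketched in plain Scala. All names here are illustrative stand-ins, not code from the PR, and the package-qualified modifiers Spark actually uses (e.g. `private[deploy]`) are shown as plain `private` so the sketch compiles standalone:

```scala
// Strategy 1: tighten individual members.
// `driverId` stays public because another class (e.g. a UI page) reads it;
// `workerUrl` has no `val`, so it is not exposed outside this class at all.
class DriverRunnerSketch(
    val driverId: String,
    workerUrl: String) {

  def describe: String = s"driver $driverId at $workerUrl"
}

// Strategy 2: tighten the whole class at once.
// In Spark this would be a package-qualified modifier such as
// `private[worker] class ...`; plain `private` inside an object stands in here.
object RegistrySketch {
  private class Internal {
    def ping: String = "pong"
  }
  def ping: String = (new Internal).ping
}
```

The trade-off the comment describes is visible in the first class: narrowing `workerUrl` is safe today, but if a future page needs it, the modifier has to be loosened again.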






[GitHub] spark pull request: [SPARK-6066] Make event log format easier to p...

2015-03-02 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/4821#discussion_r25648636
  
--- Diff: core/src/main/scala/org/apache/spark/util/JsonProtocol.scala ---
@@ -574,6 +583,11 @@ private[spark] object JsonProtocol {
 SparkListenerExecutorRemoved(time, executorId, reason)
   }
 
+  def logStartFromJson(json: JValue): SparkListenerLogStart = {
+val version = (json \ "Spark Version").extract[String]
--- End diff --

but then it's not consistent with other similar JSONs...





[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...

2015-03-02 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3916#issuecomment-76849555
  
  [Test build #28200 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28200/consoleFull)
 for   PR 3916 at commit 
[`28cd35e`](https://github.com/apache/spark/commit/28cd35eb7d7c065f679ef4749e599ad4d31ee5cf).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-6048] SparkConf should not translate de...

2015-03-02 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/4799#discussion_r25649108
  
--- Diff: core/src/main/scala/org/apache/spark/SparkConf.scala ---
@@ -343,6 +343,13 @@ class SparkConf(loadDefaults: Boolean) extends 
Cloneable with Logging {
 }
   }
 }
+
+// Warn against the use of deprecated configs
+deprecatedConfigs.values.foreach { dc =>
--- End diff --

yeah let's revamp this whole thing after the release





[GitHub] spark pull request: [SPARK-6118] making package name of deploy.wor...

2015-03-02 Thread CodingCat
GitHub user CodingCat opened a pull request:

https://github.com/apache/spark/pull/4856

[SPARK-6118] making package name of deploy.worker.CommandUtils and 
deploy.CommandUtilsSuite consistent

https://issues.apache.org/jira/browse/SPARK-6118

I found that the object CommandUtils is placed under the deploy.worker package, while CommandUtilsSuite is under deploy.

Conventionally, we put the implementation and its unit test class under the same package.

Here, to minimize the change, I moved CommandUtilsSuite to the worker package.

**However, CommandUtils seems to contain some general methods (though they are only used by worker.* classes currently)**, so we may also consider relocating CommandUtils.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/CodingCat/spark SPARK-6118

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/4856.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4856


commit cb937009fcef8aa6ddb632e8cc2ef1cc16069a31
Author: CodingCat zhunans...@gmail.com
Date:   2015-03-02T21:24:50Z

making package name consistent







[GitHub] spark pull request: [SPARK-6077] update listener for the existing ...

2015-03-02 Thread tdas
Github user tdas commented on the pull request:

https://github.com/apache/spark/pull/4828#issuecomment-76829945
  
Hey, this is a decent fix, but I think it is not the right one. With this fix, the new ssc will be reflected in the new streaming tab, but the tab will still be visible even after the earlier ssc has been stopped. The right solution is for the tab to be removed when a streaming context is stopped. Since only one streaming context can be active on the same spark context at a time, attach-on-start-and-remove-on-stop will fix the multiple-tab problem in the right way.

Does that make sense?
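The attach-on-start-and-remove-on-stop pattern being proposed can be sketched as follows. These classes are hypothetical stand-ins for the Spark UI and the streaming context, not the actual Spark API:

```scala
import scala.collection.mutable

// Minimal stand-in for a web UI that tracks attached tabs by name.
class WebUISketch {
  val tabs = mutable.LinkedHashSet.empty[String]
  def attachTab(name: String): Unit = tabs += name
  def detachTab(name: String): Unit = tabs -= name
}

// Stand-in for a streaming context: attach the tab on start, remove it on stop.
class StreamingContextSketch(ui: WebUISketch) {
  private val tabName = "streaming"
  def start(): Unit = ui.attachTab(tabName)
  def stop(): Unit = ui.detachTab(tabName)
}
```

With this pattern, stopping one context and starting another on the same UI never leaves a stale second tab behind, which is the multiple-tab problem described above.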





[GitHub] spark pull request: [wip][SPARK-1391][SPARK-3151] 2g partition lim...

2015-03-02 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4857#issuecomment-76839119
  
  [Test build #28190 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28190/consoleFull)
 for   PR 4857 at commit 
[`9de8866`](https://github.com/apache/spark/commit/9de88665fca9462f6ef6a395f0ad60a0275d1bd0).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class LargeByteBufferOutputStream(chunkSize: Int = 65536)`
  * `public class LargeByteBufferHelper `
  * `public class WrappedLargeByteBuffer implements LargeByteBuffer `
  * `public class UploadPartialBlock extends BlockTransferMessage `






[GitHub] spark pull request: [SPARK-6114][SQL] Avoid metastore conversions ...

2015-03-02 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4855#issuecomment-76839492
  
  [Test build #28184 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28184/consoleFull)
 for   PR 4855 at commit 
[`a712249`](https://github.com/apache/spark/commit/a712249d3617f0a4a4eba6eb759dcd2770aeec12).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [wip][SPARK-1391][SPARK-3151] 2g partition lim...

2015-03-02 Thread squito
Github user squito commented on a diff in the pull request:

https://github.com/apache/spark/pull/4857#discussion_r25644640
  
--- Diff: 
network/common/src/main/java/org/apache/spark/network/buffer/NioManagedBuffer.java
 ---
@@ -41,13 +41,13 @@ public long size() {
   }
 
   @Override
-  public ByteBuffer nioByteBuffer() throws IOException {
+  public LargeByteBuffer nioByteBuffer() throws IOException {
 return buf.duplicate();
   }
 
   @Override
   public InputStream createInputStream() throws IOException {
-return new ByteBufInputStream(Unpooled.wrappedBuffer(buf));
+return new ByteBufInputStream(Unpooled.wrappedBuffer(buf.firstByteBuffer()));
--- End diff --

we could wrap the entire `LargeByteBuffer` in a `LargeByteBufferInputStream` -- but I don't understand why there is already this level of indirection here, so I thought I'd check. This code is only used for fetching shuffle blocks, so other things will break anyway if a block is over 2 GB.
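The wrapping idea mentioned above can be sketched as an `InputStream` that chains a sequence of `ByteBuffer` chunks, standing in for the chunks of a `LargeByteBuffer`. The class name echoes the one in the review, but this is an illustration, not the Spark implementation:

```scala
import java.io.InputStream
import java.nio.ByteBuffer

// Expose several ByteBuffer chunks as one contiguous InputStream.
class LargeByteBufferInputStreamSketch(chunks: Seq[ByteBuffer]) extends InputStream {
  // Duplicate each chunk lazily so the callers' buffer positions are untouched.
  private val it = chunks.iterator.map(_.duplicate())
  private var current: ByteBuffer = ByteBuffer.allocate(0)

  // Move to the next non-empty chunk; true if a byte is available.
  private def advance(): Boolean = {
    while (!current.hasRemaining && it.hasNext) current = it.next()
    current.hasRemaining
  }

  override def read(): Int =
    if (advance()) current.get() & 0xff else -1
}
```

Reading across a chunk boundary yields the concatenated bytes, which is the behavior a `LargeByteBufferInputStream` would need so that no single backing buffer has to hold more than 2 GB.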





[GitHub] spark pull request: [SPARK-6114][SQL] Avoid metastore conversions ...

2015-03-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4855#issuecomment-76839508
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28184/
Test PASSed.





[GitHub] spark pull request: [SPARK-5950][SQL]Insert array into a metastore...

2015-03-02 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4826#issuecomment-76841671
  
  [Test build #28187 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28187/consoleFull)
 for   PR 4826 at commit 
[`0cb7ea2`](https://github.com/apache/spark/commit/0cb7ea27185db716ae5deddff00649064ddde860).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-5641] [EC2] Allow spark_ec2.py to copy ...

2015-03-02 Thread florianverhein
Github user florianverhein commented on the pull request:

https://github.com/apache/spark/pull/4583#issuecomment-76851833
  
Hi @shivaram, have you had a chance to look at this?





[GitHub] spark pull request: [SPARK-6124] Support jdbc connection propertie...

2015-03-02 Thread vlyubin
GitHub user vlyubin opened a pull request:

https://github.com/apache/spark/pull/4859

[SPARK-6124] Support jdbc connection properties in OPTIONS part of the query

One more thing, if this PR is considered OK: it might make sense to add extra .jdbc() APIs that take Properties to SQLContext.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vlyubin/spark jdbcProperties

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/4859.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4859


commit 5bed5f32ae0853d4de8a9fe10eb8bbf1e87f2862
Author: Volodymyr Lyubinets vlyu...@gmail.com
Date:   2015-03-02T23:38:41Z

Support jdbc connection properties in OPTIONS part of the query







[GitHub] spark pull request: SPARK-4705:[ For Cluster mode ] Pull request f...

2015-03-02 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/4845#discussion_r25652368
  
--- Diff: 
core/src/main/scala/org/apache/spark/deploy/history/HistoryPage.scala ---
@@ -113,6 +129,36 @@ private[spark] class HistoryPage(parent: 
HistoryServer) extends WebUIPage() {
   /div
 UIUtils.basicSparkPage(content, "History Server")
   }
+  
+  private def getApplicationLevelList(appNattemptList: Iterable[ApplicationHistoryInfo]) = {
+// Create HashMap as per the multiple attempts for one application. 
--- End diff --

Feels like this should be in a scaladoc comment.
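The grouping the diff comment describes — collapsing a flat list of per-attempt records into one entry per application — can be sketched with `groupBy`. `AttemptInfo` is an illustrative stand-in, not Spark's `ApplicationHistoryInfo`:

```scala
// One record per history-server attempt; several attempts may share an appId.
case class AttemptInfo(appId: String, attemptId: Int, completed: Boolean)

// Build the application-level view: one map entry per application,
// with that application's attempts sorted latest-first.
def applicationLevelList(attempts: Seq[AttemptInfo]): Map[String, Seq[AttemptInfo]] =
  attempts.groupBy(_.appId).map { case (id, as) =>
    id -> as.sortBy(-_.attemptId)
  }
```

This is the kind of behavior a scaladoc comment on the method could document: the result has exactly one key per application, regardless of how many attempts each application made.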





[GitHub] spark pull request: SPARK-4705:[ For Cluster mode ] Pull request f...

2015-03-02 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/4845#discussion_r25652327
  
--- Diff: 
core/src/main/scala/org/apache/spark/deploy/history/HistoryPage.scala ---
@@ -34,18 +37,31 @@ private[spark] class HistoryPage(parent: HistoryServer) 
extends WebUIPage() {
 val requestedIncomplete =
   
Option(request.getParameter("showIncomplete")).getOrElse("false").toBoolean
 
-val allApps = parent.getApplicationList().filter(_.completed != requestedIncomplete)
-val actualFirst = if (requestedFirst < allApps.size) requestedFirst else 0
-val apps = allApps.slice(actualFirst, Math.min(actualFirst + pageSize, allApps.size))
-
+val allCompletedAppsNAttempts = 
--- End diff --

`AppsAndAttempts`? No need to sacrifice readability for two characters.




