[GitHub] spark pull request: [SPARK-1201] Do not fully materialize partitio...

2014-06-13 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/1083#discussion_r13779778
  
--- Diff: core/src/main/scala/org/apache/spark/CacheManager.scala ---
@@ -46,79 +46,27 @@ private[spark] class CacheManager(blockManager: BlockManager) extends Logging {
         new InterruptibleIterator(context, values.asInstanceOf[Iterator[T]])

       case None =>
-        // Mark the split as loading (unless someone else marks it first)
-        loading.synchronized {
-          if (loading.contains(key)) {
-            logInfo(s"Another thread is loading $key, waiting for it to finish...")
-            while (loading.contains(key)) {
-              try {
-                loading.wait()
-              } catch {
-                case e: Exception =>
-                  logWarning(s"Got an exception while waiting for another thread to load $key", e)
-              }
-            }
-            logInfo(s"Finished waiting for $key")
-            /* See whether someone else has successfully loaded it. The main way this would fail
-             * is for the RDD-level cache eviction policy if someone else has loaded the same RDD
-             * partition but we didn't want to make space for it. However, that case is unlikely
-             * because it's unlikely that two threads would work on the same RDD partition. One
-             * downside of the current code is that threads wait serially if this does happen. */
-            blockManager.get(key) match {
-              case Some(values) =>
-                return new InterruptibleIterator(context, values.asInstanceOf[Iterator[T]])
-              case None =>
-                logInfo(s"Whoever was loading $key failed; we'll try it ourselves")
-                loading.add(key)
-            }
-          } else {
-            loading.add(key)
-          }
+        // Acquire a lock for loading this partition
+        // If another thread already holds the lock, wait for it to finish and return its results
+        acquireLockForPartition(key).foreach { values =>
--- End diff --

Maybe better to avoid the foreach closure here, because a return within a closure is implemented as a thrown exception caught by a try-catch, and it is very error prone if in the future we wrap this whole block with a try-catch that catches general exceptions. It has happened multiple times in Spark already, and those problems are really hard to find and debug. Best to just avoid it.
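The hazard rxin describes can be reproduced in a few lines. In Scala 2, a `return` inside a closure compiles to a thrown `NonLocalReturnControl`, so a surrounding catch broad enough to intercept `Throwable` silently swallows the early return. This is a minimal standalone sketch (the `findFirstEven` helper is hypothetical, not Spark code):

```scala
object ClosureReturnDemo {
  // The `return` inside the foreach closure is compiled as a thrown
  // scala.runtime.NonLocalReturnControl. A broad catch intercepts it
  // and silently changes the method's result.
  def findFirstEven(xs: Seq[Int]): Option[Int] = {
    try {
      xs.foreach { x =>
        if (x % 2 == 0) return Some(x) // non-local return via exception
      }
      None
    } catch {
      case _: Throwable => None // swallows the non-local return!
    }
  }

  def main(args: Array[String]): Unit = {
    // We would expect Some(2), but the broad catch eats the return.
    assert(findFirstEven(Seq(1, 2, 3)).isEmpty)
    println("non-local return was swallowed by the broad catch")
  }
}
```

Catching only `Exception` (or using `scala.util.control.NonFatal`) avoids this particular trap, since `NonLocalReturnControl` extends `ControlThrowable`, but avoiding `return` inside closures sidesteps the issue entirely.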


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: [SPARK-1201] Do not fully materialize partitio...

2014-06-13 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/1083#discussion_r13779776
  
--- Diff: core/src/main/scala/org/apache/spark/CacheManager.scala ---
@@ -128,4 +76,89 @@ private[spark] class CacheManager(blockManager: BlockManager) extends Logging {
       }
     }
   }
+
+  /**
+   * Acquire a loading lock for the partition identified by the given block ID.
+   *
+   * If the lock is free, just acquire it and return None. Otherwise, another thread is already
+   * loading the partition, so we wait for it to finish and return the values loaded by the thread.
+   */
+  private def acquireLockForPartition(id: RDDBlockId): Option[Iterator[Any]] = {
+    loading.synchronized {
+      if (!loading.contains(id)) {
+        // If the partition is free, acquire its lock and begin computing its value
+        loading.add(id)
+        None
+      } else {
+        // Otherwise, wait for another thread to finish and return its result
+        logInfo(s"Another thread is loading $id, waiting for it to finish...")
+        while (loading.contains(id)) {
+          try {
+            loading.wait()
+          } catch {
+            case e: Exception =>
+              logWarning(s"Exception while waiting for another thread to load $id", e)
+          }
+        }
+        logInfo(s"Finished waiting for $id")
+        /* See whether someone else has successfully loaded it. The main way this would fail
+         * is for the RDD-level cache eviction policy if someone else has loaded the same RDD
+         * partition but we didn't want to make space for it. However, that case is unlikely
+         * because it's unlikely that two threads would work on the same RDD partition. One
+         * downside of the current code is that threads wait serially if this does happen. */
+        val values = blockManager.get(id)
+        if (!values.isDefined) {
+          logInfo(s"Whoever was loading $id failed; we'll try it ourselves")
+          loading.add(id)
+        }
+        values
+      }
+    }
+  }
+
+  /**
+   * Cache the values of a partition, keeping track of any updates in the storage statuses
+   * of other blocks along the way.
+   */
+  private def cacheValues[T](
--- End diff --

instead of cacheValues, how about storeInBlockmanager?




[GitHub] spark pull request: [SPARK-1201] Do not fully materialize partitio...

2014-06-13 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/1083#discussion_r13779775
  
--- Diff: core/src/main/scala/org/apache/spark/CacheManager.scala ---
@@ -128,4 +76,89 @@ private[spark] class CacheManager(blockManager: BlockManager) extends Logging {
       }
     }
   }
+
+  /**
+   * Acquire a loading lock for the partition identified by the given block ID.
+   *
+   * If the lock is free, just acquire it and return None. Otherwise, another thread is already
+   * loading the partition, so we wait for it to finish and return the values loaded by the thread.
+   */
+  private def acquireLockForPartition(id: RDDBlockId): Option[Iterator[Any]] = {
+    loading.synchronized {
+      if (!loading.contains(id)) {
+        // If the partition is free, acquire its lock and begin computing its value
+        loading.add(id)
+        None
+      } else {
+        // Otherwise, wait for another thread to finish and return its result
+        logInfo(s"Another thread is loading $id, waiting for it to finish...")
+        while (loading.contains(id)) {
+          try {
+            loading.wait()
+          } catch {
+            case e: Exception =>
+              logWarning(s"Exception while waiting for another thread to load $id", e)
+          }
+        }
+        logInfo(s"Finished waiting for $id")
+        /* See whether someone else has successfully loaded it. The main way this would fail
+         * is for the RDD-level cache eviction policy if someone else has loaded the same RDD
+         * partition but we didn't want to make space for it. However, that case is unlikely
+         * because it's unlikely that two threads would work on the same RDD partition. One
--- End diff --

This paragraph doesn't make a lot of sense to me. In general it is just unlikely for two threads to work on the same RDD partition. However, if we ever get past the first if (where it returns None), it already means we are in the "unlikely" case.




[GitHub] spark pull request: [SPARK-1201] Do not fully materialize partitio...

2014-06-13 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/1083#discussion_r13779767
  
--- Diff: core/src/main/scala/org/apache/spark/CacheManager.scala ---
@@ -128,4 +76,89 @@ private[spark] class CacheManager(blockManager: BlockManager) extends Logging {
       }
     }
   }
+
+  /**
+   * Acquire a loading lock for the partition identified by the given block ID.
+   *
+   * If the lock is free, just acquire it and return None. Otherwise, another thread is already
+   * loading the partition, so we wait for it to finish and return the values loaded by the thread.
+   */
+  private def acquireLockForPartition(id: RDDBlockId): Option[Iterator[Any]] = {
+    loading.synchronized {
+      if (!loading.contains(id)) {
+        // If the partition is free, acquire its lock and begin computing its value
+        loading.add(id)
+        None
+      } else {
+        // Otherwise, wait for another thread to finish and return its result
+        logInfo(s"Another thread is loading $id, waiting for it to finish...")
+        while (loading.contains(id)) {
+          try {
+            loading.wait()
+          } catch {
+            case e: Exception =>
+              logWarning(s"Exception while waiting for another thread to load $id", e)
+          }
+        }
+        logInfo(s"Finished waiting for $id")
+        /* See whether someone else has successfully loaded it. The main way this would fail
+         * is for the RDD-level cache eviction policy if someone else has loaded the same RDD
+         * partition but we didn't want to make space for it. However, that case is unlikely
+         * because it's unlikely that two threads would work on the same RDD partition. One
+         * downside of the current code is that threads wait serially if this does happen. */
+        val values = blockManager.get(id)
+        if (!values.isDefined) {
+          logInfo(s"Whoever was loading $id failed; we'll try it ourselves")
+          loading.add(id)
+        }
+        values
+      }
+    }
+  }
+
+  /**
+   * Cache the values of a partition, keeping track of any updates in the storage statuses
+   * of other blocks along the way.
+   */
+  private def cacheValues[T](
+      key: BlockId,
+      value: Iterator[T],
+      storageLevel: StorageLevel,
+      updatedBlocks: ArrayBuffer[(BlockId, BlockStatus)]): Iterator[T] = {
+
+    if (!storageLevel.useMemory) {
+      /* This RDD is not to be cached in memory, so we can just pass the computed values
+       * as an iterator directly to the BlockManager, rather than first fully unrolling
+       * it in memory. The latter option potentially uses much more memory and risks OOM
+       * exceptions that can be avoided. */
+      assume(storageLevel.useDisk || storageLevel.useOffHeap, s"Empty storage level for $key!")
--- End diff --

Might make sense to remove this assume: if we add a new storage level in the future, this won't hold any more, and because this code is so far away from the storage level code, we will likely forget to update this location.




[GitHub] spark pull request: [SPARK-2141] Adding getPersistentRddIds and un...

2014-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1082#issuecomment-46079949
  
 Merged build triggered. 




[GitHub] spark pull request: [Spark-2137][SQL] Timestamp UDFs broken

2014-06-13 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/1081




[GitHub] spark pull request: [SPARK-2141] Adding getPersistentRddIds and un...

2014-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1082#issuecomment-46079951
  
Merged build started. 




[GitHub] spark pull request: [Spark-2137][SQL] Timestamp UDFs broken

2014-06-13 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/1081#issuecomment-46079937
  
Ok I'm merging this in master & branch-1.0. Thanks!





[GitHub] spark pull request: [SPARK-2141] Adding getPersistentRddIds and un...

2014-06-13 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/1082#issuecomment-46079903
  
Jenkins, retest this please.




[GitHub] spark pull request: [SPARK-1201] Do not fully materialize partitio...

2014-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1083#issuecomment-46078657
  
All automated tests passed.
Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15783/




[GitHub] spark pull request: [SPARK-1201] Do not fully materialize partitio...

2014-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1083#issuecomment-46078656
  
Merged build finished. All automated tests passed.




[GitHub] spark pull request: [SPARK-1201] Do not fully materialize partitio...

2014-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1083#issuecomment-46078013
  
Merged build started. 




[GitHub] spark pull request: [SPARK-1201] Do not fully materialize partitio...

2014-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1083#issuecomment-46078010
  
 Merged build triggered. 




[GitHub] spark pull request: [SQL] Support transforming TreeNodes with Opti...

2014-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1074#issuecomment-46077981
  
Merged build finished. All automated tests passed.




[GitHub] spark pull request: [SQL] Support transforming TreeNodes with Opti...

2014-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1074#issuecomment-46077982
  
All automated tests passed.
Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15779/




[GitHub] spark pull request: [SPARK-1201] Do not fully materialize partitio...

2014-06-13 Thread andrewor14
GitHub user andrewor14 opened a pull request:

https://github.com/apache/spark/pull/1083

[SPARK-1201] Do not fully materialize partitions for 
StorageLevel.MEMORY_*_SER

The deserialized version of a partition may occupy much more space than the serialized version. Therefore, if a partition is to be cached with `StorageLevel.MEMORY_*_SER`, we don't need to fully unroll it into an `ArrayBuffer`, but instead we can unroll it into a potentially much smaller `ByteBuffer`.
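The idea behind the PR can be illustrated with plain Java serialization: stream the iterator's elements into a byte buffer one at a time instead of first materializing them all. This is only an illustration of the concept, not the BlockManager's actual code path:

```scala
import java.io.{ByteArrayOutputStream, ObjectOutputStream}

object SerializedUnrollSketch {
  // Write each element to the serialized buffer as it is produced. At no
  // point do all deserialized elements coexist in memory, unlike unrolling
  // the iterator into an ArrayBuffer first.
  def serializeIterator[T](values: Iterator[T]): Array[Byte] = {
    val bos = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bos)
    values.foreach(out.writeObject(_))
    out.close()
    bos.toByteArray
  }
}
```

For objects whose deserialized form is much larger than their serialized form (e.g. boxed collections, sparse structures), holding only the serialized bytes can be the difference between fitting in memory and an OOM.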

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/andrewor14/spark unroll-them-partitions

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/1083.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1083


commit a8f181d6483b509c29900de5f325a01ea0ef824f
Author: Andrew Or 
Date:   2014-06-14T03:49:18Z

Add special handling for StorageLevel.MEMORY_*_SER

We only unroll the serialized form of each partition for this case,
because the deserialized form may be much larger and may not fit in
memory.

This commit also abstracts out part of the logic of getOrCompute to
make it more readable.

commit 2941c89baacacfc7573cde35a694bc18a7f5fd4f
Author: Andrew Or 
Date:   2014-06-14T03:52:31Z

Clean up BlockStore (minor)

commit 44ef28246ad4f8116155b0db4969898cc09e5e5e
Author: Andrew Or 
Date:   2014-06-14T03:53:25Z

Actually return updated blocks in putBytes

Previously we never returned the updated blocks in MemoryStore's
putBytes. This is a simple bug with a simple fix.






[GitHub] spark pull request: [Spark-2137][SQL] Timestamp UDFs broken

2014-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1081#issuecomment-46077782
  
All automated tests passed.
Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15778/




[GitHub] spark pull request: [Spark-2137][SQL] Timestamp UDFs broken

2014-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1081#issuecomment-46077781
  
Merged build finished. All automated tests passed.




[GitHub] spark pull request: [SparkSQL] allow UDF on struct

2014-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/796#issuecomment-46076891
  

Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15782/




[GitHub] spark pull request: [SparkSQL] allow UDF on struct

2014-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/796#issuecomment-46076888
  
Merged build started. 




[GitHub] spark pull request: [SparkSQL] allow UDF on struct

2014-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/796#issuecomment-46076890
  
Merged build finished. 




[GitHub] spark pull request: [SparkSQL] allow UDF on struct

2014-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/796#issuecomment-46076885
  
 Merged build triggered. 




[GitHub] spark pull request: [SparkSQL] allow UDF on struct

2014-06-13 Thread marmbrus
Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/796#issuecomment-46076874
  
test this please




[GitHub] spark pull request: [SparkSQL] allow UDF on struct

2014-06-13 Thread marmbrus
Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/796#issuecomment-46076862
  
@pwendell, any idea what is going on here?  I keep getting:

```
Building remotely on Ubuntu 14.04 on EC2 6 (ec2) in workspace /home/ubuntu/workspace/SparkPullRequestBuilder
java.io.IOException: remote file operation failed: /home/ubuntu/workspace/SparkPullRequestBuilder at hudson.remoting.Channel@7291f33e:Ubuntu 14.04 on EC2 6
    at hudson.FilePath.act(FilePath.java:916)
    at hudson.FilePath.act(FilePath.java:893)
    at org.jenkinsci.plugins.gitclient.Git.getClient(Git.java:66)
    at hudson.plugins.git.GitSCM.createClient(GitSCM.java:566)
    at hudson.plugins.git.GitSCM.createClient(GitSCM.java:558)
    at hudson.plugins.git.GitSCM.checkout(GitSCM.java:874)
    at hudson.model.AbstractProject.checkout(AbstractProject.java:1252)
    at hudson.model.AbstractBuild$AbstractBuildExecution.defaultCheckout(AbstractBuild.java:615)
    at jenkins.scm.SCMCheckoutStrategy.checkout(SCMCheckoutStrategy.java:86)
    at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:524)
    at hudson.model.Run.execute(Run.java:1710)
    at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
    at hudson.model.ResourceController.execute(ResourceController.java:88)
    at hudson.model.Executor.run(Executor.java:234)
Caused by: java.io.IOException: Remote call on Ubuntu 14.04 on EC2 6 failed
    at hudson.remoting.Channel.call(Channel.java:748)
    at hudson.FilePath.act(FilePath.java:909)
    ... 13 more
Caused by: java.lang.NoClassDefFoundError: Could not initialize class com.sun.proxy.$Proxy8
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
    at java.lang.reflect.Proxy.newInstance(Proxy.java:748)
    at java.lang.reflect.Proxy.newProxyInstance(Proxy.java:739)
    at hudson.remoting.RemoteInvocationHandler.wrap(RemoteInvocationHandler.java:100)
    at hudson.remoting.Channel.export(Channel.java:584)
    at hudson.remoting.Channel.export(Channel.java:553)
    at org.jenkinsci.plugins.gitclient.LegacyCompatibleGitAPIImpl.writeReplace(LegacyCompatibleGitAPIImpl.java:161)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at java.io.ObjectStreamClass.invokeWriteReplace(ObjectStreamClass.java:1075)
    at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1134)
    at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347)
    at hudson.remoting.UserRequest._serialize(UserRequest.java:155)
    at hudson.remoting.UserRequest.serialize(UserRequest.java:164)
    at hudson.remoting.UserRequest.perform(UserRequest.java:126)
    at hudson.remoting.UserRequest.perform(UserRequest.java:48)
    at hudson.remoting.Request$2.run(Request.java:328)
    at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:72)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
Recording test results
Finished: FAILURE
```




[GitHub] spark pull request: [SparkSQL] allow UDF on struct

2014-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/796#issuecomment-46076618
  
Merged build finished. 




[GitHub] spark pull request: [SparkSQL] allow UDF on struct

2014-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/796#issuecomment-46076613
  
 Merged build triggered. 




[GitHub] spark pull request: [SparkSQL] allow UDF on struct

2014-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/796#issuecomment-46076616
  
Merged build started. 




[GitHub] spark pull request: [SparkSQL] allow UDF on struct

2014-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/796#issuecomment-46076619
  

Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15781/




[GitHub] spark pull request: [SparkSQL] allow UDF on struct

2014-06-13 Thread marmbrus
Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/796#issuecomment-46076522
  
test this please




[GitHub] spark pull request: [SparkSQL] allow UDF on struct

2014-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/796#issuecomment-46076494
  
Merged build started. 




[GitHub] spark pull request: [SparkSQL] allow UDF on struct

2014-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/796#issuecomment-46076499
  

Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15780/




[GitHub] spark pull request: [SQL] Support transforming TreeNodes with Opti...

2014-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1074#issuecomment-46076493
  
Merged build started. 




[GitHub] spark pull request: [SparkSQL] allow UDF on struct

2014-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/796#issuecomment-46076489
  
 Merged build triggered. 


---


[GitHub] spark pull request: [SQL] Support transforming TreeNodes with Opti...

2014-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1074#issuecomment-46076488
  
 Merged build triggered. 


---


[GitHub] spark pull request: [SparkSQL] allow UDF on struct

2014-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/796#issuecomment-46076498
  
Merged build finished. 


---


[GitHub] spark pull request: [SQL] Support transforming TreeNodes with Opti...

2014-06-13 Thread concretevitamin
Github user concretevitamin commented on the pull request:

https://github.com/apache/spark/pull/1074#issuecomment-46076441
  
Ah, this is tricky. Good to know and thanks for the fix.

On Friday, June 13, 2014, Michael Armbrust  wrote:

> This test is failing because you made the TreeNode an inner class. An
> inner class is actually just syntactic sugar done by the compiler where
> there is an extra implicit constructor parameter to the class for the
> outer object. As a result, the contract that all of the arguments to the
> TreeNode's constructor are either in productIterator or otherCopyArgs is
> broken.
>
> Here's a fix: concretevitamin#1
> 
>
> —
> Reply to this email directly or view it on GitHub
> .
>


---


[GitHub] spark pull request: [SparkSQL] allow UDF on struct

2014-06-13 Thread marmbrus
Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/796#issuecomment-46076365
  
test this please


---


[GitHub] spark pull request: [SPARK-2053][SQL] Add Catalyst expressions for...

2014-06-13 Thread marmbrus
Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/1055#issuecomment-46076336
  
Yeah, the python problems should be fixed now though.  I think the problem 
is that this PR doesn't merge cleanly anymore, so you aren't picking up the 
python fixes done by @pwendell.  You can tell the merge failed because Jenkins 
said "Build started." instead of "Merged build started."

Please rebase :)


---


[GitHub] spark pull request: [SQL] Support transforming TreeNodes with Opti...

2014-06-13 Thread marmbrus
Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/1074#issuecomment-46076251
  
This test is failing because you made the `TreeNode` an inner class.  An 
inner class is actually just syntactic sugar done by the compiler where there 
is an extra implicit constructor parameter to the class for the outer object.  
As a result, the contract that all of the arguments to the `TreeNode`'s 
constructor are either in `productIterator` or `otherCopyArgs` is broken.

Here's a fix: https://github.com/concretevitamin/spark/pull/1
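The inner-class pitfall described above can be observed directly through reflection. The following is a minimal standalone sketch (hypothetical names, not Spark code) showing that `productIterator` does not cover the hidden outer-instance constructor parameter the compiler adds to an inner case class:

```scala
object Outer {
  // a top-level case class: constructor params and productIterator agree
  case class TopLevel(a: Int, b: Int)
}

class Container {
  // as an inner class, the compiler adds a hidden constructor parameter
  // referencing the enclosing Container instance; productIterator still
  // only yields (a, b), so reflective copying misses the outer pointer
  case class Inner(a: Int, b: Int)
}

object Demo extends App {
  val top = Outer.TopLevel(1, 2)
  assert(top.productIterator.toList == List(1, 2))
  assert(top.getClass.getConstructors.head.getParameterCount == 2)

  val c = new Container
  val in = new c.Inner(1, 2)
  // productIterator is unchanged...
  assert(in.productIterator.toList == List(1, 2))
  // ...but the real constructor also needs the outer instance:
  assert(in.getClass.getConstructors.head.getParameterCount == 3)
  println("ok")
}
```

This is why a reflective `makeCopy`-style mechanism, which reconstructs a node from its product elements, breaks for inner classes.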


---


[GitHub] spark pull request: [Spark-2137][SQL] Timestamp UDFs broken

2014-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1081#issuecomment-46076032
  
Merged build started. 


---


[GitHub] spark pull request: [Spark-2137][SQL] Timestamp UDFs broken

2014-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1081#issuecomment-46076030
  
 Merged build triggered. 


---


[GitHub] spark pull request: [Spark-2137][SQL] Timestamp UDFs broken

2014-06-13 Thread marmbrus
Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/1081#issuecomment-46075987
  
test this please


---


[GitHub] spark pull request: [SPARK-2051]In yarn.ClientBase spark.yarn.dist...

2014-06-13 Thread witgo
Github user witgo commented on the pull request:

https://github.com/apache/spark/pull/969#issuecomment-46074277
  
 My point is that cluster and client mode should be consistent; having 
`spark.yarn.dist.*` work only in client mode is not ideal.


---


[GitHub] spark pull request: [SPARK-2141] Adding getPersistentRddIds and un...

2014-06-13 Thread kanzhang
Github user kanzhang commented on the pull request:

https://github.com/apache/spark/pull/1082#issuecomment-46074033
  
@pwendell anything I need to do to fix the test failure?


---


[GitHub] spark pull request: [SPARK-2051]In yarn.ClientBase spark.yarn.dist...

2014-06-13 Thread witgo
Github user witgo commented on a diff in the pull request:

https://github.com/apache/spark/pull/969#discussion_r13777641
  
--- Diff: 
yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientArguments.scala 
---
@@ -45,6 +44,12 @@ class ClientArguments(val args: Array[String], val 
sparkConf: SparkConf) {
 
   parseArgs(args.toList)
 
+  files = 
Option(files).getOrElse(sparkConf.getOption("spark.yarn.dist.files").orNull)
+  files = Option(files).map(p => Utils.resolveURIs(p)).orNull
--- End diff --

As you say, the code should look like this:
```scala
  files = Option(files).getOrElse(sparkConf.getOption("spark.yarn.dist.files")
    .map(p => Utils.resolveURIs(p)).orNull)

  archives = Option(archives).getOrElse(sparkConf.getOption("spark.yarn.dist.archives")
    .map(p => Utils.resolveURIs(p)).orNull)
```
It is a bit strange that `spark.yarn.dist.*` and `--archives`/`--files` behave 
differently.
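The fallback pattern in the snippet above can be sketched as a tiny self-contained example (`resolve` stands in for `Utils.resolveURIs`; all names here are hypothetical, not the actual ClientArguments code):

```scala
object FallbackDemo extends App {
  // stand-in for Utils.resolveURIs; purely illustrative
  def resolve(p: String): String = s"file:$p"

  // an explicit CLI value wins; otherwise fall back to the config entry,
  // resolving URIs only when the config value actually exists
  def withFallback(cli: String, conf: Option[String]): String =
    Option(cli).getOrElse(conf.map(resolve).orNull)

  assert(withFallback("a.txt", Some("b.txt")) == "a.txt")   // CLI value used as-is
  assert(withFallback(null, Some("b.txt")) == "file:b.txt") // config fallback, resolved
  assert(withFallback(null, None) == null)                  // neither set
  println("ok")
}
```

The point of folding the `map` into the `getOrElse` branch is that resolution is only applied to a value that is actually present, instead of a second `Option(...)` pass over a possibly-null result.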



---


[GitHub] spark pull request: [Spark-2137][SQL] Timestamp UDFs broken

2014-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1081#issuecomment-46073387
  

Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15776/


---


[GitHub] spark pull request: [SPARK-2141] Adding getPersistentRddIds and un...

2014-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1082#issuecomment-46073386
  

Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15777/


---


[GitHub] spark pull request: [Spark-2137][SQL] Timestamp UDFs broken

2014-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1081#issuecomment-46073385
  
Merged build finished. 


---


[GitHub] spark pull request: [SPARK-2141] Adding getPersistentRddIds and un...

2014-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1082#issuecomment-46073384
  
Merged build finished. 


---


[GitHub] spark pull request: [SQL] Update SparkSQL and ScalaTest in branch-...

2014-06-13 Thread marmbrus
Github user marmbrus closed the pull request at:

https://github.com/apache/spark/pull/1078


---


[GitHub] spark pull request: [SPARK-2141] Adding getPersistentRddIds and un...

2014-06-13 Thread kanzhang
Github user kanzhang commented on the pull request:

https://github.com/apache/spark/pull/1082#issuecomment-46072447
  
This patch may require some careful review, as it adds new public APIs. 
The rationale is the following.

Users asked for the Scala API ```sc.getPersistentRDDs()``` to be added to 
Python. However, the Scala method returns a map from RDD id to the RDD itself. 
Without knowing the Python serializer used, it is hard to go from the underlying 
Java RDD back to a Python RDD. For the Java API, it is possible to return the 
right Java RDD type by figuring out the element type, but that still requires 
some work. Instead, I chose to return the set of RDD ids, which can be used to 
unpersist them if so desired. That leads to ```sc.unpersistRDD```, which is 
private[spark] in Scala; I now expose it as public in Java and Python, since I 
need a way to unpersist knowing only the id. I can imagine it would be safer 
to hide the ids and let users call ```RDD.unpersist```. However, RDD ids are 
already exposed publicly, and we just need to remind users that ids are 
per-SparkContext.

Let me know if the above makes sense. Thx.
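The id-based design described above can be sketched in miniature (hypothetical names and a toy context, not the actual SparkContext API): expose only the set of ids of persisted RDDs, and unpersist through the context by id.

```scala
import scala.collection.mutable

// toy stand-in for a context tracking persisted RDDs by id
class MiniContext {
  private val persistent = mutable.Map[Int, String]() // id -> RDD name stand-in

  def persist(id: Int, name: String): Unit = persistent(id) = name

  // return only the ids, avoiding any need to rebuild language-specific RDDs
  def getPersistentRddIds: Set[Int] = persistent.keySet.toSet

  // unpersist knowing only the id
  def unpersistRDD(id: Int): Unit = persistent.remove(id)
}

object IdDemo extends App {
  val sc = new MiniContext
  sc.persist(0, "lines")
  sc.persist(1, "counts")
  assert(sc.getPersistentRddIds == Set(0, 1))
  sc.unpersistRDD(0)
  assert(sc.getPersistentRddIds == Set(1))
  println("ok")
}
```

Because only ids cross the language boundary, the Python side never has to deserialize or wrap the underlying Java RDDs.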


---


[GitHub] spark pull request: [SPARK-2141] Adding getPersistentRddIds and un...

2014-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1082#issuecomment-46071657
  
Merged build started. 


---


[GitHub] spark pull request: [SPARK-2141] Adding getPersistentRddIds and un...

2014-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1082#issuecomment-46071653
  
 Merged build triggered. 


---


[GitHub] spark pull request: [SQL] Update SparkSQL and ScalaTest in branch-...

2014-06-13 Thread markhamstra
Github user markhamstra commented on the pull request:

https://github.com/apache/spark/pull/1078#issuecomment-46071589
  
FYI Bumping all the way to the current scalatest 2.2.0 also works.


---


[GitHub] spark pull request: [SPARK-2141] Adding getPersistentRddIds and un...

2014-06-13 Thread kanzhang
GitHub user kanzhang opened a pull request:

https://github.com/apache/spark/pull/1082

[SPARK-2141] Adding getPersistentRddIds and unpersistRDD to Java and Python API

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kanzhang/spark SPARK-2141

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/1082.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1082


commit 9782afd7286c867dacf25ebb93740e89d139ff3a
Author: Kan Zhang 
Date:   2014-06-13T23:39:15Z

[SPARK-2141] Adding getPersistentRddIds and unpersistRDD to Java and Python 
API




---


[GitHub] spark pull request: [SQL] Update SparkSQL and ScalaTest in branch-...

2014-06-13 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/1078#issuecomment-46069659
  
This looks good. I merged it.

Note that the only thing that is remotely scary is that we are bumping the 
version of scalatest. However, that only affects development and doesn't change 
the binary build at all.


---


[GitHub] spark pull request: [Spark-2137][SQL] Timestamp UDFs broken

2014-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1081#issuecomment-46069415
  
Merged build started. 


---


[GitHub] spark pull request: [Spark-2137][SQL] Timestamp UDFs broken

2014-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1081#issuecomment-46069409
  
 Merged build triggered. 


---


[GitHub] spark pull request: [SPARK-2144] ExecutorsPage reports incorrect #...

2014-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1080#issuecomment-46069298
  
All automated tests passed.
Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15775/


---


[GitHub] spark pull request: [SPARK-2144] ExecutorsPage reports incorrect #...

2014-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1080#issuecomment-46069297
  
Merged build finished. All automated tests passed.


---


[GitHub] spark pull request: [Spark-2137][SQL] Timestamp UDFs broken

2014-06-13 Thread yhuai
GitHub user yhuai opened a pull request:

https://github.com/apache/spark/pull/1081

[Spark-2137][SQL] Timestamp UDFs broken

https://issues.apache.org/jira/browse/SPARK-2137

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/yhuai/spark SPARK-2137

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/1081.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1081


commit 205f17ba3796134a9a48217e4fee1966cfa3eeb0
Author: Yin Huai 
Date:   2014-06-13T23:01:38Z

Make Hive UDF wrapper support Timestamp.

commit c04f91050b9f887f3a0f2ee2d218a08244541268
Author: Yin Huai 
Date:   2014-06-13T23:02:47Z

Merge remote-tracking branch 'upstream/master' into SPARK-2137




---


[GitHub] spark pull request: [SQL] Update SparkSQL in branch-1.0 to match m...

2014-06-13 Thread marmbrus
Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/1078#issuecomment-46068868
  
@pwendell and @rxin, can you please take a look at this?  This should fix 
compilation for branch-1.0.


---


[GitHub] spark pull request: [SPARK-2053][SQL] Add Catalyst expressions for...

2014-06-13 Thread concretevitamin
Github user concretevitamin commented on the pull request:

https://github.com/apache/spark/pull/1055#issuecomment-46068679
  
The latest build only contains some PySpark failures, I think.


On Fri, Jun 13, 2014 at 3:43 PM, UCB AMPLab 
wrote:

> Build finished.
>
> —
> Reply to this email directly or view it on GitHub
> .
>


---


[GitHub] spark pull request: [SPARK-2053][SQL] Add Catalyst expressions for...

2014-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1055#issuecomment-46067879
  
Build finished. 


---


[GitHub] spark pull request: [SQL] Update SparkSQL in branch-1.0 to match m...

2014-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1078#issuecomment-46067880
  
Merged build finished. All automated tests passed.


---


[GitHub] spark pull request: [SPARK-2053][SQL] Add Catalyst expressions for...

2014-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1055#issuecomment-46067881
  

Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15772/


---


[GitHub] spark pull request: [SQL] Update SparkSQL in branch-1.0 to match m...

2014-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1078#issuecomment-46067882
  
All automated tests passed.
Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15774/


---


[GitHub] spark pull request: [SQL] Update SparkSQL in branch-1.0 to match m...

2014-06-13 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/1078#issuecomment-46067459
  
There were some compilation errors.



---


[GitHub] spark pull request: Small correction in Streaming Programming Guid...

2014-06-13 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/1079


---


[GitHub] spark pull request: Small correction in Streaming Programming Guid...

2014-06-13 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/1079#issuecomment-46067411
  
Thanks. Merging this in master & branch-1.0.


---


[GitHub] spark pull request: [SPARK-2144] ExecutorsPage reports incorrect #...

2014-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1080#issuecomment-46066321
  
Merged build started. 


---


[GitHub] spark pull request: [SPARK-2144] ExecutorsPage reports incorrect #...

2014-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1080#issuecomment-46066299
  
 Merged build triggered. 


---


[GitHub] spark pull request: [SPARK-2144] ExecutorsPage reports incorrect #...

2014-06-13 Thread andrewor14
GitHub user andrewor14 opened a pull request:

https://github.com/apache/spark/pull/1080

[SPARK-2144] ExecutorsPage reports incorrect # of RDD blocks

This is reproducible whenever we drop a block because of memory pressure.

When this happens, the StorageStatus should remove the block from its 
mapping of BlockId to BlockStatus. Instead, it simply replaces the BlockStatus 
with one that has a storage level of `StorageLevel.NONE`. This PR makes sure 
that we remove the block from the map if it is no longer cached.
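The fix described above can be sketched with a simplified map (hypothetical stand-ins for BlockId/BlockStatus, not the actual StorageStatusListener code): when a block's storage level drops to NONE, the entry is removed instead of being kept with a stale status.

```scala
import scala.collection.mutable

object BlockDemo extends App {
  // simplified stand-in for StorageLevel
  sealed trait Level
  case object Memory extends Level
  case object NoneLevel extends Level

  // stand-in for the BlockId -> BlockStatus map in StorageStatus
  val blocks = mutable.Map[String, Level]("rdd_0_0" -> Memory)

  // the fix: drop the entry instead of storing a status with level NONE
  def updateBlock(id: String, level: Level): Unit =
    if (level == NoneLevel) blocks.remove(id) else blocks(id) = level

  updateBlock("rdd_0_0", NoneLevel)   // block dropped under memory pressure
  assert(!blocks.contains("rdd_0_0")) // a stale block no longer shows up in counts
  println("ok")
}
```

With the stale entry gone, anything that counts the map's entries (as the ExecutorsPage does for RDD blocks) reports the correct number.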

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/andrewor14/spark ui-blocks

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/1080.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1080


commit fcf9f1a381b751cea49e69a266bbf22692be2af0
Author: Andrew Or 
Date:   2014-06-13T22:08:06Z

Remove BlockStatus if it is no longer cached

In StorageStatusListener, a block that is no longer cached has its
BlockStatus updated to reflect this in the StorageStatus blocks map.
What we should really do, however, is to remove this from the map.
Otherwise, the ExecutorsPage still thinks that this block is cached.




---


[GitHub] spark pull request: [SQL] Update SparkSQL in branch-1.0 to match m...

2014-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1078#issuecomment-46063296
  
Merged build started. 


---


[GitHub] spark pull request: [SQL] Update SparkSQL in branch-1.0 to match m...

2014-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1078#issuecomment-46063290
  
 Merged build triggered. 


---


[GitHub] spark pull request: [Spark 2060][SQL] Querying JSON Datasets with ...

2014-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/999#issuecomment-46061921
  

Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15773/




[GitHub] spark pull request: [Spark 2060][SQL] Querying JSON Datasets with ...

2014-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/999#issuecomment-46061920
  
Merged build finished. 




[GitHub] spark pull request: [Spark 2060][SQL] Querying JSON Datasets with ...

2014-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/999#issuecomment-46061886
  
 Merged build triggered. 




[GitHub] spark pull request: [Spark 2060][SQL] Querying JSON Datasets with ...

2014-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/999#issuecomment-46061901
  
Merged build started. 




[GitHub] spark pull request: [SPARK-2053][SQL] Add Catalyst expressions for...

2014-06-13 Thread concretevitamin
Github user concretevitamin commented on a diff in the pull request:

https://github.com/apache/spark/pull/1055#discussion_r13772326
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala ---
@@ -202,3 +201,78 @@ case class If(predicate: Expression, trueValue: Expression, falseValue: Expressi
 
   override def toString = s"if ($predicate) $trueValue else $falseValue"
 }
+
+// scalastyle:off
+/**
+ * Case statements of the form "CASE WHEN a THEN b [WHEN c THEN d]* [ELSE e] END".
+ * Refer to this link for the corresponding semantics:
+ * https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-ConditionalFunctions
+ *
+ * The other form of case statements "CASE a WHEN b THEN c [WHEN d THEN e]* [ELSE f] END" gets
+ * translated to this form at parsing time.  Namely, such a statement gets translated to
+ * "CASE WHEN a=b THEN c [WHEN a=d THEN e]* [ELSE f] END".
+ *
+ * Note that `branches` are considered in consecutive pairs (cond, val), and the optional last
+ * element is the value for the default catch-all case (if provided). Hence, `branches` consists of
+ * at least two elements, and can have an odd or even length.
+ */
+// scalastyle:on
+case class CaseWhen(branches: Seq[Expression]) extends Expression {
+  type EvaluatedType = Any
+  def children = branches
+  def references = children.flatMap(_.references).toSet
+  def dataType = {
+    if (!resolved) {
+      throw new UnresolvedException(this, "cannot resolve due to differing types in some branches")
+    }
+    branches(1).dataType
+  }
+
+  private[this] lazy val branchesArr = branches.toArray
--- End diff --

Done, thanks for taking a look!
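As a sketch of the pairing convention described in the doc comment above (a simplified model with plain Booleans in place of Catalyst expressions, not the actual `CaseWhen.eval` code): conditions and values alternate, and an odd length means the last element is the ELSE value.

```scala
object CaseWhenSketch {
  // branches = Seq(cond1, val1, cond2, val2, ..., [elseVal])
  def evalCaseWhen(branches: Seq[Any]): Any = {
    var i = 0
    // Walk consecutive (cond, value) pairs, returning the first matching value.
    while (i < branches.length - 1) {
      if (branches(i) == true) return branches(i + 1)
      i += 2
    }
    // An odd number of elements means a trailing ELSE value; otherwise null.
    if (branches.length % 2 == 1) branches.last else null
  }

  def main(args: Array[String]): Unit = {
    assert(evalCaseWhen(Seq(false, 1, true, 2, 99)) == 2)   // second WHEN fires
    assert(evalCaseWhen(Seq(false, 1, 42)) == 42)           // falls through to ELSE
    assert(evalCaseWhen(Seq(false, 1)) == null)             // no match, no ELSE
  }
}
```

This also makes the doc comment's shape constraint concrete: at least two elements, odd or even length depending on whether a default is present.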




[GitHub] spark pull request: [SPARK-2053][SQL] Add Catalyst expressions for...

2014-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1055#issuecomment-46061328
  
Build started. 




[GitHub] spark pull request: [SPARK-2053][SQL] Add Catalyst expressions for...

2014-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1055#issuecomment-46061313
  
 Build triggered. 




[GitHub] spark pull request: SPARK-1939 Refactor takeSample method in RDD t...

2014-06-13 Thread dorx
Github user dorx commented on the pull request:

https://github.com/apache/spark/pull/916#issuecomment-46059095
  
@colorant Thanks for taking a look at this! 

First of all let me just say that I ran Xiangrui's code but with 
".fill(1000)" (so 100x in RDD size), and it was still able to select a sample 
with exactly one data point in one pass. 

So there are a couple of things in play here. The smallest positive value a Double can represent is 2^(-1074) ~ 5e-324, so before we run into RDDs of size ~10^323, we in theory won't run into a sampling rate of 0. Then it comes down to whether the random number generator is truly random and isn't biased against very small numbers. The two experiments Xiangrui and I ran seem to suggest that the java.util.Random object is able to produce small enough random numbers. However, we should definitely further investigate the quality of the RNG used to gauge sampling behavior at even smaller sampling rates.

One thing to note about this implementation is that at higher sampling 
rates, we are actually able to save memory by not caching as many samples as 
before in order to be able to guarantee the sample size in one try.
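The Double-resolution argument above can be checked directly (a quick standalone sketch, not part of the PR):

```scala
object SamplingResolutionSketch {
  def main(args: Array[String]): Unit = {
    // The smallest positive Double is the subnormal 2^(-1074), about 4.9e-324.
    val minPos = java.lang.Double.MIN_VALUE
    // Halving it underflows to zero, so this really is the floor.
    assert(minPos > 0.0 && minPos / 2 == 0.0)

    // For any realistic RDD size, a sample of one element still has a
    // representable, strictly positive per-element fraction:
    val fraction = 1.0 / 1e18
    assert(fraction > minPos)
  }
}
```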




[GitHub] spark pull request: [SQL] Update SparkSQL in branch-1.0 to match m...

2014-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1078#issuecomment-46057767
  

Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15771/




[GitHub] spark pull request: [SQL] Update SparkSQL in branch-1.0 to match m...

2014-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1078#issuecomment-46057766
  
Merged build finished. 




[GitHub] spark pull request: added compatibility for python 2.6 for ssh_rea...

2014-06-13 Thread anantasty
Github user anantasty commented on the pull request:

https://github.com/apache/spark/pull/941#issuecomment-46057116
  
Jenkins, test this please




[GitHub] spark pull request: [SQL] Update SparkSQL in branch-1.0 to match m...

2014-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1078#issuecomment-46054458
  
 Merged build triggered. 




[GitHub] spark pull request: [SQL] Update SparkSQL in branch-1.0 to match m...

2014-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1078#issuecomment-46054472
  
Merged build started. 




[GitHub] spark pull request: Small correction in Streaming Programming Guid...

2014-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1079#issuecomment-46054456
  
Can one of the admins verify this patch?




[GitHub] spark pull request: [SQL] Update SparkSQL in branch-1.0 to match m...

2014-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1078#issuecomment-46054322
  
Merged build finished. 




[GitHub] spark pull request: [SQL] Update SparkSQL in branch-1.0 to match m...

2014-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1078#issuecomment-46054323
  

Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15770/




[GitHub] spark pull request: [SPARK-2109] Setting SPARK_MEM for bin/pyspark...

2014-06-13 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/1050#issuecomment-46054130
  
Hi @ScrapCodes. There are a couple of weird indentations, but other than 
that this looks good.




[GitHub] spark pull request: [SPARK-2109] Setting SPARK_MEM for bin/pyspark...

2014-06-13 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/1050#discussion_r13769376
  
--- Diff: bin/spark-class ---
@@ -38,8 +38,10 @@ if [ -z "$1" ]; then
 fi
 
 if [ -n "$SPARK_MEM" ]; then
-  echo "Warning: SPARK_MEM is deprecated, please use a more specific config option"
-  echo "(e.g., spark.executor.memory or SPARK_DRIVER_MEMORY)."
+cat <<EOF >&2
--- End diff --

Indentation is weird here




[GitHub] spark pull request: Small correction in Streaming Programming Guid...

2014-06-13 Thread akkomar
GitHub user akkomar opened a pull request:

https://github.com/apache/spark/pull/1079

Small correction in Streaming Programming Guide doc

Corrected description of `repartition` function under 'Level of Parallelism 
in Data Receiving'.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/akkomar/spark streaming-guide-doc

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/1079.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1079


commit 32dfc62a76662e3e8ff97960367a0c321d49eb61
Author: akkomar 
Date:   2014-06-13T19:46:35Z

Corrected description of `repartition` function under 'Level of Parallelism 
in Data Receiving'.






[GitHub] spark pull request: [SPARK-2109] Setting SPARK_MEM for bin/pyspark...

2014-06-13 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/1050#discussion_r13769347
  
--- Diff: bin/run-example ---
@@ -27,10 +27,12 @@ if [ -n "$1" ]; then
   EXAMPLE_CLASS="$1"
   shift
 else
-  echo "Usage: ./bin/run-example <example-class> [example-args]"
-  echo "  - set MASTER=XX to use a specific master"
-  echo "  - can use abbreviated example class name (e.g. SparkPi, mllib.LinearRegression)"
-  exit 1
+cat <<EOF >&2
+ Usage: ./bin/run-example <example-class> [example-args]
--- End diff --

Indentation is weird here




[GitHub] spark pull request: [SPARK-2094][SQL] "Exactly once" semantics for...

2014-06-13 Thread marmbrus
Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/1071#issuecomment-46053887
  
This is awesome!  Thanks :)

I merged this into master.  Merging into 1.0 failed, but I'll just make 
sure it gets included as part of: #1078 




[GitHub] spark pull request: [SPARK-2094][SQL] "Exactly once" semantics for...

2014-06-13 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/1071




[GitHub] spark pull request: [SPARK-2094][SQL] "Exactly once" semantics for...

2014-06-13 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/1071#discussion_r13769120
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SchemaRDDLike.scala ---
@@ -48,7 +49,17 @@ private[sql] trait SchemaRDDLike {
*/
   @transient
   @DeveloperApi
-  lazy val queryExecution = sqlContext.executePlan(logicalPlan)
+  lazy val queryExecution = sqlContext.executePlan(baseLogicalPlan)
+
+  @transient protected[spark] val logicalPlan: LogicalPlan = baseLogicalPlan match {
+    // For various commands (like DDL) and queries with side effects, we force query optimization to
+    // happen right away to let these side effects take place eagerly.
+    case _: Command | _: InsertIntoTable | _: InsertIntoCreatedTable | _: WriteToFile =>
+      queryExecution.toRdd
+      SparkLogicalPlan(queryExecution.executedPlan)
+    case _ =>
+      baseLogicalPlan
+  }
--- End diff --

Good catch.  This is probably the problem I was seeing with double "UNCACHE 
TABLE".
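The eager-vs-lazy split in the diff above can be sketched in miniature (hypothetical types, not the actual SchemaRDDLike/LogicalPlan code): side-effecting plans are forced as soon as they are constructed, while pure queries stay lazy until consumed.

```scala
object EagerPlanSketch {
  sealed trait Plan
  final case class Command(run: () => Unit) extends Plan   // e.g. DDL, INSERT
  final case class Query(compute: () => Int) extends Plan  // pure query

  def materialize(plan: Plan): Plan = plan match {
    case c @ Command(run) =>
      run()  // analogous to queryExecution.toRdd: side effects happen now
      c
    case q: Query =>
      q      // left lazy; nothing runs until the query is actually consumed
  }

  def main(args: Array[String]): Unit = {
    var ran = false
    materialize(Command(() => ran = true))
    assert(ran)                                    // command ran eagerly

    var computed = false
    materialize(Query(() => { computed = true; 1 }))
    assert(!computed)                              // query stayed lazy
  }
}
```

Running a command twice through such a pipeline would also run its side effect twice, which is consistent with the double "UNCACHE TABLE" symptom mentioned above.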




[GitHub] spark pull request: [SPARK-1964][SQL] Add timestamp to HiveMetasto...

2014-06-13 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/1061




[GitHub] spark pull request: [SPARK-1964][SQL] Add timestamp to HiveMetasto...

2014-06-13 Thread marmbrus
Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/1061#issuecomment-46053342
  
Merged into master and 1.0.



