[GitHub] spark pull request: [SPARK-7120][SPARK-7121] Closure cleaner nesti...

2015-05-01 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/5685





[GitHub] spark pull request: [SPARK-7120][SPARK-7121] Closure cleaner nesti...

2015-05-01 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5685#issuecomment-98322628
  
[Test build #31654 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31654/consoleFull) for PR 5685 at commit [`cd46230`](https://github.com/apache/spark/commit/cd4623006d0d30c1fcd66eb7c947eab4d201e43b).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * `class SomethingNotSerializable`
   * `logDebug(s" + cloning the object $obj of class $`






[GitHub] spark pull request: [SPARK-7120][SPARK-7121] Closure cleaner nesti...

2015-05-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5685#issuecomment-98322643
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31654/





[GitHub] spark pull request: [SPARK-7120][SPARK-7121] Closure cleaner nesti...

2015-05-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5685#issuecomment-98322642
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-7242] added python api for freqItems in...

2015-05-01 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5859#issuecomment-98320137
  
[Test build #31653 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31653/consoleFull) for PR 5859 at commit [`f9aa9ce`](https://github.com/apache/spark/commit/f9aa9ce35b121f94c8801498266bf5d46d234b19).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * `class SaslEncryption`
   * `static class EncryptedMessage extends AbstractReferenceCounted implements FileRegion`
   * `class SaslRpcHandler extends RpcHandler`
   * `public class SaslServerBootstrap implements TransportServerBootstrap`
   * `public class SparkSaslClient implements SaslEncryptionBackend`
   * `public class SparkSaslServer implements SaslEncryptionBackend`
   * `public class ByteArrayWritableChannel implements WritableByteChannel`






[GitHub] spark pull request: [SPARK-7242] added python api for freqItems in...

2015-05-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5859#issuecomment-98320140
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31653/





[GitHub] spark pull request: [SPARK-7242] added python api for freqItems in...

2015-05-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5859#issuecomment-98320139
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-7242] added python api for freqItems in...

2015-05-01 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/5859





[GitHub] spark pull request: [SPARK-7242] added python api for freqItems in...

2015-05-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5859#issuecomment-98320005
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31651/





[GitHub] spark pull request: [SPARK-7242] added python api for freqItems in...

2015-05-01 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5859#issuecomment-98320002
  
[Test build #31651 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31651/consoleFull) for PR 5859 at commit [`4b25056`](https://github.com/apache/spark/commit/4b25056dd12cc6b2bc8b0cf68d3573c592943053).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * `class SaslEncryption`
   * `static class EncryptedMessage extends AbstractReferenceCounted implements FileRegion`
   * `class SaslRpcHandler extends RpcHandler`
   * `public class SaslServerBootstrap implements TransportServerBootstrap`
   * `public class SparkSaslClient implements SaslEncryptionBackend`
   * `public class SparkSaslServer implements SaslEncryptionBackend`
   * `public class ByteArrayWritableChannel implements WritableByteChannel`






[GitHub] spark pull request: [SPARK-7242] added python api for freqItems in...

2015-05-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5859#issuecomment-98320004
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-4122][STREAMING] Add a library that can...

2015-05-01 Thread harishreedharan
Github user harishreedharan commented on the pull request:

https://github.com/apache/spark/pull/2994#issuecomment-98318497
  
I have not tested it. Go ahead and try it out and let me know.

On Friday, May 1, 2015, Pavan Sudheendra  wrote:

> @harishreedharan  Any chance I can
> use your version from the Spark Java API?
>
> —
> Reply to this email directly or view it on GitHub
> .
>


-- 

Thanks,
Hari






[GitHub] spark pull request: [SPARK-7318] DStream incorrectly cleans RDD in...

2015-05-01 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5860#issuecomment-98317537
  
[Test build #31655 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31655/consoleFull) for PR 5860 at commit [`67eeff4`](https://github.com/apache/spark/commit/67eeff427380d4f68ba1a3b115d5cc8cf83afc67).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * `abstract class ShuffleHandle(val shuffleId: Int) extends Serializable`






[GitHub] spark pull request: [SPARK-7318] DStream incorrectly cleans RDD in...

2015-05-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5860#issuecomment-98317565
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31655/





[GitHub] spark pull request: [SPARK-7318] DStream incorrectly cleans RDD in...

2015-05-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5860#issuecomment-98317562
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-7149] [SQL] Fix system default alias pr...

2015-05-01 Thread haiyangsea
Github user haiyangsea commented on the pull request:

https://github.com/apache/spark/pull/5861#issuecomment-98316844
  
@rxin "having-condition" contains "-" operator.





[GitHub] spark pull request: [SPARK-4122][STREAMING] Add a library that can...

2015-05-01 Thread 91pavan
Github user 91pavan commented on the pull request:

https://github.com/apache/spark/pull/2994#issuecomment-98316465
  
@harishreedharan Any chance I can use your version from the Spark Java API?





[GitHub] spark pull request: [SPARK-7214] Reserve space for unrolling even ...

2015-05-01 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/5784#issuecomment-98313148
  
cc @andrewor14 who worked a lot on this





[GitHub] spark pull request: [SPARK-7149] [SQL] Fix system default alias pr...

2015-05-01 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5861#issuecomment-98311958
  
[Test build #31658 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31658/consoleFull) for PR 5861 at commit [`620473e`](https://github.com/apache/spark/commit/620473eac8c2b519a67b183890cc05207fd81436).





[GitHub] spark pull request: [SPARK-7149] [SQL] Fix system default alias pr...

2015-05-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5861#issuecomment-98311408
  
Merged build started.





[GitHub] spark pull request: [SPARK-7149] [SQL] Fix system default alias pr...

2015-05-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5861#issuecomment-98311397
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-7294][SQL] ADD BETWEEN

2015-05-01 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/5839#issuecomment-98311425
  
Jenkins, ok to test.






[GitHub] spark pull request: [SPARK-7294][SQL] ADD BETWEEN

2015-05-01 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/5839#discussion_r29545478
  
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -1289,6 +1289,17 @@ def cast(self, dataType):
             raise TypeError("unexpected type: %s" % type(dataType))
         return Column(jc)
 
+    @ignore_unicode_prefix
+    def between(self, lowerBound, upperBound):
+        """ A boolean expression that is evaluated to true if the value of this
+        expression is between the given columns.
+
+        >>> df[df.col1.between(lowerBound, upperBound)].collect()
+        [Row(col1=5, col2=6, col3=8)]
+        """
+        jc = (self >= lowerBound) & (self <= upperBound)
+        return Column(jc)
--- End diff --

actually I think you no longer need to wrap it in Column, since it is already a Python column.
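
For reference, the analogous method in the Scala Column API illustrates the same point: the conjunction of two comparison expressions is already a `Column`, so no extra constructor wrap is needed. A minimal sketch (illustrative, not the code under review):

```scala
import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.col

// `>=`, `<=`, and `&&` on Column each return a Column, so the combined
// bound check needs no additional wrapping.
val inRange: Column = (col("col1") >= 5) && (col("col1") <= 8)
```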





[GitHub] spark pull request: [SPARK-7149] [SQL] Fix system default alias pr...

2015-05-01 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/5861#issuecomment-98310852
  
Is having-condition invalid?





[GitHub] spark pull request: [SPARK-7149] [SQL] Fix system default alias pr...

2015-05-01 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/5861#issuecomment-98310901
  
Jenkins, test this please.





[GitHub] spark pull request: [SPARK-7296][WebUI] Timeline view for Stage pa...

2015-05-01 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5843#issuecomment-98309223
  
[Test build #31650 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31650/consoleFull) for PR 5843 at commit [`2a9e376`](https://github.com/apache/spark/commit/2a9e37605fc47a61a77e7b0b46d0d0858f7f60ee).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-7296][WebUI] Timeline view for Stage pa...

2015-05-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5843#issuecomment-98309228
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31650/





[GitHub] spark pull request: [SPARK-7296][WebUI] Timeline view for Stage pa...

2015-05-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5843#issuecomment-98309227
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-6907][SQL] Isolated client for HiveMeta...

2015-05-01 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/5851#discussion_r29545440
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/IsolatedClientLoader.scala ---
@@ -0,0 +1,172 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive.client
+
+import java.io.File
+import java.net.URLClassLoader
+import java.util
+
+import scala.language.reflectiveCalls
+import scala.util.Try
+
+import org.apache.commons.io.{FileUtils, IOUtils}
+
+import org.apache.spark.Logging
+import org.apache.spark.deploy.SparkSubmitUtils
+
+import org.apache.spark.sql.catalyst.util.quietly
+
+/** Factory for `IsolatedClientLoader` with specific versions of hive. */
+object IsolatedClientLoader {
+  /**
+   * Creates isolated Hive client loaders by downloading the requested version from maven.
+   */
+  def forVersion(
+      version: Int,
+      config: Map[String, String] = Map.empty): IsolatedClientLoader = synchronized {
+    val files = resolvedVersions.getOrElseUpdate(version, downloadVersion(version))
+    new IsolatedClientLoader(hiveVersion(version), files, config)
+  }
+
+  def hiveVersion(version: Int): HiveVersion = version match {
+    case 12 => hive.v12
+    case 13 => hive.v13
--- End diff --

I think it would be easier if we just do the full string, rather than an int.
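
A rough sketch of the string-keyed alternative being suggested (illustrative only, not code from this PR):

```scala
// Hypothetical variant keyed by the full version string, which also
// extends naturally to later versions such as "1.0.0" or "1.1.0".
def hiveVersion(version: String): HiveVersion = version match {
  case "0.12.0" => hive.v12
  case "0.13.1" => hive.v13
  case other => sys.error(s"Unsupported Hive metastore version: $other")
}
```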





[GitHub] spark pull request: [SPARK-6368][SQL][Follow-up] Use Serializer2 i...

2015-05-01 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5849#issuecomment-98308816
  
[Test build #31657 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31657/consoleFull) for PR 5849 at commit [`8627238`](https://github.com/apache/spark/commit/86272380a9b233a78a3a956b406d1054fa2ee21b).





[GitHub] spark pull request: [SPARK-6907][SQL] Isolated client for HiveMeta...

2015-05-01 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/5851#discussion_r29545433
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/package.scala ---
@@ -0,0 +1,30 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive
+
+/** Support for interacting with different versions of the HiveMetastoreClient */
+package object client {
+  private[client] abstract class HiveVersion(val fullVersion: String)
+
+  // scalastyle:off
+  private[client] object hive {
+    case object v12 extends HiveVersion("0.12.0")
+    case object v13 extends HiveVersion("0.13.1")
+  }
+  // scalastyle:on
+}
--- End diff --

new line here





[GitHub] spark pull request: [SPARK-6368][SQL][Follow-up] Use Serializer2 i...

2015-05-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5849#issuecomment-98308335
  
Merged build started.





[GitHub] spark pull request: [SPARK-6907][SQL] Isolated client for HiveMeta...

2015-05-01 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/5851#discussion_r29545432
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/ReflectionMagic.scala ---
@@ -0,0 +1,200 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive.client
+
+import scala.reflect._
+
+/**
+ * Provides implicit functions on any object for calling methods reflectively.
+ */
+protected trait ReflectionMagic {
+  /** code for InstanceMagic
+  println(
+    (1 to 22).map { n =>
--- End diff --

10 is probably enough :)
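
For context, the quoted block is a code generator that prints arity-specialized reflective-call helpers for 1 through 22 arguments; the suggestion is to cap generation at 10. A sketch of the kind of helper being generated (names and shape are assumptions, not the PR's code):

```scala
// Illustrative only: a one-argument reflective call helper that invokes
// `method` on `obj` with an explicitly typed parameter.
def call1[R](obj: AnyRef, method: String, t1: Class[_], a1: AnyRef): R =
  obj.getClass.getMethod(method, t1).invoke(obj, a1).asInstanceOf[R]
```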





[GitHub] spark pull request: [SPARK-6368][SQL][Follow-up] Use Serializer2 i...

2015-05-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5849#issuecomment-98308283
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-6907][SQL] Isolated client for HiveMeta...

2015-05-01 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/5851#discussion_r29545422
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/IsolatedClientLoader.scala ---
@@ -0,0 +1,172 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive.client
+
+import java.io.File
+import java.net.URLClassLoader
+import java.util
+
+import scala.language.reflectiveCalls
+import scala.util.Try
+
+import org.apache.commons.io.{FileUtils, IOUtils}
+
+import org.apache.spark.Logging
+import org.apache.spark.deploy.SparkSubmitUtils
+
+import org.apache.spark.sql.catalyst.util.quietly
+
+/** Factory for `IsolatedClientLoader` with specific versions of hive. */
+object IsolatedClientLoader {
+  /**
+   * Creates isolated Hive client loaders by downloading the requested version from maven.
+   */
+  def forVersion(
+      version: Int,
+      config: Map[String, String] = Map.empty): IsolatedClientLoader = synchronized {
+    val files = resolvedVersions.getOrElseUpdate(version, downloadVersion(version))
+    new IsolatedClientLoader(hiveVersion(version), files, config)
+  }
+
+  def hiveVersion(version: Int): HiveVersion = version match {
+    case 12 => hive.v12
+    case 13 => hive.v13
+  }
+
+  private def downloadVersion(version: Int): Seq[File] = {
+    val v = hiveVersion(version).fullVersion
+    val hiveArtifacts =
+      (Seq("hive-metastore", "hive-exec", "hive-common", "hive-serde") ++
+        (if (version <= 10) "hive-builtins" :: Nil else Nil))
+        .map(a => s"org.apache.hive:$a:$v") :+
+        "com.google.guava:guava:14.0.1" :+
+        "org.apache.hadoop:hadoop-client:2.4.0" :+
+        "mysql:mysql-connector-java:5.1.12"
+
+    val classpath = quietly {
+      SparkSubmitUtils.resolveMavenCoordinates(
+        hiveArtifacts.mkString(","),
+        Some("http://www.datanucleus.org/downloads/maven2"),
+        None)
+    }
+    val allFiles = classpath.split(",").map(new File(_)).toSet
+
+    // TODO: Remove copy logic.
+    val tempDir = File.createTempFile("hive", "v" + version.toString)
+    tempDir.delete()
+    tempDir.mkdir()
+
+    allFiles.foreach(f => FileUtils.copyFileToDirectory(f, tempDir))
+    tempDir.listFiles()
+  }
+
+  private def resolvedVersions = new scala.collection.mutable.HashMap[Int, Seq[File]]
+}
+
+/**
+ * Creates a Hive `ClientInterface` using a classloader that works according to the following rules:
+ *  - Shared classes: Java, Scala, logging, and Spark classes are delegated to `baseClassLoader`
+ *    allowing the results of calls to the `ClientInterface` to be visible externally.
+ *  - Hive classes: new instances are loaded from `execJars`.  These classes are not
+ *    accessible externally due to their custom loading.
+ *  - ClientWrapper: a new copy is created for each instance of `IsolatedClassLoader`.
+ *    This new instance is able to see a specific version of hive without using reflection. Since
+ *    this is a unique instance, it is not visible externally other than as a generic
+ *    `ClientInterface`, unless `isolationOn` is set to `false`.
+ *
+ * @param version The version of hive on the classpath.  used to pick specific function signatures
+ *                that are not compatibile accross versions.
+ * @param execJars A collection of jar files that must include hive and hadoop.
+ * @param config   A set of options that will be added to the HiveConf of the constructed client.
+ * @param isolationOn When true, custom versions of barrier classes will be constructed.  Must be
+ *                    true unless loading the version of hive that is on Sparks classloader.
+ * @param rootClassLoader The system root classloader.  Must not know about hive classes.
+ * @param baseClassLoader The spark classl
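
To make the delegation rules described above concrete, here is a rough, self-contained sketch of such an isolating classloader (invented names, not the PR's actual implementation):

```scala
import java.net.{URL, URLClassLoader}

// Sketch: shared (Java/Scala/Spark) classes are delegated to the base
// loader so values can cross the isolation boundary; everything else is
// loaded freshly from the isolated Hive jars.
class IsolatingClassLoader(jars: Array[URL], base: ClassLoader)
  extends URLClassLoader(jars, null) {

  private def isShared(name: String): Boolean =
    name.startsWith("java.") || name.startsWith("scala.") ||
      name.startsWith("org.apache.spark.")

  override def loadClass(name: String, resolve: Boolean): Class[_] =
    if (isShared(name)) base.loadClass(name)
    else super.loadClass(name, resolve)
}
```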

[GitHub] spark pull request: [SPARK-6907][SQL] Isolated client for HiveMeta...

2015-05-01 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/5851#discussion_r29545405
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/IsolatedClientLoader.scala ---
@@ -0,0 +1,172 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive.client
+
+import java.io.File
+import java.net.URLClassLoader
+import java.util
+
+import scala.language.reflectiveCalls
+import scala.util.Try
+
+import org.apache.commons.io.{FileUtils, IOUtils}
+
+import org.apache.spark.Logging
+import org.apache.spark.deploy.SparkSubmitUtils
+
+import org.apache.spark.sql.catalyst.util.quietly
+
+/** Factory for `IsolatedClientLoader` with specific versions of hive. */
+object IsolatedClientLoader {
+  /**
+   * Creates isolated Hive client loaders by downloading the requested version from maven.
+   */
+  def forVersion(
+      version: Int,
+      config: Map[String, String] = Map.empty): IsolatedClientLoader = synchronized {
+    val files = resolvedVersions.getOrElseUpdate(version, downloadVersion(version))
+    new IsolatedClientLoader(hiveVersion(version), files, config)
+  }
+
+  def hiveVersion(version: Int): HiveVersion = version match {
+    case 12 => hive.v12
+    case 13 => hive.v13
--- End diff --

how are we going to denote 1.0 and 1.1?





[GitHub] spark pull request: [SPARK-6907][SQL] Isolated client for HiveMeta...

2015-05-01 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/5851#discussion_r29545397
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/ClientWrapper.scala ---
@@ -0,0 +1,392 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive.client
+
+import java.io.{BufferedReader, InputStreamReader, File, PrintStream}
+import java.net.URI
+import java.util.{ArrayList => JArrayList}
+
+import scala.collection.JavaConversions._
+import scala.language.reflectiveCalls
+
+import org.apache.hadoop.fs.Path
+import org.apache.hadoop.hive.metastore.api.Database
+import org.apache.hadoop.hive.conf.HiveConf
+import org.apache.hadoop.hive.metastore.api.FieldSchema
+import org.apache.hadoop.hive.ql.metadata._
+import org.apache.hadoop.hive.ql.session.SessionState
+import org.apache.hadoop.hive.ql.processors._
+import org.apache.hadoop.hive.ql.Driver
+
+import org.apache.spark.Logging
+import org.apache.spark.sql.execution.QueryExecutionException
+
+
+/**
+ * A class that wraps the HiveClient and converts its responses to externally visible classes.
+ * Note that this class is typically loaded with an internal classloader for each instantiation,
+ * allowing it to interact directly with a specific isolated version of Hive.  Loading this class
+ * with the isolated classloader however will result in it only being visible as a ClientInterface,
+ * not a ClientWrapper.
+ *
+ * This class needs to interact with multiple versions of Hive, but will always be compiled with
+ * the 'native', execution version of Hive.  Therefore, any places where hive breaks compatibility
+ * must use reflection after matching on `version`.
+ *
+ * @param version the version of hive used when pick function calls that are not compatible.
+ * @param config  a collection of configuration options that will be added to the hive conf before
+ *                opening the hive client.
+ */
+class ClientWrapper(
+    version: HiveVersion,
+    config: Map[String, String])
+  extends ClientInterface
+  with Logging
+  with ReflectionMagic {
+
+  private val conf = new HiveConf(classOf[SessionState])
+  config.foreach { case (k, v) =>
+    logDebug(s"Hive Config: $k=$v")
+    conf.set(k, v)
+  }
+
+  private def properties = Seq(
+    "javax.jdo.option.ConnectionURL",
+    "javax.jdo.option.ConnectionDriverName",
+    "javax.jdo.option.ConnectionUserName")
+
+  properties.foreach(p => logInfo(s"Hive Configuration: $p = ${conf.get(p)}"))
+
+  // Circular buffer to hold what hive prints to STDOUT and ERR.  Only printed when failures occur.
+  private val outputBuffer = new java.io.OutputStream {
+    var pos: Int = 0
+    var buffer = new Array[Int](10240)
+    def write(i: Int): Unit = {
+      buffer(pos) = i
+      pos = (pos + 1) % buffer.size
+    }
+
+    override def toString: String = {
+      val (end, start) = buffer.splitAt(pos)
+      val input = new java.io.InputStream {
+        val iterator = (start ++ end).iterator
+
+        def read(): Int = if (iterator.hasNext) iterator.next() else -1
+      }
+      val reader = new BufferedReader(new InputStreamReader(input))
+      val stringBuilder = new StringBuilder
+      var line = reader.readLine()
+      while(line != null) {
+        stringBuilder.append(line)
+        stringBuilder.append("\n")
+        line = reader.readLine()
+      }
+      stringBuilder.toString()
+    }
+  }
+
+  val state = {
+    val original = Thread.currentThread().getContextClassLoader
+    Thread.currentThread().setContextClassLoader(getClass.getClassLoader)
+    val ret = try {
+      val newState = new SessionState(conf)
+      SessionState.start(newState)
+      newState.out = new PrintStream(outputBuffer, true, "UTF-8")

[GitHub] spark pull request: [SPARK-6907][SQL] Isolated client for HiveMeta...

2015-05-01 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/5851#discussion_r29545395
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/ClientWrapper.scala ---
@@ -0,0 +1,392 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive.client
+
+import java.io.{BufferedReader, InputStreamReader, File, PrintStream}
+import java.net.URI
+import java.util.{ArrayList => JArrayList}
+
+import scala.collection.JavaConversions._
+import scala.language.reflectiveCalls
+
+import org.apache.hadoop.fs.Path
+import org.apache.hadoop.hive.metastore.api.Database
+import org.apache.hadoop.hive.conf.HiveConf
+import org.apache.hadoop.hive.metastore.api.FieldSchema
+import org.apache.hadoop.hive.ql.metadata._
+import org.apache.hadoop.hive.ql.session.SessionState
+import org.apache.hadoop.hive.ql.processors._
+import org.apache.hadoop.hive.ql.Driver
+
+import org.apache.spark.Logging
+import org.apache.spark.sql.execution.QueryExecutionException
+
+
+/**
+ * A class that wraps the HiveClient and converts its responses to externally visible classes.
+ * Note that this class is typically loaded with an internal classloader for each instantiation,
+ * allowing it to interact directly with a specific isolated version of Hive.  Loading this class
+ * with the isolated classloader however will result in it only being visible as a ClientInterface,
+ * not a ClientWrapper.
+ *
+ * This class needs to interact with multiple versions of Hive, but will always be compiled with
+ * the 'native', execution version of Hive.  Therefore, any places where hive breaks compatibility
+ * must use reflection after matching on `version`.
+ *
+ * @param version the version of hive used when pick function calls that are not compatible.
+ * @param config  a collection of configuration options that will be added to the hive conf before
+ *                opening the hive client.
+ */
+class ClientWrapper(
+    version: HiveVersion,
+    config: Map[String, String])
+  extends ClientInterface
+  with Logging
+  with ReflectionMagic {
+
+  private val conf = new HiveConf(classOf[SessionState])
+  config.foreach { case (k, v) =>
+    logDebug(s"Hive Config: $k=$v")
+    conf.set(k, v)
+  }
+
+  private def properties = Seq(
+    "javax.jdo.option.ConnectionURL",
+    "javax.jdo.option.ConnectionDriverName",
+    "javax.jdo.option.ConnectionUserName")
+
+  properties.foreach(p => logInfo(s"Hive Configuration: $p = ${conf.get(p)}"))
+
+  // Circular buffer to hold what hive prints to STDOUT and ERR.  Only printed when failures occur.
+  private val outputBuffer = new java.io.OutputStream {
+    var pos: Int = 0
+    var buffer = new Array[Int](10240)
+    def write(i: Int): Unit = {
+      buffer(pos) = i
+      pos = (pos + 1) % buffer.size
+    }
+
+    override def toString: String = {
+      val (end, start) = buffer.splitAt(pos)
+      val input = new java.io.InputStream {
+        val iterator = (start ++ end).iterator
+
+        def read(): Int = if (iterator.hasNext) iterator.next() else -1
+      }
+      val reader = new BufferedReader(new InputStreamReader(input))
+      val stringBuilder = new StringBuilder
+      var line = reader.readLine()
+      while(line != null) {
+        stringBuilder.append(line)
+        stringBuilder.append("\n")
+        line = reader.readLine()
+      }
+      stringBuilder.toString()
+    }
+  }
+
+  val state = {
+    val original = Thread.currentThread().getContextClassLoader
+    Thread.currentThread().setContextClassLoader(getClass.getClassLoader)
+    val ret = try {
+      val newState = new SessionState(conf)
+      SessionState.start(newState)
+      newState.out = new PrintStream(outputBuffer, true, "UTF-8")

[GitHub] spark pull request: [SPARK-6907][SQL] Isolated client for HiveMeta...

2015-05-01 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/5851#discussion_r29545380
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/ClientWrapper.scala ---
@@ -0,0 +1,392 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive.client
+
+import java.io.{BufferedReader, InputStreamReader, File, PrintStream}
+import java.net.URI
+import java.util.{ArrayList => JArrayList}
+
+import scala.collection.JavaConversions._
+import scala.language.reflectiveCalls
+
+import org.apache.hadoop.fs.Path
+import org.apache.hadoop.hive.metastore.api.Database
+import org.apache.hadoop.hive.conf.HiveConf
+import org.apache.hadoop.hive.metastore.api.FieldSchema
+import org.apache.hadoop.hive.ql.metadata._
+import org.apache.hadoop.hive.ql.session.SessionState
+import org.apache.hadoop.hive.ql.processors._
+import org.apache.hadoop.hive.ql.Driver
+
+import org.apache.spark.Logging
+import org.apache.spark.sql.execution.QueryExecutionException
+
+
+/**
+ * A class that wraps the HiveClient and converts its responses to externally visible classes.
+ * Note that this class is typically loaded with an internal classloader for each instantiation,
+ * allowing it to interact directly with a specific isolated version of Hive.  Loading this class
+ * with the isolated classloader however will result in it only being visible as a ClientInterface,
+ * not a ClientWrapper.
+ *
+ * This class needs to interact with multiple versions of Hive, but will always be compiled with
+ * the 'native', execution version of Hive.  Therefore, any places where hive breaks compatibility
+ * must use reflection after matching on `version`.
+ *
+ * @param version the version of hive used when pick function calls that are not compatible.
+ * @param config  a collection of configuration options that will be added to the hive conf before
+ *                opening the hive client.
+ */
+class ClientWrapper(
+    version: HiveVersion,
+    config: Map[String, String])
+  extends ClientInterface
+  with Logging
+  with ReflectionMagic {
+
+  private val conf = new HiveConf(classOf[SessionState])
+  config.foreach { case (k, v) =>
+    logDebug(s"Hive Config: $k=$v")
+    conf.set(k, v)
+  }
+
+  private def properties = Seq(
+    "javax.jdo.option.ConnectionURL",
+    "javax.jdo.option.ConnectionDriverName",
+    "javax.jdo.option.ConnectionUserName")
+
+  properties.foreach(p => logInfo(s"Hive Configuration: $p = ${conf.get(p)}"))
+
+  // Circular buffer to hold what hive prints to STDOUT and ERR.  Only printed when failures occur.
+  private val outputBuffer = new java.io.OutputStream {
+    var pos: Int = 0
+    var buffer = new Array[Int](10240)
+    def write(i: Int): Unit = {
+      buffer(pos) = i
+      pos = (pos + 1) % buffer.size
+    }
+
+    override def toString: String = {
+      val (end, start) = buffer.splitAt(pos)
+      val input = new java.io.InputStream {
+        val iterator = (start ++ end).iterator
+
+        def read(): Int = if (iterator.hasNext) iterator.next() else -1
+      }
+      val reader = new BufferedReader(new InputStreamReader(input))
+      val stringBuilder = new StringBuilder
+      var line = reader.readLine()
+      while(line != null) {
+        stringBuilder.append(line)
+        stringBuilder.append("\n")
+        line = reader.readLine()
+      }
+      stringBuilder.toString()
+    }
+  }
+
+  val state = {
+    val original = Thread.currentThread().getContextClassLoader
+    Thread.currentThread().setContextClassLoader(getClass.getClassLoader)
+    val ret = try {
+      val newState = new SessionState(conf)
+      SessionState.start(newState)
+      newState.out = new PrintStream(outputBuffer, true, "UTF-8")

[GitHub] spark pull request: [SPARK-7149] [SQL] Fix system default alias pr...

2015-05-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5861#issuecomment-98308017
  
Can one of the admins verify this patch?





[GitHub] spark pull request: [SPARK-6907][SQL] Isolated client for HiveMeta...

2015-05-01 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/5851#discussion_r29545366
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/ClientWrapper.scala ---
@@ -0,0 +1,392 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive.client
+
+import java.io.{BufferedReader, InputStreamReader, File, PrintStream}
+import java.net.URI
+import java.util.{ArrayList => JArrayList}
+
+import scala.collection.JavaConversions._
+import scala.language.reflectiveCalls
+
+import org.apache.hadoop.fs.Path
+import org.apache.hadoop.hive.metastore.api.Database
--- End diff --

given the class names are very likely to collide, I think we should consider just importing o.s.h.hive, and then prefix everything with hive. ...
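
A sketch of the suggested import style (hypothetical, for illustration):

```scala
import org.apache.hadoop.hive

// Qualifying Hive types through the `hive` package prefix keeps the
// colliding names (Database, Table, ...) visually distinct from Spark's.
val conf = new hive.conf.HiveConf(classOf[hive.ql.session.SessionState])
val db: hive.metastore.api.Database = ???
```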





[GitHub] spark pull request: [SPARK-7149] [SQL] Fix system default alias pr...

2015-05-01 Thread haiyangsea
GitHub user haiyangsea opened a pull request:

https://github.com/apache/spark/pull/5861

[SPARK-7149] [SQL] Fix system default alias problem

Executing either of the following SQL statements throws an exception:

```sql
select key as havingCondition from testData group by key having key > count(*)
```

org.apache.spark.sql.AnalysisException: Reference 'havingCondition' is ambiguous, could be: havingCondition#42, havingCondition#41.

```sql
select substr(value, 0, 2), key as c0 from testData order by c0 desc limit 2
```

org.apache.spark.sql.AnalysisException: Reference 'c0' is ambiguous, could be: c0#42, c0#41.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/haiyangsea/spark alias

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/5861.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #5861


commit 620473eac8c2b519a67b183890cc05207fd81436
Author: haiyang 
Date:   2015-04-26T07:09:45Z

fix system default alias problem







[GitHub] spark pull request: [SPARK-6907][SQL] Isolated client for HiveMeta...

2015-05-01 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/5851#discussion_r29545347
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/ClientWrapper.scala ---
@@ -0,0 +1,392 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive.client
+
+import java.io.{BufferedReader, InputStreamReader, File, PrintStream}
+import java.net.URI
+import java.util.{ArrayList => JArrayList}
+
+import scala.collection.JavaConversions._
+import scala.language.reflectiveCalls
+
+import org.apache.hadoop.fs.Path
+import org.apache.hadoop.hive.metastore.api.Database
+import org.apache.hadoop.hive.conf.HiveConf
+import org.apache.hadoop.hive.metastore.api.FieldSchema
+import org.apache.hadoop.hive.ql.metadata._
+import org.apache.hadoop.hive.ql.session.SessionState
+import org.apache.hadoop.hive.ql.processors._
+import org.apache.hadoop.hive.ql.Driver
+
+import org.apache.spark.Logging
+import org.apache.spark.sql.execution.QueryExecutionException
+
+
+/**
+ * A class that wraps the HiveClient and converts its responses to externally visible classes.
+ * Note that this class is typically loaded with an internal classloader for each instantiation,
+ * allowing it to interact directly with a specific isolated version of Hive.  Loading this class
+ * with the isolated classloader however will result in it only being visible as a ClientInterface,
+ * not a ClientWrapper.
+ *
+ * This class needs to interact with multiple versions of Hive, but will always be compiled with
+ * the 'native', execution version of Hive.  Therefore, any places where hive breaks compatibility
+ * must use reflection after matching on `version`.
+ *
+ * @param version the version of hive used when pick function calls that are not compatible.
+ * @param config  a collection of configuration options that will be added to the hive conf before
+ *                opening the hive client.
+ */
+class ClientWrapper(
+    version: HiveVersion,
+    config: Map[String, String])
+  extends ClientInterface
--- End diff --

It would be great to add "override" on all the implementations of the 
ClientInterface methods.





[GitHub] spark pull request: [SPARK-6907][SQL] Isolated client for HiveMeta...

2015-05-01 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/5851#discussion_r29545342
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/ClientWrapper.scala ---
@@ -0,0 +1,392 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive.client
+
+import java.io.{BufferedReader, InputStreamReader, File, PrintStream}
+import java.net.URI
+import java.util.{ArrayList => JArrayList}
+
+import scala.collection.JavaConversions._
+import scala.language.reflectiveCalls
+
+import org.apache.hadoop.fs.Path
+import org.apache.hadoop.hive.metastore.api.Database
+import org.apache.hadoop.hive.conf.HiveConf
+import org.apache.hadoop.hive.metastore.api.FieldSchema
+import org.apache.hadoop.hive.ql.metadata._
+import org.apache.hadoop.hive.ql.session.SessionState
+import org.apache.hadoop.hive.ql.processors._
+import org.apache.hadoop.hive.ql.Driver
+
+import org.apache.spark.Logging
+import org.apache.spark.sql.execution.QueryExecutionException
+
+
+/**
+ * A class that wraps the HiveClient and converts its responses to externally visible classes.
+ * Note that this class is typically loaded with an internal classloader for each instantiation,
+ * allowing it to interact directly with a specific isolated version of Hive.  Loading this class
+ * with the isolated classloader however will result in it only being visible as a ClientInterface,
+ * not a ClientWrapper.
+ *
+ * This class needs to interact with multiple versions of Hive, but will always be compiled with
+ * the 'native', execution version of Hive.  Therefore, any places where hive breaks compatibility
+ * must use reflection after matching on `version`.
+ *
+ * @param version the version of hive used when pick function calls that are not compatible.
+ * @param config  a collection of configuration options that will be added to the hive conf before
+ *                opening the hive client.
+ */
+class ClientWrapper(
+    version: HiveVersion,
+    config: Map[String, String])
+  extends ClientInterface
+  with Logging
+  with ReflectionMagic {
+
+  private val conf = new HiveConf(classOf[SessionState])
+  config.foreach { case (k, v) =>
+    logDebug(s"Hive Config: $k=$v")
+    conf.set(k, v)
+  }
+
+  private def properties = Seq(
+    "javax.jdo.option.ConnectionURL",
+    "javax.jdo.option.ConnectionDriverName",
+    "javax.jdo.option.ConnectionUserName")
+
+  properties.foreach(p => logInfo(s"Hive Configuration: $p = ${conf.get(p)}"))
+
+  // Circular buffer to hold what hive prints to STDOUT and ERR.  Only printed when failures occur.
+  private val outputBuffer = new java.io.OutputStream {
+    var pos: Int = 0
+    var buffer = new Array[Int](10240)
+    def write(i: Int): Unit = {
+      buffer(pos) = i
+      pos = (pos + 1) % buffer.size
+    }
+
+    override def toString: String = {
+      val (end, start) = buffer.splitAt(pos)
+      val input = new java.io.InputStream {
+        val iterator = (start ++ end).iterator
+
+        def read(): Int = if (iterator.hasNext) iterator.next() else -1
+      }
+      val reader = new BufferedReader(new InputStreamReader(input))
+      val stringBuilder = new StringBuilder
+      var line = reader.readLine()
+      while(line != null) {
+        stringBuilder.append(line)
+        stringBuilder.append("\n")
+        line = reader.readLine()
+      }
+      stringBuilder.toString()
+    }
+  }
+
+  val state = {
+    val original = Thread.currentThread().getContextClassLoader
+    Thread.currentThread().setContextClassLoader(getClass.getClassLoader)
+    val ret = try {
+      val newState = new SessionState(conf)
+      SessionState.start(newState)
+      newState.out = new PrintStream(outputBuffer, true, "UTF-8")
+   

[GitHub] spark pull request: [SPARK-6954] [YARN] ExecutorAllocationManager ...

2015-05-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5856#issuecomment-98307838
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31648/
Test PASSed.





[GitHub] spark pull request: [SPARK-6907][SQL] Isolated client for HiveMeta...

2015-05-01 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/5851#discussion_r29545336
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/ClientWrapper.scala ---
@@ -0,0 +1,392 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive.client
+
+import java.io.{BufferedReader, InputStreamReader, File, PrintStream}
+import java.net.URI
+import java.util.{ArrayList => JArrayList}
+
+import scala.collection.JavaConversions._
+import scala.language.reflectiveCalls
+
+import org.apache.hadoop.fs.Path
+import org.apache.hadoop.hive.metastore.api.Database
+import org.apache.hadoop.hive.conf.HiveConf
+import org.apache.hadoop.hive.metastore.api.FieldSchema
+import org.apache.hadoop.hive.ql.metadata._
+import org.apache.hadoop.hive.ql.session.SessionState
+import org.apache.hadoop.hive.ql.processors._
+import org.apache.hadoop.hive.ql.Driver
+
+import org.apache.spark.Logging
+import org.apache.spark.sql.execution.QueryExecutionException
+
+
+/**
+ * A class that wraps the HiveClient and converts its responses to externally visible classes.
+ * Note that this class is typically loaded with an internal classloader for each instantiation,
+ * allowing it to interact directly with a specific isolated version of Hive.  Loading this class
+ * with the isolated classloader however will result in it only being visible as a ClientInterface,
+ * not a ClientWrapper.
+ *
+ * This class needs to interact with multiple versions of Hive, but will always be compiled with
+ * the 'native', execution version of Hive.  Therefore, any places where hive breaks compatibility
+ * must use reflection after matching on `version`.
+ *
+ * @param version the version of hive used when pick function calls that are not compatible.
+ * @param config  a collection of configuration options that will be added to the hive conf before
+ *                opening the hive client.
+ */
+class ClientWrapper(
+    version: HiveVersion,
+    config: Map[String, String])
+  extends ClientInterface
+  with Logging
+  with ReflectionMagic {
+
+  private val conf = new HiveConf(classOf[SessionState])
+  config.foreach { case (k, v) =>
+    logDebug(s"Hive Config: $k=$v")
+    conf.set(k, v)
+  }
+
+  private def properties = Seq(
+    "javax.jdo.option.ConnectionURL",
+    "javax.jdo.option.ConnectionDriverName",
+    "javax.jdo.option.ConnectionUserName")
+
+  properties.foreach(p => logInfo(s"Hive Configuration: $p = ${conf.get(p)}"))
+
+  // Circular buffer to hold what hive prints to STDOUT and ERR.  Only printed when failures occur.
+  private val outputBuffer = new java.io.OutputStream {
+    var pos: Int = 0
+    var buffer = new Array[Int](10240)
+    def write(i: Int): Unit = {
+      buffer(pos) = i
+      pos = (pos + 1) % buffer.size
+    }
+
+    override def toString: String = {
+      val (end, start) = buffer.splitAt(pos)
+      val input = new java.io.InputStream {
+        val iterator = (start ++ end).iterator
+
+        def read(): Int = if (iterator.hasNext) iterator.next() else -1
+      }
+      val reader = new BufferedReader(new InputStreamReader(input))
+      val stringBuilder = new StringBuilder
+      var line = reader.readLine()
+      while(line != null) {
+        stringBuilder.append(line)
+        stringBuilder.append("\n")
+        line = reader.readLine()
+      }
+      stringBuilder.toString()
+    }
+  }
+
+  val state = {
+    val original = Thread.currentThread().getContextClassLoader
+    Thread.currentThread().setContextClassLoader(getClass.getClassLoader)
+    val ret = try {
+      val newState = new SessionState(conf)
+      SessionState.start(newState)
+      newState.out = new PrintStream(outputBuffer, true, "UTF-8")
+   

[GitHub] spark pull request: [SPARK-6954] [YARN] ExecutorAllocationManager ...

2015-05-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5856#issuecomment-98307837
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-6954] [YARN] ExecutorAllocationManager ...

2015-05-01 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5856#issuecomment-98307833
  
  [Test build #31648 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31648/consoleFull) for PR 5856 at commit [`1cb517a`](https://github.com/apache/spark/commit/1cb517a8a17131219925966571dbf1d210b756b7).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-6907][SQL] Isolated client for HiveMeta...

2015-05-01 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/5851#discussion_r29545328
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/ClientWrapper.scala ---
@@ -0,0 +1,392 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive.client
+
+import java.io.{BufferedReader, InputStreamReader, File, PrintStream}
+import java.net.URI
+import java.util.{ArrayList => JArrayList}
+
+import scala.collection.JavaConversions._
+import scala.language.reflectiveCalls
+
+import org.apache.hadoop.fs.Path
+import org.apache.hadoop.hive.metastore.api.Database
+import org.apache.hadoop.hive.conf.HiveConf
+import org.apache.hadoop.hive.metastore.api.FieldSchema
+import org.apache.hadoop.hive.ql.metadata._
+import org.apache.hadoop.hive.ql.session.SessionState
+import org.apache.hadoop.hive.ql.processors._
+import org.apache.hadoop.hive.ql.Driver
+
+import org.apache.spark.Logging
+import org.apache.spark.sql.execution.QueryExecutionException
+
+
+/**
+ * A class that wraps the HiveClient and converts its responses to externally visible classes.
+ * Note that this class is typically loaded with an internal classloader for each instantiation,
+ * allowing it to interact directly with a specific isolated version of Hive.  Loading this class
+ * with the isolated classloader however will result in it only being visible as a ClientInterface,
+ * not a ClientWrapper.
+ *
+ * This class needs to interact with multiple versions of Hive, but will always be compiled with
+ * the 'native', execution version of Hive.  Therefore, any places where hive breaks compatibility
+ * must use reflection after matching on `version`.
+ *
+ * @param version the version of hive used when pick function calls that are not compatible.
+ * @param config  a collection of configuration options that will be added to the hive conf before
+ *                opening the hive client.
+ */
+class ClientWrapper(
+    version: HiveVersion,
+    config: Map[String, String])
+  extends ClientInterface
+  with Logging
+  with ReflectionMagic {
+
+  private val conf = new HiveConf(classOf[SessionState])
+  config.foreach { case (k, v) =>
+    logDebug(s"Hive Config: $k=$v")
+    conf.set(k, v)
+  }
+
+  private def properties = Seq(
+    "javax.jdo.option.ConnectionURL",
+    "javax.jdo.option.ConnectionDriverName",
+    "javax.jdo.option.ConnectionUserName")
+
+  properties.foreach(p => logInfo(s"Hive Configuration: $p = ${conf.get(p)}"))
+
+  // Circular buffer to hold what hive prints to STDOUT and ERR.  Only printed when failures occur.
+  private val outputBuffer = new java.io.OutputStream {
+    var pos: Int = 0
+    var buffer = new Array[Int](10240)
+    def write(i: Int): Unit = {
+      buffer(pos) = i
+      pos = (pos + 1) % buffer.size
+    }
+
+    override def toString: String = {
+      val (end, start) = buffer.splitAt(pos)
+      val input = new java.io.InputStream {
+        val iterator = (start ++ end).iterator
+
+        def read(): Int = if (iterator.hasNext) iterator.next() else -1
+      }
+      val reader = new BufferedReader(new InputStreamReader(input))
+      val stringBuilder = new StringBuilder
+      var line = reader.readLine()
+      while(line != null) {
+        stringBuilder.append(line)
+        stringBuilder.append("\n")
+        line = reader.readLine()
+      }
+      stringBuilder.toString()
+    }
+  }
+
+  val state = {
+    val original = Thread.currentThread().getContextClassLoader
+    Thread.currentThread().setContextClassLoader(getClass.getClassLoader)
+    val ret = try {
+      val newState = new SessionState(conf)
+      SessionState.start(newState)
+      newState.out = new PrintStream(outputBuffer, true, "UTF-8")
+   

[GitHub] spark pull request: [SPARK-7318] DStream incorrectly cleans RDD in...

2015-05-01 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5860#issuecomment-98307765
  
  [Test build #31655 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31655/consoleFull) for PR 5860 at commit [`67eeff4`](https://github.com/apache/spark/commit/67eeff427380d4f68ba1a3b115d5cc8cf83afc67).





[GitHub] spark pull request: [SPARK-7007][core] Add a metric source for Exe...

2015-05-01 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5589#issuecomment-98307673
  
  [Test build #31656 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31656/consoleFull) for PR 5589 at commit [`104d155`](https://github.com/apache/spark/commit/104d155200cd759be2619e1a1e46cf43c76cfe45).





[GitHub] spark pull request: [SPARK-6939][Streaming][WebUI] Add timeline an...

2015-05-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5533#issuecomment-98307682
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31649/
Test PASSed.





[GitHub] spark pull request: [SPARK-6939][Streaming][WebUI] Add timeline an...

2015-05-01 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5533#issuecomment-98307676
  
  [Test build #31649 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31649/consoleFull) for PR 5533 at commit [`3be4b7a`](https://github.com/apache/spark/commit/3be4b7ab4d7fffc0f3917706d0fd7f4a61f19194).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class SaslEncryption `
  * `static class EncryptedMessage extends AbstractReferenceCounted implements FileRegion `
  * `class SaslRpcHandler extends RpcHandler `
  * `public class SaslServerBootstrap implements TransportServerBootstrap `
  * `public class SparkSaslClient implements SaslEncryptionBackend `
  * `public class SparkSaslServer implements SaslEncryptionBackend `
  * `public class ByteArrayWritableChannel implements WritableByteChannel `



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-7007][core] Add a metric source for Exe...

2015-05-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5589#issuecomment-98307647
  
Merged build started.





[GitHub] spark pull request: [SPARK-6939][Streaming][WebUI] Add timeline an...

2015-05-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5533#issuecomment-98307681
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-7318] DStream incorrectly cleans RDD in...

2015-05-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5860#issuecomment-98307646
  
Merged build started.





[GitHub] spark pull request: [SPARK-7318] DStream incorrectly cleans RDD in...

2015-05-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5860#issuecomment-98307639
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-7007][core] Add a metric source for Exe...

2015-05-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5589#issuecomment-98307640
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-7243][SQL] Contingency Tables for DataF...

2015-05-01 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/5842#discussion_r29545319
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/stat/StatFunctions.scala 
---
@@ -77,4 +78,42 @@ private[sql] object StatFunctions {
       })
     counts.cov
   }
+
+  /** Generate a table of frequencies for the elements of two columns. */
+  private[sql] def crossTabulate(df: DataFrame, col1: String, col2: String): DataFrame = {
+    val tableName = s"${col1}_$col2"
+    val distinctCol2 = df.select(col2).distinct.collect().sortBy(_.get(0).toString)
--- End diff --

Mhmm, I'm not sure I agree. Doing it this way requires two passes, and also 
does not rely on the underlying execution engine. The physical execution will 
get faster over time, and we definitely want to take advantage of that.
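
For comparison, a sketch of that single-pass shape against the `df`, `col1` and `col2` in the diff above (the pivot itself would run over the small collected result):

```scala
// One aggregation pass over the distributed data: count every
// (col1, col2) pair, then build the contingency table locally.
val pairCounts = df.groupBy(col1, col2).count().collect()
// pairCounts is an Array[Row] of (col1 value, col2 value, count);
// pivoting it into a crosstab touches no distributed data again.
```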





[GitHub] spark pull request: [SPARK-7318] DStream incorrectly cleans RDD in...

2015-05-01 Thread andrewor14
GitHub user andrewor14 opened a pull request:

https://github.com/apache/spark/pull/5860

[SPARK-7318] DStream incorrectly cleans RDD instead of closures

I added a check in `ClosureCleaner#clean` to fail fast if this is detected 
in the future.
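
A hypothetical sketch of such a fail-fast guard (the actual check in this patch may differ):

```scala
import org.apache.spark.rdd.RDD

def assertNotRDD(closure: AnyRef): Unit = {
  // Cleaning an RDD instead of a closure is almost certainly a bug at
  // the call site, so fail loudly instead of silently walking its fields.
  if (closure.isInstanceOf[RDD[_]]) {
    throw new IllegalArgumentException(
      s"Expected a closure; got an RDD of class ${closure.getClass.getName}")
  }
}
```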

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/andrewor14/spark streaming-closure-cleaner

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/5860.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #5860


commit a4fa7686f6bb81c43918cdca5511507aa75b8d39
Author: Andrew Or 
Date:   2015-05-02T05:17:38Z

Clean the closure, not the RDD

commit 67eeff427380d4f68ba1a3b115d5cc8cf83afc67
Author: Andrew Or 
Date:   2015-05-02T05:25:28Z

Add tests







[GitHub] spark pull request: [SPARK-7007][core] Add a metric source for Exe...

2015-05-01 Thread jerryshao
Github user jerryshao commented on the pull request:

https://github.com/apache/spark/pull/5589#issuecomment-98307617
  
@andrewor14 @sryza, thanks a lot for your comments. I just rebased the code; 
please review it.





[GitHub] spark pull request: [SPARK-6907][SQL] Isolated client for HiveMeta...

2015-05-01 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/5851#discussion_r29545301
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/ClientWrapper.scala ---
@@ -0,0 +1,392 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive.client
+
+import java.io.{BufferedReader, InputStreamReader, File, PrintStream}
+import java.net.URI
+import java.util.{ArrayList => JArrayList}
+
+import scala.collection.JavaConversions._
+import scala.language.reflectiveCalls
+
+import org.apache.hadoop.fs.Path
+import org.apache.hadoop.hive.metastore.api.Database
+import org.apache.hadoop.hive.conf.HiveConf
+import org.apache.hadoop.hive.metastore.api.FieldSchema
+import org.apache.hadoop.hive.ql.metadata._
+import org.apache.hadoop.hive.ql.session.SessionState
+import org.apache.hadoop.hive.ql.processors._
+import org.apache.hadoop.hive.ql.Driver
+
+import org.apache.spark.Logging
+import org.apache.spark.sql.execution.QueryExecutionException
+
+
+/**
+ * A class that wraps the HiveClient and converts its responses to externally visible classes.
+ * Note that this class is typically loaded with an internal classloader for each instantiation,
+ * allowing it to interact directly with a specific isolated version of Hive.  Loading this class
+ * with the isolated classloader however will result in it only being visible as a ClientInterface,
+ * not a ClientWrapper.
+ *
+ * This class needs to interact with multiple versions of Hive, but will always be compiled with
+ * the 'native', execution version of Hive.  Therefore, any places where hive breaks compatibility
+ * must use reflection after matching on `version`.
+ *
+ * @param version the version of hive used when pick function calls that are not compatible.
+ * @param config  a collection of configuration options that will be added to the hive conf before
+ *                opening the hive client.
+ */
+class ClientWrapper(
+    version: HiveVersion,
+    config: Map[String, String])
+  extends ClientInterface
+  with Logging
+  with ReflectionMagic {
+
+  private val conf = new HiveConf(classOf[SessionState])
+  config.foreach { case (k, v) =>
+    logDebug(s"Hive Config: $k=$v")
+    conf.set(k, v)
+  }
+
+  private def properties = Seq(
+    "javax.jdo.option.ConnectionURL",
+    "javax.jdo.option.ConnectionDriverName",
+    "javax.jdo.option.ConnectionUserName")
+
+  properties.foreach(p => logInfo(s"Hive Configuration: $p = ${conf.get(p)}"))
--- End diff --

I think you'd want to put all the conf in a single logInfo; otherwise it 
might spread out across multiple logging lines or interleave with other ones.
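
Something along these lines, reusing the `properties` and `conf` values from the diff above:

```scala
// Build one multi-line message so the properties always appear together
// in the log instead of interleaving with unrelated output.
val confLines = properties.map(p => s"  $p = ${conf.get(p)}").mkString("\n")
logInfo(s"Hive Configuration:\n$confLines")
```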





[GitHub] spark pull request: [SPARK-6907][SQL] Isolated client for HiveMeta...

2015-05-01 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/5851#discussion_r29545297
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/ClientWrapper.scala ---
@@ -0,0 +1,392 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive.client
+
+import java.io.{BufferedReader, InputStreamReader, File, PrintStream}
+import java.net.URI
+import java.util.{ArrayList => JArrayList}
+
+import scala.collection.JavaConversions._
+import scala.language.reflectiveCalls
+
+import org.apache.hadoop.fs.Path
+import org.apache.hadoop.hive.metastore.api.Database
+import org.apache.hadoop.hive.conf.HiveConf
+import org.apache.hadoop.hive.metastore.api.FieldSchema
+import org.apache.hadoop.hive.ql.metadata._
+import org.apache.hadoop.hive.ql.session.SessionState
+import org.apache.hadoop.hive.ql.processors._
+import org.apache.hadoop.hive.ql.Driver
+
+import org.apache.spark.Logging
+import org.apache.spark.sql.execution.QueryExecutionException
+
+
+/**
+ * A class that wraps the HiveClient and converts its responses to externally visible classes.
+ * Note that this class is typically loaded with an internal classloader for each instantiation,
+ * allowing it to interact directly with a specific isolated version of Hive.  Loading this class
+ * with the isolated classloader however will result in it only being visible as a ClientInterface,
+ * not a ClientWrapper.
+ *
+ * This class needs to interact with multiple versions of Hive, but will always be compiled with
+ * the 'native', execution version of Hive.  Therefore, any places where hive breaks compatibility
+ * must use reflection after matching on `version`.
+ *
+ * @param version the version of hive used when pick function calls that are not compatible.
+ * @param config  a collection of configuration options that will be added to the hive conf before
+ *                opening the hive client.
+ */
+class ClientWrapper(
+    version: HiveVersion,
+    config: Map[String, String])
+  extends ClientInterface
+  with Logging
+  with ReflectionMagic {
+
+  private val conf = new HiveConf(classOf[SessionState])
+  config.foreach { case (k, v) =>
+    logDebug(s"Hive Config: $k=$v")
+    conf.set(k, v)
+  }
+
+  private def properties = Seq(
--- End diff --

Is this used anywhere other than the foreach? If not, maybe just create a 
locally scoped variable.
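
For example, a sketch assuming the names are only needed for this one logging pass:

```scala
// The property names never escape this block, so they need not be a
// member of ClientWrapper at all.
locally {
  val properties = Seq(
    "javax.jdo.option.ConnectionURL",
    "javax.jdo.option.ConnectionDriverName",
    "javax.jdo.option.ConnectionUserName")
  properties.foreach(p => logInfo(s"Hive Configuration: $p = ${conf.get(p)}"))
}
```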





[GitHub] spark pull request: [SPARK-7120][SPARK-7121] Closure cleaner nesti...

2015-05-01 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/5685#issuecomment-98307502
  
@andrewor14 okay, makes sense. Thanks for all the hard work on this, Andrew. 
LGTM pending tests.





[GitHub] spark pull request: [SPARK-7243][SQL] Contingency Tables for DataF...

2015-05-01 Thread brkyvz
Github user brkyvz commented on a diff in the pull request:

https://github.com/apache/spark/pull/5842#discussion_r29545291
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/stat/StatFunctions.scala 
---
@@ -77,4 +78,42 @@ private[sql] object StatFunctions {
       })
     counts.cov
   }
+
+  /** Generate a table of frequencies for the elements of two columns. */
+  private[sql] def crossTabulate(df: DataFrame, col1: String, col2: String): DataFrame = {
+    val tableName = s"${col1}_$col2"
+    val distinctCol2 = df.select(col2).distinct.collect().sortBy(_.get(0).toString)
--- End diff --

That's what I did first. Xiangrui thought this would be more efficient.

On Fri, May 1, 2015 at 10:16 PM, Reynold Xin wrote:

> In sql/core/src/main/scala/org/apache/spark/sql/execution/stat/StatFunctions.scala:
>
> > @@ -77,4 +78,42 @@ private[sql] object StatFunctions {
> >        })
> >      counts.cov
> >    }
> > +
> > +  /** Generate a table of frequencies for the elements of two columns. */
> > +  private[sql] def crossTabulate(df: DataFrame, col1: String, col2: String): DataFrame = {
> > +    val tableName = s"${col1}_$col2"
> > +    val distinctCol2 = df.select(col2).distinct.collect().sortBy(_.get(0).toString)
>
> btw - wouldn't a more efficient way to run this be to do groupBy(col1, col2).count(), and then pivot the table?
>
> —
> Reply to this email directly or view it on GitHub.
>






[GitHub] spark pull request: [SPARK-7120][SPARK-7121] Closure cleaner nesti...

2015-05-01 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/5685#issuecomment-98307469
  
It turns out that in streaming we currently pass an RDD into 
`ClosureCleaner#clean`, so we can't do the type safety check that you supported, 
@pwendell. I will fix this separately later in SPARK-7318.





[GitHub] spark pull request: [SPARK-7243][SQL] Contingency Tables for DataF...

2015-05-01 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/5842#discussion_r29545263
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/stat/StatFunctions.scala 
---
@@ -77,4 +78,42 @@ private[sql] object StatFunctions {
       })
     counts.cov
   }
+
+  /** Generate a table of frequencies for the elements of two columns. */
+  private[sql] def crossTabulate(df: DataFrame, col1: String, col2: String): DataFrame = {
+    val tableName = s"${col1}_$col2"
+    val distinctCol2 = df.select(col2).distinct.collect().sortBy(_.get(0).toString)
--- End diff --

that way we only need one pass over the data.





[GitHub] spark pull request: [SPARK-7243][SQL] Contingency Tables for DataF...

2015-05-01 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/5842#discussion_r29545262
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/stat/StatFunctions.scala 
---
@@ -77,4 +78,42 @@ private[sql] object StatFunctions {
       })
     counts.cov
   }
+
+  /** Generate a table of frequencies for the elements of two columns. */
+  private[sql] def crossTabulate(df: DataFrame, col1: String, col2: String): DataFrame = {
+    val tableName = s"${col1}_$col2"
+    val distinctCol2 = df.select(col2).distinct.collect().sortBy(_.get(0).toString)
--- End diff --

btw - wouldn't a more efficient way to run this be to do groupBy(col1, 
col2).count(), and then pivot the table?





[GitHub] spark pull request: [SPARK-7120][SPARK-7121] Closure cleaner nesti...

2015-05-01 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5685#issuecomment-98307278
  
  [Test build #31654 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31654/consoleFull) for PR 5685 at commit [`cd46230`](https://github.com/apache/spark/commit/cd4623006d0d30c1fcd66eb7c947eab4d201e43b).





[GitHub] spark pull request: [SPARK-7120][SPARK-7121] Closure cleaner nesti...

2015-05-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5685#issuecomment-98306661
  
Merged build started.





[GitHub] spark pull request: [SPARK-7120][SPARK-7121] Closure cleaner nesti...

2015-05-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5685#issuecomment-98306615
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-7241] Pearson correlation for DataFrame...

2015-05-01 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/5858#issuecomment-98304079
  
I will let @mengxr comment on the math part.





[GitHub] spark pull request: [SPARK-7241] Pearson correlation for DataFrame...

2015-05-01 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/5858#discussion_r29545209
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/DataFrameStatSuite.scala ---
@@ -43,7 +43,15 @@ class DataFrameStatSuite extends FunSuite  {
     val singleColResults = df.stat.freqItems(Array("negDoubles"), 0.1)
     val items2 = singleColResults.collect().head
     items2.getSeq[Double](0) should contain (-1.0)
+  }
 
+  test("pearson correlation") {
+    val df = sqlCtx.sparkContext.parallelize(
+      Array.tabulate(10)(i => (i, 2 * i, i * -1.0))).toDF("a", "b", "c")
--- End diff --

fyi we have implicits to add toDF on Seq[Tuples], so you can just replace 
Array with Seq, and then remove all the sparkContext.parallelize stuff. Maybe 
do it for the frequent items above also.
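
A sketch of the resulting test setup, assuming the suite's usual `import sqlCtx.implicits._` is in scope:

```scala
import sqlCtx.implicits._

// A Seq of tuples converts straight to a DataFrame; no explicit
// sparkContext.parallelize needed.
val df = Seq.tabulate(10)(i => (i, 2 * i, i * -1.0)).toDF("a", "b", "c")
```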





[GitHub] spark pull request: [SPARK-7242] added python api for freqItems in...

2015-05-01 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5859#issuecomment-98303294
  
  [Test build #31653 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31653/consoleFull) for PR 5859 at commit [`f9aa9ce`](https://github.com/apache/spark/commit/f9aa9ce35b121f94c8801498266bf5d46d234b19).





[GitHub] spark pull request: [SPARK-7243][SQL] Contingency Tables for DataF...

2015-05-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5842#issuecomment-98303042
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31647/
Test PASSed.





[GitHub] spark pull request: [SPARK-7241] Pearson correlation for DataFrame...

2015-05-01 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/5858#discussion_r29545205
  
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -875,6 +875,25 @@ def fillna(self, value, subset=None):
 
         return DataFrame(self._jdf.na().fill(value, self._jseq(subset)), self.sql_ctx)
 
+    def corr(self, col1, col2, method="pearson"):
+        """
+        Calculate the correlation of two columns of a DataFrame as a double value. Currently only
--- End diff --

Calculates


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-7242] added python api for freqItems in...

2015-05-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5859#issuecomment-98302936
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-7242] added python api for freqItems in...

2015-05-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5859#issuecomment-98302973
  
Merged build started.





[GitHub] spark pull request: [SPARK-7243][SQL] Contingency Tables for DataF...

2015-05-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5842#issuecomment-98303037
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-7243][SQL] Contingency Tables for DataF...

2015-05-01 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5842#issuecomment-98302968
  
  [Test build #31647 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31647/consoleFull) for PR 5842 at commit [`6805df8`](https://github.com/apache/spark/commit/6805df8e34cfcc7520cbda30d6660f93045c948c).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-7241] Pearson correlation for DataFrame...

2015-05-01 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/5858#discussion_r29545190
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala ---
@@ -28,6 +28,32 @@ import org.apache.spark.sql.execution.stat._
 final class DataFrameStatFunctions private[sql](df: DataFrame) {
 
   /**
+   * Calculates the correlation of two columns of a DataFrame. Currently 
only supports the Pearson
+   * Correlation Coefficient. For Spearman Correlation, consider using RDD 
methods found in 
+   * MLlib's Statistics.
+   *
+   * @param col1 the name of the column
+   * @param col2 the name of the column to calculate the correlation 
against
+   * @return The Pearson Correlation Coefficient as a Double.
+   */
+  def corr(col1: String, col2: String, method: String): Double = {
+assert(method == "pearson", "Currently only the calculation of the 
Pearson Correlation " +
--- End diff --

Use `require` here instead.

`assert` can be turned off at compile time, so it should only be used to check 
internal invariants, never to validate caller-supplied arguments. A sketch of 
the distinction is below.
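
A minimal sketch of the distinction, under the signature quoted above (the 
helper and the body are illustrative stand-ins, not the actual patch):

    object RequireVsAssert {
      // Hypothetical stand-in for the real computation.
      private def computePearson(col1: String, col2: String): Double = 0.0

      def corr(col1: String, col2: String, method: String): Double = {
        // Argument validation: require always runs and throws
        // IllegalArgumentException with a caller-facing message.
        require(method == "pearson",
          s"Currently only the Pearson correlation is supported, got '$method'.")
        val r = computePearson(col1, col2)
        // Internal invariant: scalac can elide assert (e.g. -Xelide-below),
        // so it must never be the thing that validates user input.
        assert(r >= -1.0 && r <= 1.0, "correlation must lie in [-1, 1]")
        r
      }
    }

With that change, a call such as corr("a", "b", "spearman") fails with a clear 
IllegalArgumentException regardless of compiler flags.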


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-7241] Pearson correlation for DataFrame...

2015-05-01 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/5858#discussion_r29545194
  
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -875,6 +875,25 @@ def fillna(self, value, subset=None):
 
 return DataFrame(self._jdf.na().fill(value, 
self._jseq(subset)), self.sql_ctx)
 
+def corr(self, col1, col2, method="pearson"):
--- End diff --

Similar to the other PR: it might be better to make method=None here and 
resolve it to "pearson" inside the body, so the default is defined in only one 
place.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-6907][SQL] Isolated client for HiveMeta...

2015-05-01 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/5851#discussion_r29545186
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/ClientInterface.scala 
---
@@ -0,0 +1,149 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive.client
+
+import org.apache.spark.sql.catalyst.analysis.NoSuchTableException
+
+case class HiveDatabase(
+name: String,
+location: String)
+
+abstract class TableType { val name: String }
+case object ExternalTable extends TableType { override val name = 
"EXTERNAL_TABLE" }
+case object IndexTable extends TableType { override val name = 
"INDEX_TABLE" }
+case object ManagedTable extends TableType { override val name = 
"MANAGED_TABLE" }
+case object VirtualView extends TableType { override val name = 
"VIRTUAL_VIEW" }
+
+case class HiveStorageDescriptor(
+location: String,
+inputFormat: String,
+outputFormat: String,
+serde: String)
+
+case class HivePartition(
+values: Seq[String],
+storage: HiveStorageDescriptor)
+
+case class HiveColumn(name: String, hiveType: String, comment: String)
+case class HiveTable(
+specifiedDatabase: Option[String],
+name: String,
+schema: Seq[HiveColumn],
+partitionColumns: Seq[HiveColumn],
+properties: Map[String, String],
+serdeProperties: Map[String, String],
+tableType: TableType,
+location: Option[String] = None,
+inputFormat: Option[String] = None,
+outputFormat: Option[String] = None,
+serde: Option[String] = None) {
+
+  @transient
+  private[client] var client: ClientInterface = _
+
+  private[client] def withClient(ci: ClientInterface): this.type = {
+client = ci
+this
+  }
+
+  def database: String = specifiedDatabase.getOrElse(sys.error("database 
not resolved"))
+
+  def isPartitioned: Boolean = partitionColumns.nonEmpty
+
+  def getAllPartitions: Seq[HivePartition] = client.getAllPartitions(this)
+
+  // Hive does not support backticks when passing names to the client.
+  def qualifiedName: String = s"$database.$name"
+}
+
+/**
+ * An externally visible interface to the Hive client.  This interface is 
shared across both the
+ * internal and external classloaders for a given version of Hive and thus 
must expose only
+ * shared classes.
+ */
+trait ClientInterface {
+  /**
+   * Runs a HiveQL command using Hive, returning the results as a list of 
strings.  Each row will
+   * result in one string.
+   */
+  def runSqlHive(sql: String): Seq[String]
+
+  /** Returns the names of all tables in the given database. */
+  def listTables(dbName: String): Seq[String]
+
+  /** Returns the name of the active database. */
+  def currentDatabase: String
+
+  /** Returns the metadata for the specified database, throwing an exception 
if it doesn't exist. */
+  def getDatabase(name: String): HiveDatabase = {
+getDatabaseOption(name).getOrElse(sys.error(s"No such database $name"))
+  }
+
+  /** Returns the metadata for a given database, or None if it doesn't 
exist. */
+  def getDatabaseOption(name: String): Option[HiveDatabase]
+
+  /** Returns the specified table, or throws [[NoSuchTableException]]. */
+  def getTable(dbName: String, tableName: String): HiveTable = {
+getTableOption(dbName, tableName).getOrElse(throw new 
NoSuchTableException)
+  }
+
+  /** Returns the metadata for the specified table, or None if it doesn't 
exist. */
+  def getTableOption(dbName: String, tableName: String): Option[HiveTable]
+
+  /** Creates a table with the given metadata. */
+  def createTable(table: HiveTable): Unit
+
+  /** Updates the given table with new metadata. */
+  def alterTable(table: HiveTable): Unit
+
+  /** Creates a new database with the given name. */
+ 

[GitHub] spark pull request: [SPARK-6907][SQL] Isolated client for HiveMeta...

2015-05-01 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/5851#discussion_r29545181
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/ClientInterface.scala 
---
@@ -0,0 +1,149 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive.client
+
+import org.apache.spark.sql.catalyst.analysis.NoSuchTableException
+
+case class HiveDatabase(
+name: String,
+location: String)
+
+abstract class TableType { val name: String }
+case object ExternalTable extends TableType { override val name = 
"EXTERNAL_TABLE" }
+case object IndexTable extends TableType { override val name = 
"INDEX_TABLE" }
+case object ManagedTable extends TableType { override val name = 
"MANAGED_TABLE" }
+case object VirtualView extends TableType { override val name = 
"VIRTUAL_VIEW" }
+
+case class HiveStorageDescriptor(
+location: String,
+inputFormat: String,
+outputFormat: String,
+serde: String)
+
+case class HivePartition(
+values: Seq[String],
+storage: HiveStorageDescriptor)
+
+case class HiveColumn(name: String, hiveType: String, comment: String)
+case class HiveTable(
+specifiedDatabase: Option[String],
+name: String,
+schema: Seq[HiveColumn],
+partitionColumns: Seq[HiveColumn],
+properties: Map[String, String],
+serdeProperties: Map[String, String],
+tableType: TableType,
+location: Option[String] = None,
+inputFormat: Option[String] = None,
+outputFormat: Option[String] = None,
+serde: Option[String] = None) {
+
+  @transient
+  private[client] var client: ClientInterface = _
+
+  private[client] def withClient(ci: ClientInterface): this.type = {
+client = ci
+this
+  }
+
+  def database: String = specifiedDatabase.getOrElse(sys.error("database 
not resolved"))
+
+  def isPartitioned: Boolean = partitionColumns.nonEmpty
+
+  def getAllPartitions: Seq[HivePartition] = client.getAllPartitions(this)
+
+  // Hive does not support backticks when passing names to the client.
+  def qualifiedName: String = s"$database.$name"
+}
+
+/**
+ * An externally visible interface to the Hive client.  This interface is 
shared across both the
+ * internal and external classloaders for a given version of Hive and thus 
must expose only
+ * shared classes.
+ */
+trait ClientInterface {
+  /**
+   * Runs a HiveQL command using Hive, returning the results as a list of 
strings.  Each row will
+   * result in one string.
+   */
+  def runSqlHive(sql: String): Seq[String]
+
+  /** Returns the names of all tables in the given database. */
+  def listTables(dbName: String): Seq[String]
+
+  /** Returns the name of the active database. */
+  def currentDatabase: String
+
+  /** Returns the metadata for the specified database, throwing an exception 
if it doesn't exist. */
+  def getDatabase(name: String): HiveDatabase = {
+getDatabaseOption(name).getOrElse(sys.error(s"No such database $name"))
--- End diff --

Should we create a NoSuchDatabaseException and use it here instead of 
sys.error? One possible shape is sketched below.
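
A hedged sketch, mirroring the existing NoSuchTableException; the class name 
and its placement are assumptions, not part of this patch:

    // Alongside NoSuchTableException in org.apache.spark.sql.catalyst.analysis:
    class NoSuchDatabaseException extends Exception

    // getDatabase in ClientInterface would then become:
    def getDatabase(name: String): HiveDatabase = {
      getDatabaseOption(name).getOrElse(throw new NoSuchDatabaseException)
    }

Callers could then catch a typed exception rather than matching on the message 
of a generic RuntimeException.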


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-7241] Pearson correlation for DataFrame...

2015-05-01 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5858#issuecomment-98302301
  
  [Test build #31652 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31652/consoleFull)
 for   PR 5858 at commit 
[`4fe693b`](https://github.com/apache/spark/commit/4fe693b6d1bdbe31af58e18de1c4d575b559db28).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-7241] Pearson correlation for DataFrame...

2015-05-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5858#issuecomment-98302295
  
Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-7241] Pearson correlation for DataFrame...

2015-05-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5858#issuecomment-98302293
  
 Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-7241] Pearson correlation for DataFrame...

2015-05-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5858#issuecomment-98302115
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31646/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-7241] Pearson correlation for DataFrame...

2015-05-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5858#issuecomment-98302114
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-7241] Pearson correlation for DataFrame...

2015-05-01 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5858#issuecomment-98302103
  
  [Test build #31646 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31646/consoleFull)
 for   PR 5858 at commit 
[`a682d06`](https://github.com/apache/spark/commit/a682d06a99fc78eabc98b88a7f52f838a5b0811b).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-7242] added python api for freqItems in...

2015-05-01 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/5859#discussion_r29545101
  
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -1344,6 +1362,11 @@ def cov(self, col1, col2):
 
 cov.__doc__ = DataFrame.cov.__doc__
 
+def freqItems(self, cols, support=1e-2):
--- End diff --

here -- if support=None by default, we only need to set the default value 
in one place.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-7242] added python api for freqItems in...

2015-05-01 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/5859#discussion_r29545099
  
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -889,6 +889,24 @@ def cov(self, col1, col2):
 raise ValueError("col2 should be a string.")
 return self._jdf.stat().cov(col1, col2)
 
+def freqItems(self, cols, support=1e-2):
--- End diff --

it's probably best to use None as the default, and set support to 1e-2 if 
it is None in the body of the function. Then you don't need to set the default 
value in two places.
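
The same "define it once" concern exists on the JVM side; in Scala it is 
typically handled by overload delegation rather than a None sentinel. A hedged 
sketch of that pattern (hypothetical signatures, not quoted from Spark):

    final class StatFunctionsSketch {
      // The literal default lives only in the no-argument overload...
      def freqItems(cols: Seq[String]): Unit = freqItems(cols, 1e-2)

      // ...and every other overload takes the value explicitly.
      def freqItems(cols: Seq[String], support: Double): Unit = {
        require(support > 1e-4, s"support ($support) must be greater than 1e-4")
        // the real implementation would go here
      }
    }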


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-7242] added python api for freqItems in...

2015-05-01 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/5859#discussion_r29545089
  
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -889,6 +889,24 @@ def cov(self, col1, col2):
 raise ValueError("col2 should be a string.")
 return self._jdf.stat().cov(col1, col2)
 
+def freqItems(self, cols, support=1e-2):
+"""
+Finding frequent items for columns, possibly with false positives. 
Using the
+frequent element count algorithm described in
+"http://dx.doi.org/10.1145/762471.762473, proposed by Karp, 
Schenker, and Papadimitriou".
+Uses a `default` support of 1%. The support must be greater than 
1e-4.
+:func:`DataFrame.freqItems` and 
:func:`DataFrameStatFunctions.freqItems` are aliases.
+
+:param cols: Names of the columns to calculate frequent items for 
as a list or tuple of
+strings.
+:param support: The frequency with which to consider an item 
'frequent'. Default is 1%.
--- End diff --

Move the default value and the default-support documentation here.
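
For reference, the frequent element count algorithm the quoted docstring cites 
keeps roughly 1/support counters and can over-report but never under-report. A 
compact standalone sketch of the counting scheme, independent of Spark 
(illustrative only):

    import scala.collection.mutable

    // Returns a superset of the items occurring in more than
    // support * stream.length positions; false positives are possible.
    def freqItems[T](stream: Seq[T], support: Double = 1e-2): Seq[T] = {
      require(support > 1e-4, "support must be greater than 1e-4")
      val k = (1.0 / support).toInt     // maximum number of tracked counters
      val counts = mutable.Map.empty[T, Long]
      stream.foreach { item =>
        if (counts.contains(item)) {
          counts(item) += 1
        } else if (counts.size < k) {
          counts(item) = 1L
        } else {
          // Decrement every counter and evict those that reach zero.
          counts.keys.toList.foreach { tracked =>
            counts(tracked) -= 1
            if (counts(tracked) == 0) counts.remove(tracked)
          }
        }
      }
      counts.keys.toSeq
    }

    // e.g. freqItems(Seq(1, 1, 1, 2, 3, 1, 2), support = 0.4) returns 1 and 2
    // (in some order): 1 is truly frequent; 2 is a false positive (2 of 7 < 40%).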


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: spark-7300 remove temporary directories after ...

2015-05-01 Thread jerryshao
Github user jerryshao commented on the pull request:

https://github.com/apache/spark/pull/5834#issuecomment-98301075
  
Please change the title of the PR to follow the same convention as the other 
PRs :).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-7317] [Shuffle] Expose shuffle handle

2015-05-01 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/5857


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-7242] added python api for freqItems in...

2015-05-01 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5859#issuecomment-98300894
  
  [Test build #31651 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31651/consoleFull)
 for   PR 5859 at commit 
[`4b25056`](https://github.com/apache/spark/commit/4b25056dd12cc6b2bc8b0cf68d3573c592943053).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-7317] [Shuffle] Expose shuffle handle

2015-05-01 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/5857#issuecomment-98300892
  
LGTM. Merging in master.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-7242] added python api for freqItems in...

2015-05-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5859#issuecomment-98300854
  
Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-7242] added python api for freqItems in...

2015-05-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5859#issuecomment-98300847
  
 Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-7242] added python api for freqItems in...

2015-05-01 Thread brkyvz
GitHub user brkyvz opened a pull request:

https://github.com/apache/spark/pull/5859

[SPARK-7242] added python api for freqItems in DataFrames

The Python API for DataFrame's freqItems, plus changes addressing your 
comments from the previous PR.
@rxin 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/brkyvz/spark df-freq-py2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/5859.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #5859


commit 4b25056dd12cc6b2bc8b0cf68d3573c592943053
Author: Burak Yavuz 
Date:   2015-05-02T04:20:07Z

added python api for freqItems




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org


