date:20160912

[GitHub] spark issue #13617: [SPARK-10409] [ML] Add Multilayer Perceptron Regression ...

2016-09-12 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13617
  
**[Test build #65278 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65278/consoleFull)**
 for PR 13617 at commit 
[`509cb23`](https://github.com/apache/spark/commit/509cb23ef66238d17763d8aac320cf2812ee0f3d).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #14961: [SPARK-17379] [BUILD] Upgrade netty-all to 4.0.41 final ...

2016-09-12 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14961
  
**[Test build #65264 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65264/consoleFull)**
 for PR 14961 at commit 
[`502ebf4`](https://github.com/apache/spark/commit/502ebf45f4fa9791cbf26ec5ea7e0167ecbc68a0).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #14961: [SPARK-17379] [BUILD] Upgrade netty-all to 4.0.41 final ...

2016-09-12 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14961
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65264/
Test FAILed.



[GitHub] spark issue #14961: [SPARK-17379] [BUILD] Upgrade netty-all to 4.0.41 final ...

2016-09-12 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14961
  
Merged build finished. Test FAILed.



[GitHub] spark issue #14842: [SPARK-10747][SQL] Support NULLS FIRST|LAST clause in OR...

2016-09-12 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14842
  
**[Test build #65279 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65279/consoleFull)**
 for PR 14842 at commit 
[`5153ce5`](https://github.com/apache/spark/commit/5153ce5261752cf6c33b8f48759de495ed8890c3).



[GitHub] spark issue #15036: [SPARK-17483] Refactoring in BlockManager status reporti...

2016-09-12 Thread JoshRosen

Github user JoshRosen commented on the issue:

https://github.com/apache/spark/pull/15036
  
Thanks for the reviews. I'm going to merge this into master.



[GitHub] spark pull request #15036: [SPARK-17483] Refactoring in BlockManager status ...

2016-09-12 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/15036



[GitHub] spark issue #15062: SPARK-17424: Fix unsound substitution bug in ScalaReflec...

2016-09-12 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15062
  
**[Test build #65269 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65269/consoleFull)**
 for PR 15062 at commit 
[`931f156`](https://github.com/apache/spark/commit/931f156450da83f82bddc4356fb14babd56ec625).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #15062: SPARK-17424: Fix unsound substitution bug in ScalaReflec...

2016-09-12 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15062
  
Merged build finished. Test PASSed.



[GitHub] spark issue #15062: SPARK-17424: Fix unsound substitution bug in ScalaReflec...

2016-09-12 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15062
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65269/
Test PASSed.



[GitHub] spark issue #15037: [SPARK-17485] Prevent failed remote reads of cached bloc...

2016-09-12 Thread ericl

Github user ericl commented on the issue:

https://github.com/apache/spark/pull/15037
  
This LGTM



[GitHub] spark issue #13617: [SPARK-10409] [ML] Add Multilayer Perceptron Regression ...

2016-09-12 Thread JeremyNixon

Github user JeremyNixon commented on the issue:

https://github.com/apache/spark/pull/13617
  
@avulanov I am interested - how about I replicate this PR at 
github.com/avulanov/scalable-deeplearning and we discuss details there?



[GitHub] spark issue #15059: [SPARK-17506][SQL] Improve the check double values equal...

2016-09-12 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15059
  
**[Test build #65265 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65265/consoleFull)**
 for PR 15059 at commit 
[`78f3733`](https://github.com/apache/spark/commit/78f37334164a015605d5c23ff7217a131c3ea3a7).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #15059: [SPARK-17506][SQL] Improve the check double values equal...

2016-09-12 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15059
  
Merged build finished. Test PASSed.



[GitHub] spark issue #15059: [SPARK-17506][SQL] Improve the check double values equal...

2016-09-12 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15059
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65265/
Test PASSed.



[GitHub] spark issue #15063: [SPARK-17463][Core]Make CollectionAccumulator and SetAcc...

2016-09-12 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15063
  
**[Test build #65270 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65270/consoleFull)**
 for PR 15063 at commit 
[`4b8c277`](https://github.com/apache/spark/commit/4b8c277ebb1c8ff966e6d3c2676dfc34a1f0c483).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #15063: [SPARK-17463][Core]Make CollectionAccumulator and SetAcc...

2016-09-12 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15063
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65270/
Test PASSed.



[GitHub] spark issue #15063: [SPARK-17463][Core]Make CollectionAccumulator and SetAcc...

2016-09-12 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15063
  
Merged build finished. Test PASSed.



[GitHub] spark pull request #13642: [MINOR] Clean up several build warnings, mostly d...

2016-09-12 Thread zsxwing

Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/13642#discussion_r78450919
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/metric/SQLMetricsSuite.scala ---
@@ -17,48 +17,18 @@
 
 package org.apache.spark.sql.execution.metric
 
-import java.io.{ByteArrayInputStream, ByteArrayOutputStream}
-
-import scala.collection.mutable
-
-import org.apache.xbean.asm5._
-import org.apache.xbean.asm5.Opcodes._
-
 import org.apache.spark.SparkFunSuite
 import org.apache.spark.sql._
 import org.apache.spark.sql.execution.SparkPlanInfo
 import org.apache.spark.sql.execution.ui.SparkPlanGraph
 import org.apache.spark.sql.functions._
 import org.apache.spark.sql.internal.SQLConf
 import org.apache.spark.sql.test.SharedSQLContext
-import org.apache.spark.util.{AccumulatorContext, JsonProtocol, Utils}
-
+import org.apache.spark.util.{AccumulatorContext, JsonProtocol}
 
 class SQLMetricsSuite extends SparkFunSuite with SharedSQLContext {
   import testImplicits._
 
-  test("SQLMetric should not box Long") {
--- End diff --

Why remove this test? This test doesn't use the old accumulator API.



[GitHub] spark issue #15063: [SPARK-17463][Core]Make CollectionAccumulator and SetAcc...

2016-09-12 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15063
  
**[Test build #3255 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3255/consoleFull)**
 for PR 15063 at commit 
[`21b6b4d`](https://github.com/apache/spark/commit/21b6b4dbb9cbc977c6e4aa8527532b3e933bf7c2).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class SetAccumulator[T] extends AccumulatorV2[T, java.util.Set[T]]`



[GitHub] spark issue #13617: [SPARK-10409] [ML] Add Multilayer Perceptron Regression ...

2016-09-12 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13617
  
**[Test build #65278 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65278/consoleFull)**
 for PR 13617 at commit 
[`509cb23`](https://github.com/apache/spark/commit/509cb23ef66238d17763d8aac320cf2812ee0f3d).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #13617: [SPARK-10409] [ML] Add Multilayer Perceptron Regression ...

2016-09-12 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13617
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65278/
Test PASSed.



[GitHub] spark issue #13617: [SPARK-10409] [ML] Add Multilayer Perceptron Regression ...

2016-09-12 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13617
  
Merged build finished. Test PASSed.



[GitHub] spark pull request #14959: [SPARK-17387][PYSPARK] Creating SparkContext() fr...

2016-09-12 Thread vanzin

Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/14959#discussion_r78453870
  
--- Diff: python/pyspark/java_gateway.py ---
@@ -51,13 +51,16 @@ def launch_gateway():
 on_windows = platform.system() == "Windows"
 script = "./bin/spark-submit.cmd" if on_windows else "./bin/spark-submit"
 submit_args = os.environ.get("PYSPARK_SUBMIT_ARGS", "pyspark-shell")
+if conf and conf.getAll():
+    submit_args = ' '.join(['--conf %s="%s"' % (k, v) for k, v in conf.getAll()]) \
--- End diff --

I think it would be better to create a separate list for the `--conf` 
arguments, and concatenate it into the `command` variable below. Then you don't 
need to worry about the values of the arguments causing trouble.
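
A minimal sketch of that suggestion, not the code from the PR. The helper name `build_command` is hypothetical, and `conf_items` stands in for the `(key, value)` pairs that `conf.getAll()` would return. Each `--conf` pair becomes its own list element, so a value containing spaces or quotes never goes through shell-style splitting:

```python
import shlex


def build_command(script, conf_items, submit_args="pyspark-shell"):
    """Build an argv list for a spark-submit style launcher (hypothetical helper)."""
    # Keep the --conf arguments in their own list: one element per token,
    # so the values are never re-parsed and need no extra quoting.
    conf_args = []
    for key, value in conf_items:
        conf_args += ["--conf", "%s=%s" % (key, value)]
    # Only the user-supplied argument string still needs shell-style splitting.
    return [script] + conf_args + shlex.split(submit_args)


command = build_command(
    "./bin/spark-submit",
    [("spark.app.name", "my app"),
     ("spark.executor.extraJavaOptions", '-Dfoo="bar baz"')],
)
print(command)
```

A list built this way can be handed to `subprocess.Popen` as-is, without `shell=True`, which is what makes the quoting concern go away.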



[GitHub] spark issue #15063: [SPARK-17463][Core]Make CollectionAccumulator and SetAcc...

2016-09-12 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15063
  
**[Test build #65272 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65272/consoleFull)**
 for PR 15063 at commit 
[`5a7183b`](https://github.com/apache/spark/commit/5a7183b8281dd4ced90c4c9d522c96cc6d8e9fb3).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #15063: [SPARK-17463][Core]Make CollectionAccumulator and SetAcc...

2016-09-12 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15063
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65272/
Test FAILed.



[GitHub] spark issue #15063: [SPARK-17463][Core]Make CollectionAccumulator and SetAcc...

2016-09-12 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15063
  
Merged build finished. Test FAILed.



[GitHub] spark issue #14959: [SPARK-17387][PYSPARK] Creating SparkContext() from pyth...

2016-09-12 Thread vanzin

Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/14959
  
The conf code looks kinda nasty with the checks for whether a JVM has been 
set or not... I guess part of it is mandatory because otherwise this wouldn't 
work, but in particular, I'm not so sure the `_set_jvm` code is necessary.

If you just say `self._conf = SparkConf(_jvm=self._jvm)` in `SparkContext`, 
it should maintain the current behavior. Especially since the Scala 
`SparkContext` clones the original user config - and if I read your code 
correctly, you're not doing that here.
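
To illustrate the cloning point only, here is a small sketch with hypothetical `Conf` and `Context` classes; these are stand-ins, not the real `SparkConf` or `SparkContext` APIs. The context takes a private copy of the configuration it is given, so later changes to the caller's object do not affect it:

```python
import copy


class Conf(object):
    """Hypothetical stand-in for a key/value configuration object."""

    def __init__(self, settings=None):
        self._settings = dict(settings or {})

    def set(self, key, value):
        self._settings[key] = value
        return self

    def get(self, key, default=None):
        return self._settings.get(key, default)

    def clone(self):
        # Return an independent copy of this configuration.
        return copy.deepcopy(self)


class Context(object):
    """Hypothetical context that keeps its own copy of the user's config."""

    def __init__(self, conf):
        # Clone so that later changes to the caller's object do not leak in.
        self._conf = conf.clone()

    def get_conf(self):
        return self._conf


user_conf = Conf().set("app.name", "demo")
ctx = Context(user_conf)
user_conf.set("app.name", "changed later")
print(ctx.get_conf().get("app.name"))
```

Without the `clone()` call in `Context.__init__`, the context and the caller would share one object and the later `set` would be visible through both.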



[GitHub] spark issue #14959: [SPARK-17387][PYSPARK] Creating SparkContext() from pyth...

2016-09-12 Thread vanzin

Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/14959
  
Also someone else more familiar with pyspark (I know Holden has already 
looked), maybe @davies, should take a look.



[GitHub] spark issue #15061: [SPARK-14818] Post-2.0 MiMa exclusion and build changes

2016-09-12 Thread yhuai

Github user yhuai commented on the issue:

https://github.com/apache/spark/pull/15061
  
LGTM



[GitHub] spark issue #13758: [SPARK-16043][SQL] Prepare GenericArrayData implementati...

2016-09-12 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13758
  
**[Test build #65277 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65277/consoleFull)**
 for PR 13758 at commit 
[`45ae9bf`](https://github.com/apache/spark/commit/45ae9bf8295acad26b3b017a9653533843ec39b7).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #13758: [SPARK-16043][SQL] Prepare GenericArrayData implementati...

2016-09-12 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13758
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65277/
Test FAILed.



[GitHub] spark issue #15064: [SPARK-17509]]When wrapping catalyst datatype to Hive da...

2016-09-12 Thread sitalkedia

Github user sitalkedia commented on the issue:

https://github.com/apache/spark/pull/15064
  
cc - @rxin 



[GitHub] spark pull request #15064: [SPARK-17509]]When wrapping catalyst datatype to ...

2016-09-12 Thread sitalkedia

GitHub user sitalkedia opened a pull request:

https://github.com/apache/spark/pull/15064

[SPARK-17509]]When wrapping catalyst datatype to Hive data type avoid…

## What changes were proposed in this pull request?

When wrapping Catalyst data types as Hive data types, the wrap function was
doing expensive pattern matching that consumed around 11% of CPU time. Avoid
the pattern matching by returning the wrapper only once and reusing it.
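
The general idea can be sketched in Python as follows. This is an illustration under assumptions, not the Scala change in this pull request: the function names `wrap_slow` and `wrapper_for` and the three type names are hypothetical. The first function repeats the type dispatch for every value; the second does the dispatch once and returns a converter that is reused:

```python
def wrap_slow(value, type_name):
    """Dispatch on the type for every single value (the costly pattern)."""
    if type_name == "string":
        return str(value)
    elif type_name == "int":
        return int(value)
    elif type_name == "double":
        return float(value)
    raise ValueError("unsupported type: %s" % type_name)


def wrapper_for(type_name):
    """Dispatch once and return a converter that can be reused per value."""
    converters = {"string": str, "int": int, "double": float}
    try:
        return converters[type_name]
    except KeyError:
        raise ValueError("unsupported type: %s" % type_name)


column = ["1", "2", "3"]
wrap = wrapper_for("int")            # type dispatch happens once per column
wrapped = [wrap(v) for v in column]  # no dispatch inside the loop
print(wrapped)
```

Both paths produce the same values; the second simply moves the per-type decision out of the per-value loop.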

## How was this patch tested?

Tested by running the job on a cluster and saw around 8% CPU improvement.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sitalkedia/spark skedia/hive_wrapper

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/15064.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #15064


commit 19a2d96c4be9af363c2f5deb54e4a83b541a03f3
Author: Sital Kedia 
Date:   2016-09-12T18:57:53Z

[SPARK-17509]]When wrapping catalyst datatype to Hive data type avoid 
pattern matching





[GitHub] spark issue #15030: [SPARK-17474] [SQL] fix python udf in TakeOrderedAndProj...

2016-09-12 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15030
  
**[Test build #65273 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65273/consoleFull)**
 for PR 15030 at commit 
[`1e319d8`](https://github.com/apache/spark/commit/1e319d8f4ef1adf69b4fffa928bc1ac0c0f21805).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #13758: [SPARK-16043][SQL] Prepare GenericArrayData implementati...

2016-09-12 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13758
  
Merged build finished. Test FAILed.



[GitHub] spark issue #15064: [SPARK-17509]]When wrapping catalyst datatype to Hive da...

2016-09-12 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15064
  
**[Test build #65280 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65280/consoleFull)**
 for PR 15064 at commit 
[`19a2d96`](https://github.com/apache/spark/commit/19a2d96c4be9af363c2f5deb54e4a83b541a03f3).



[GitHub] spark issue #15030: [SPARK-17474] [SQL] fix python udf in TakeOrderedAndProj...

2016-09-12 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15030
  
Merged build finished. Test PASSed.



[GitHub] spark issue #15030: [SPARK-17474] [SQL] fix python udf in TakeOrderedAndProj...

2016-09-12 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15030
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65273/
Test PASSed.



[GitHub] spark issue #15064: [SPARK-17509]]When wrapping catalyst datatype to Hive da...

2016-09-12 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15064
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65280/
Test FAILed.



[GitHub] spark issue #15064: [SPARK-17509]]When wrapping catalyst datatype to Hive da...

2016-09-12 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15064
  
**[Test build #65280 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65280/consoleFull)**
 for PR 15064 at commit 
[`19a2d96`](https://github.com/apache/spark/commit/19a2d96c4be9af363c2f5deb54e4a83b541a03f3).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #15064: [SPARK-17509]]When wrapping catalyst datatype to Hive da...

2016-09-12 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15064
  
Merged build finished. Test FAILed.



[GitHub] spark issue #13617: [SPARK-10409] [ML] Add Multilayer Perceptron Regression ...

2016-09-12 Thread avulanov

Github user avulanov commented on the issue:

https://github.com/apache/spark/pull/13617
  
@JeremyNixon Sounds good!



[GitHub] spark issue #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work for jdb...

2016-09-12 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/12601
  
**[Test build #65275 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65275/consoleFull)**
 for PR 12601 at commit 
[`7ef7a48`](https://github.com/apache/spark/commit/7ef7a489b27fa6bd5d79ee4d428874162fd813de).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work for jdb...

2016-09-12 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/12601
  
Merged build finished. Test PASSed.



[GitHub] spark issue #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work for jdb...

2016-09-12 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/12601
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65275/
Test PASSed.



[GitHub] spark issue #15063: [SPARK-17463][Core]Make CollectionAccumulator and SetAcc...

2016-09-12 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15063
  
**[Test build #65281 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65281/consoleFull)**
 for PR 15063 at commit 
[`0da2f9b`](https://github.com/apache/spark/commit/0da2f9b19769c6ff1a93a35ce3bc16b28676c930).



[GitHub] spark pull request #15048: [SPARK-17409] [SQL] Do Not Optimize Query in CTAS...

2016-09-12 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/15048#discussion_r78461431
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/rules.scala ---
@@ -68,7 +68,7 @@ class ResolveDataSource(sparkSession: SparkSession) extends Rule[LogicalPlan] {
 /**
  * Preprocess some DDL plans, e.g. [[CreateTable]], to do some normalization and checking.
--- End diff --

Sure, let me do it now. Thanks!



[GitHub] spark issue #11105: [SPARK-12469][CORE] Data Property accumulators for Spark

2016-09-12 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/11105
  
**[Test build #65276 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65276/consoleFull)**
 for PR 11105 at commit 
[`491499d`](https://github.com/apache/spark/commit/491499d34e8481cfb9ef43a8871b52bbee3f4638).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #11105: [SPARK-12469][CORE] Data Property accumulators for Spark

2016-09-12 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/11105
  
Merged build finished. Test PASSed.



[GitHub] spark issue #14467: [SPARK-16861][PYSPARK][CORE] Refactor PySpark accumulato...

2016-09-12 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14467
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65274/
Test PASSed.



[GitHub] spark issue #14467: [SPARK-16861][PYSPARK][CORE] Refactor PySpark accumulato...

2016-09-12 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14467
  
**[Test build #65274 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65274/consoleFull)**
 for PR 14467 at commit 
[`6169c3c`](https://github.com/apache/spark/commit/6169c3c6ad0e17566467876edc43898e668037ce).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #11105: [SPARK-12469][CORE] Data Property accumulators for Spark

2016-09-12 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/11105
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65276/
Test PASSed.



[GitHub] spark issue #14467: [SPARK-16861][PYSPARK][CORE] Refactor PySpark accumulato...

2016-09-12 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14467
  
Merged build finished. Test PASSed.



[GitHub] spark pull request #15065: [SPARK-17463][Core]Add necessary memory barrier f...

2016-09-12 Thread zsxwing

GitHub user zsxwing opened a pull request:

https://github.com/apache/spark/pull/15065

[SPARK-17463][Core]Add necessary memory barrier for accumulators

## What changes were proposed in this pull request?

Added `volatile` for fields that will be read in the heartbeat thread. 
Without them, the worst case is that the user cannot see any metric updates 
until a task finishes.

Unfortunately, there will be a performance regression compared to Spark 
2.0.0. Note: in Spark 1.6, accumulators have a volatile value, so there should 
not be any performance regression compared to 1.6.

To reduce the performance loss caused by `volatile`, there are two 
alternatives:

1. Use `AtomicXXX.lazySet` to update a metric value. An eventually visible 
write should be enough for metrics. There are some comparison numbers in 
http://psy-lob-saw.blogspot.com/2012/12/atomiclazyset-is-performance-win-for.html
 . However, this will increase the metric object size a lot.

2. Use `AtomicLongFieldUpdater` to avoid the overhead of AtomicLong. See: 
http://normanmaurer.me/blog/2013/10/28/Lesser-known-concurrent-classes-Part-1/
 . The problem with this approach is that we would need to rewrite the code in 
Java, because we cannot add static code to a class in Scala. The following code 
will fail with `java.lang.IllegalAccessException: Class Foo$ can not access a 
member of class Foo with modifiers "private volatile"`:

```Scala
import java.util.concurrent.atomic.AtomicLongFieldUpdater

class Foo {
  @volatile var x: Long = 0
}

object Foo extends App {
  // newUpdater doesn't use this
  classOf[Foo].getDeclaredField("x").setAccessible(true)
  val field = AtomicLongFieldUpdater.newUpdater(classOf[Foo], "x")
}
```
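
As a rough illustration of alternative 1, the sketch below keeps a counter in an `AtomicLong` and publishes updates with `lazySet`. `LazyLongMetric` is a hypothetical class, not part of Spark, and the sketch assumes a single writer thread (the task) with any number of readers (such as the heartbeat thread).

```scala
import java.util.concurrent.atomic.AtomicLong

// Hypothetical metric holder for illustration only; not a Spark class.
final class LazyLongMetric {
  private val value = new AtomicLong(0L)

  // Assumes exactly one writer thread, so the read-then-write pair needs no CAS.
  // lazySet is an ordered store without the full fence of a volatile write;
  // readers on other threads see the new value eventually.
  def add(delta: Long): Unit = value.lazySet(value.get() + delta)

  // Safe to call from any thread, e.g. a heartbeat thread.
  def current: Long = value.get()
}
```

The cost noted in the description still applies: every metric now carries a separate `AtomicLong` object in addition to its own fields.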


## How was this patch tested?

Jenkins tests



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/zsxwing/spark fix-accmu

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/15065.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #15065


commit 4c6bb0b92b55ca5368b2e2752704fc197f9be4ad
Author: Shixiong Zhu 
Date:   2016-09-12T20:51:53Z

Add necessary memory barrier for accumulators





[GitHub] spark issue #15065: [SPARK-17463][Core]Add necessary memory barrier for accu...

2016-09-12 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15065
  
**[Test build #65282 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65282/consoleFull)**
 for PR 15065 at commit 
[`4c6bb0b`](https://github.com/apache/spark/commit/4c6bb0b92b55ca5368b2e2752704fc197f9be4ad).



[GitHub] spark issue #15048: [SPARK-17409] [SQL] Do Not Optimize Query in CTAS More T...

2016-09-12 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15048
  
**[Test build #65283 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65283/consoleFull)**
 for PR 15048 at commit 
[`4c3c955`](https://github.com/apache/spark/commit/4c3c955cb71ad00b65d74248d6247221f8afaf42).



[GitHub] spark pull request #15026: [SPARK-17472] [PYSPARK] Better error message for ...

2016-09-12 Thread Stibbons

Github user Stibbons commented on a diff in the pull request:

https://github.com/apache/spark/pull/15026#discussion_r78464418
  
--- Diff: python/pyspark/broadcast.py ---
@@ -75,7 +75,13 @@ def __init__(self, sc=None, value=None, pickle_registry=None, path=None):
         self._path = path
 
     def dump(self, value, f):
-        pickle.dump(value, f, 2)
+        try:
+            pickle.dump(value, f, 2)
+        except pickle.PickleError:
+            raise
+        except Exception as e:
+            msg = "Could not serialize broadcast: " + e.__class__.__name__ + ": " + e.message
+            raise pickle.PicklingError(msg)
--- End diff --

It is indeed good practice to log when wrapping an exception in a more 
contextual one. `logging.exception(msg)` would be perfect.



[GitHub] spark pull request #14834: [SPARK-17163][ML] Unified LogisticRegression inte...

2016-09-12 Thread sethah

Github user sethah commented on a diff in the pull request:

https://github.com/apache/spark/pull/14834#discussion_r78464438
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala ---
@@ -311,8 +350,28 @@ class LogisticRegression @Since("1.2.0") (
 
 val histogram = labelSummarizer.histogram
 val numInvalid = labelSummarizer.countInvalid
-val numClasses = histogram.length
 val numFeatures = summarizer.mean.size
+val numFeaturesPlusIntercept = if (getFitIntercept) numFeatures + 1 else numFeatures
+
+val numClasses = MetadataUtils.getNumClasses(dataset.schema($(labelCol))) match {
+  case Some(n: Int) =>
+require(n >= histogram.length, s"Specified number of classes $n was " +
+  s"less than the number of unique labels ${histogram.length}.")
+n
+  case None => histogram.length
+}
+
+val isBinaryClassification = numClasses == 1 || numClasses == 2
+val isMultinomial = $(family) match {
+  case "binomial" =>
+require(isBinaryClassification, s"Binomial family only supports 1 or 2 " +
+s"outcome classes but found $numClasses.")
+false
+  case "multinomial" => true
+  case "auto" => !isBinaryClassification
+  case other => throw new IllegalArgumentException(s"Unsupported family: $other")
+}
--- End diff --

Done.
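
For readers following the diff without a Spark checkout, the class-count and family resolution can be sketched as a standalone snippet. This is only an approximation of the logic quoted above: `FamilyResolutionSketch` and its method names are hypothetical, and the metadata lookup is replaced by a plain `Option[Int]` argument.

```scala
// Standalone sketch; object and method names are made up for illustration.
object FamilyResolutionSketch {

  // Prefer the class count declared in label metadata, if any;
  // otherwise fall back to the number of labels observed in the data.
  def resolveNumClasses(fromMetadata: Option[Int], numObservedLabels: Int): Int =
    fromMetadata match {
      case Some(n) =>
        require(n >= numObservedLabels,
          s"Specified number of classes $n was less than the number of " +
            s"unique labels $numObservedLabels.")
        n
      case None => numObservedLabels
    }

  // "auto" picks multinomial only when there are more than two classes.
  def isMultinomial(family: String, numClasses: Int): Boolean = {
    val isBinary = numClasses == 1 || numClasses == 2
    family match {
      case "binomial" =>
        require(isBinary,
          s"Binomial family only supports 1 or 2 outcome classes but found $numClasses.")
        false
      case "multinomial" => true
      case "auto" => !isBinary
      case other => throw new IllegalArgumentException(s"Unsupported family: $other")
    }
  }
}
```

Under these assumptions, `isMultinomial("auto", 3)` is true, while `isMultinomial("binomial", 3)` throws an `IllegalArgumentException` from `require`.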



[GitHub] spark pull request #14834: [SPARK-17163][ML] Unified LogisticRegression inte...

2016-09-12 Thread sethah

Github user sethah commented on a diff in the pull request:

https://github.com/apache/spark/pull/14834#discussion_r78464517
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala ---
@@ -460,33 +577,74 @@ class LogisticRegression @Since("1.2.0") (
as a result, no scaling is needed.
  */
 val rawCoefficients = state.x.toArray.clone()
-var i = 0
-while (i < numFeatures) {
-  rawCoefficients(i) *= { if (featuresStd(i) != 0.0) 1.0 / featuresStd(i) else 0.0 }
-  i += 1
+val coefficientArray = Array.tabulate(numCoefficientSets * numFeatures) { i =>
+  // flatIndex will loop though rawCoefficients, and skip the intercept terms.
+  val flatIndex = if ($(fitIntercept)) i + i / numFeatures else i
+  val featureIndex = i % numFeatures
+  if (featuresStd(featureIndex) != 0.0) {
+rawCoefficients(flatIndex) / featuresStd(featureIndex)
+  } else {
+0.0
+  }
+}
+val coefficientMatrix =
+  new DenseMatrix(numCoefficientSets, numFeatures, coefficientArray, isTransposed = true)
+
+if ($(regParam) == 0.0 && isMultinomial) {
+  /*
+When no regularization is applied, the coefficients lack identifiability because
+we do not use a pivot class. We can add any constant value to the coefficients and
+get the same likelihood. So here, we choose the mean centered coefficients for
+reproducibility. This method follows the approach in glmnet, described here:
+
+Friedman, et al. "Regularization Paths for Generalized Linear Models via
+  Coordinate Descent," https://core.ac.uk/download/files/153/6287975.pdf
+   */
+  val coefficientMean = coefficientMatrix.values.sum / coefficientMatrix.values.length
+  coefficientMatrix.update(_ - coefficientMean)
 }
-bcFeaturesStd.destroy(blocking = false)
 
-if ($(fitIntercept)) {
-  (Vectors.dense(rawCoefficients.dropRight(1)).compressed, rawCoefficients.last,
-arrayBuilder.result())
+val interceptsArray: Array[Double] = if ($(fitIntercept)) {
+  Array.tabulate(numCoefficientSets) { i =>
+val coefIndex = (i + 1) * numFeaturesPlusIntercept - 1
+rawCoefficients(coefIndex)
+  }
+} else {
+  Array[Double]()
+}
+/*
+  The intercepts are never regularized, so we always center the mean.
+ */
+val interceptVector = if (interceptsArray.nonEmpty && isMultinomial) {
+  val interceptMean = interceptsArray.sum / numClasses
+  interceptsArray.indices.foreach { i => interceptsArray(i) -= interceptMean }
+  Vectors.dense(interceptsArray)
+} else if (interceptsArray.length == 1) {
+  Vectors.dense(interceptsArray)
 } else {
-  (Vectors.dense(rawCoefficients).compressed, 0.0, arrayBuilder.result())
+  Vectors.sparse(numCoefficientSets, Seq())
 }
+(coefficientMatrix, interceptVector, arrayBuilder.result())
--- End diff --

I implemented this logic, good suggestion. Let me know if that's what you 
had in mind.
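
The index arithmetic in the diff above is easy to misread, so here is a small standalone sketch of it on plain arrays. It assumes the raw solution is laid out per class as `numFeatures` coefficients followed by one intercept (when an intercept is fitted). `CoefficientLayoutSketch` and its method names are hypothetical, and no Spark types are used.

```scala
// Standalone sketch on plain arrays; names are made up for illustration.
object CoefficientLayoutSketch {

  // Undo feature standardization while skipping the per-class intercept slots.
  def unscale(
      raw: Array[Double],
      featuresStd: Array[Double],
      numCoefficientSets: Int,
      fitIntercept: Boolean): Array[Double] = {
    val numFeatures = featuresStd.length
    Array.tabulate(numCoefficientSets * numFeatures) { i =>
      // Each completed class block shifts the raw index by one intercept slot.
      val flatIndex = if (fitIntercept) i + i / numFeatures else i
      val featureIndex = i % numFeatures
      if (featuresStd(featureIndex) != 0.0) raw(flatIndex) / featuresStd(featureIndex) else 0.0
    }
  }

  // The intercept is the last entry of each class block.
  def intercepts(raw: Array[Double], numFeatures: Int, numCoefficientSets: Int): Array[Double] =
    Array.tabulate(numCoefficientSets) { i => raw((i + 1) * (numFeatures + 1) - 1) }

  // Subtract the mean so the values sum to zero.
  def center(values: Array[Double]): Array[Double] = {
    val mean = values.sum / values.length
    values.map(_ - mean)
  }
}
```

With two classes, two features, and raw values `[2, 4, 10, 6, 8, 20]`, the intercepts are the entries at raw indices 2 and 5, and the coefficient slots are raw indices 0, 1, 3, and 4.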



[GitHub] spark pull request #14834: [SPARK-17163][ML] Unified LogisticRegression inte...

2016-09-12 Thread sethah

Github user sethah commented on a diff in the pull request:

https://github.com/apache/spark/pull/14834#discussion_r78464458
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala ---
@@ -323,32 +382,33 @@ class LogisticRegression @Since("1.2.0") (
 instr.logNumClasses(numClasses)
 instr.logNumFeatures(numFeatures)
 
-val (coefficients, intercept, objectiveHistory) = {
+val (coefficientMatrix, interceptVector, objectiveHistory) = {
   if (numInvalid != 0) {
 val msg = s"Classification labels should be in [0 to ${numClasses - 1}]. " +
   s"Found $numInvalid invalid labels."
 logError(msg)
 throw new SparkException(msg)
   }
 
-  val isConstantLabel = histogram.count(_ != 0) == 1
+  val isConstantLabel = histogram.count(_ != 0.0) == 1
 
-  if (numClasses > 2) {
-val msg = s"LogisticRegression with ElasticNet in ML package only supports " +
-  s"binary classification. Found $numClasses in the input dataset. Consider using " +
-  s"MultinomialLogisticRegression instead."
-logError(msg)
-throw new SparkException(msg)
-  } else if ($(fitIntercept) && numClasses == 2 && isConstantLabel) {
-logWarning(s"All labels are one and fitIntercept=true, so the coefficients will be " +
-  s"zeros and the intercept will be positive infinity; as a result, " +
-  s"training is not needed.")
-(Vectors.sparse(numFeatures, Seq()), Double.PositiveInfinity, Array.empty[Double])
-  } else if ($(fitIntercept) && numClasses == 1) {
-logWarning(s"All labels are zero and fitIntercept=true, so the coefficients will be " +
-  s"zeros and the intercept will be negative infinity; as a result, " +
-  s"training is not needed.")
-(Vectors.sparse(numFeatures, Seq()), Double.NegativeInfinity, Array.empty[Double])
+  if ($(fitIntercept) && isConstantLabel) {
+logWarning(s"All labels are the same value and fitIntercept=true, so the coefficients " +
+  s"will be zeros. Training is not needed.")
+val constantLabelIndex = Vectors.dense(histogram).argmax
+val coefMatrix = if (numFeatures < numCoefficientSets) {
--- End diff --

Done.
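
The constant-label check in the diff above can be illustrated without Spark. In this sketch, `histogram(k)` is assumed to hold the total weight of examples with label `k`, and `ConstantLabelSketch` is a hypothetical name. The diff uses `argmax` on a dense vector; with non-negative weights and exactly one non-zero bucket, the first non-zero index picks the same label.

```scala
// Standalone sketch; the object name is made up for illustration.
object ConstantLabelSketch {

  // Returns the single label present in the data, or None when
  // zero or several labels have non-zero weight.
  def constantLabelIndex(histogram: Array[Double]): Option[Int] =
    if (histogram.count(_ != 0.0) == 1) Some(histogram.indexWhere(_ != 0.0)) else None
}
```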



[GitHub] spark issue #14834: [SPARK-17163][ML] Unified LogisticRegression interface

2016-09-12 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14834
  
**[Test build #65284 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65284/consoleFull)**
 for PR 14834 at commit 
[`d977768`](https://github.com/apache/spark/commit/d977768715f244da621c2e50f8b0eddfdbd646fd).



[GitHub] spark issue #15061: [SPARK-14818] Post-2.0 MiMa exclusion and build changes

2016-09-12 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15061
  
**[Test build #65268 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65268/consoleFull)**
 for PR 15061 at commit 
[`1224e75`](https://github.com/apache/spark/commit/1224e758fc4cf69e27f013615d52b5c96696506b).
 * This patch **fails from timeout after a configured wait of `250m`**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #15061: [SPARK-14818] Post-2.0 MiMa exclusion and build changes

2016-09-12 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15061
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65268/
Test FAILed.



[GitHub] spark issue #15061: [SPARK-14818] Post-2.0 MiMa exclusion and build changes

2016-09-12 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15061
  
Merged build finished. Test FAILed.



[GitHub] spark issue #15061: [SPARK-14818] Post-2.0 MiMa exclusion and build changes

2016-09-12 Thread JoshRosen

Github user JoshRosen commented on the issue:

https://github.com/apache/spark/pull/15061
  
I think that the test timeout is unrelated (since it occurred in PySpark 
tests and those are unaffected by changes to MiMa excludes), so I'm going to 
merge this now and will cherry-pick to branch-2.0.



[GitHub] spark issue #14842: [SPARK-10747][SQL] Support NULLS FIRST|LAST clause in OR...

2016-09-12 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14842
  
**[Test build #65279 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65279/consoleFull)**
 for PR 14842 at commit 
[`5153ce5`](https://github.com/apache/spark/commit/5153ce5261752cf6c33b8f48759de495ed8890c3).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #14842: [SPARK-10747][SQL] Support NULLS FIRST|LAST clause in OR...

2016-09-12 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14842
  
Merged build finished. Test PASSed.



[GitHub] spark issue #14842: [SPARK-10747][SQL] Support NULLS FIRST|LAST clause in OR...

2016-09-12 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14842
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65279/
Test PASSed.



[GitHub] spark pull request #15061: [SPARK-14818] Post-2.0 MiMa exclusion and build c...

2016-09-12 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/15061



[GitHub] spark issue #15063: [SPARK-17463][Core]Make CollectionAccumulator and SetAcc...

2016-09-12 Thread JoshRosen

Github user JoshRosen commented on the issue:

https://github.com/apache/spark/pull/15063
  
Jenkins, retest this please.

(Retesting so MiMa can run again)



[GitHub] spark issue #15035: [SPARK-17477]: SparkSQL cannot handle schema evolution f...

2016-09-12 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/15035
  
Hm.. are you sure this is a problem in all data sources? IIUC, JSON and CSV 
kind of allow permissive upcasting whereas ORC and Parquet do not - so this 
would be a rather ORC- and Parquet-specific problem. Could you confirm whether 
this happens in other data sources, please?

Also, I believe this will then generally degrade the performance of 
`SpecificMutableRow`. I wonder if it is worth doing this to support this case.



[GitHub] spark issue #14750: [SPARK-17183][SQL] put hive serde table schema to table ...

2016-09-12 Thread gatorsmile

Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/14750
  
LGTM, except one minor comment 
https://github.com/apache/spark/pull/14750#discussion_r78106864. That comment 
does not affect the existing code.

https://github.com/apache/spark/pull/15024 is changing the same code path. 
How about merging this one first? Otherwise, the code changes might conflict 
with each other. 



[GitHub] spark pull request #15024: [SPARK-17470][SQL] unify path for data source tab...

2016-09-12 Thread clockfly

Github user clockfly commented on a diff in the pull request:

https://github.com/apache/spark/pull/15024#discussion_r78468093
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/InMemoryCatalog.scala ---
@@ -195,18 +195,31 @@ class InMemoryCatalog(
 throw new TableAlreadyExistsException(db = db, table = table)
   }
 } else {
-  if (tableDefinition.tableType == CatalogTableType.MANAGED) {
-val dir = new Path(catalog(db).db.locationUri, table)
-try {
-  val fs = dir.getFileSystem(hadoopConfig)
-  fs.mkdirs(dir)
-} catch {
-  case e: IOException =>
-throw new SparkException(s"Unable to create table $table as failed " +
-  s"to create its directory $dir", e)
+  val tableWithLocation = if (tableDefinition.tableType == CatalogTableType.MANAGED) {
+val defaultTableLocation = new Path(catalog(db).db.locationUri, table)
+// Ideally we can not create a managed table with location, but due to some limitations in
+// [[CreateDataSourceTableAsSelectCommand]], we have to create the table directory and
+// write out data before we create this table. We should handle this case and allow the
+// table location to be pre-created, as long as it's same with the default table location.
+if (tableDefinition.storage.locationUri.isDefined) {
+  val givenTableLocation = new Path(tableDefinition.storage.locationUri.get).toUri.toString
+  require(defaultTableLocation.toUri.toString == givenTableLocation)
--- End diff --

Can we unify the if and else branches? Should we also check whether the 
defaultTableLocation directory has been created when 
tableDefinition.storage.locationUri is set?



[GitHub] spark issue #13513: [SPARK-15698][SQL][Streaming] Add the ability to remove ...

2016-09-12 Thread zsxwing

Github user zsxwing commented on the issue:

https://github.com/apache/spark/pull/13513
  
Just noticed that `FileStreamSource.getBatch(start: Option[Offset], end: 
Offset)` is broken in this PR. `start` could be an arbitrary offset.

I think we need to store `batchId` with its file paths together in the 
metadata log. `FileStreamSource.getBatch(start: Option[Offset], end: Offset)` 
could be very slow when all batches are in the same file because we need to 
parse the whole file to get the mapping from `batchId` to `files`. However, in 
most cases, `FileStreamSource.getBatch` only queries the latest batch, so if we 
don't compact the latest metadata file, we can make it pretty fast, since it 
then usually reads just one small file. When recovering from failure, the 
performance of `FileStreamSource.getBatch` doesn't really matter.
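
To make the proposal concrete, here is a minimal in-memory sketch of a log that stores each file path together with its `batchId`. `FileEntry` and `filesBetween` are hypothetical names used only for illustration and are not part of Spark. The sketch assumes `getBatch` covers the range `(start, end]`, and it ignores compaction and the on-disk format.

```scala
object BatchIndexedLog {
  // One log record: the batch that picked up the file, plus the file path.
  // Hypothetical type, for illustration only.
  final case class FileEntry(batchId: Long, path: String)

  // Returns the paths whose batchId lies in (start, end].
  // `start = None` means "from the very first batch".
  def filesBetween(entries: Seq[FileEntry], start: Option[Long], end: Long): Seq[String] = {
    val lowerExclusive = start.getOrElse(-1L)
    entries.collect {
      case FileEntry(id, path) if id > lowerExclusive && id <= end => path
    }
  }
}
```

Because every record carries its own `batchId`, an arbitrary `start` offset can be answered by filtering, at the cost of reading every record that was passed in.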



[GitHub] spark pull request #15024: [SPARK-17470][SQL] unify path for data source tab...

2016-09-12 Thread clockfly

Github user clockfly commented on a diff in the pull request:

https://github.com/apache/spark/pull/15024#discussion_r78468181
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/InMemoryCatalog.scala ---
@@ -195,18 +195,31 @@ class InMemoryCatalog(
 throw new TableAlreadyExistsException(db = db, table = table)
   }
 } else {
-  if (tableDefinition.tableType == CatalogTableType.MANAGED) {
-val dir = new Path(catalog(db).db.locationUri, table)
-try {
-  val fs = dir.getFileSystem(hadoopConfig)
-  fs.mkdirs(dir)
-} catch {
-  case e: IOException =>
-throw new SparkException(s"Unable to create table $table as failed " +
-  s"to create its directory $dir", e)
+  val tableWithLocation = if (tableDefinition.tableType == CatalogTableType.MANAGED) {
+val defaultTableLocation = new Path(catalog(db).db.locationUri, table)
+// Ideally we can not create a managed table with location, but due to some limitations in
+// [[CreateDataSourceTableAsSelectCommand]], we have to create the table directory and
+// write out data before we create this table. We should handle this case and allow the
+// table location to be pre-created, as long as it's same with the default table location.
+if (tableDefinition.storage.locationUri.isDefined) {
+  val givenTableLocation = new Path(tableDefinition.storage.locationUri.get).toUri.toString
+  require(defaultTableLocation.toUri.toString == givenTableLocation)
--- End diff --

`require(defaultTableLocation.toUri.toString == givenTableLocation)` 
doesn't give a clear user-facing message. Should we replace it with an explicit 
Exception?
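
As a rough sketch of the two options, the snippet below compares `require` with its optional message argument against an explicit exception. `ManagedTableLocationCheck`, `TableLocationMismatchException` and both method names are hypothetical and exist only for illustration. Locations are plain strings here so the sketch runs without Hadoop's `Path`.

```scala
object ManagedTableLocationCheck {
  // Hypothetical exception type, not an existing Spark class.
  final class TableLocationMismatchException(message: String) extends Exception(message)

  // Option 1: keep `require`, but pass a message. On failure this throws an
  // IllegalArgumentException whose text is "requirement failed: <message>".
  def checkWithRequire(table: String, defaultLocation: String, givenLocation: String): Unit =
    require(
      defaultLocation == givenLocation,
      s"Managed table $table must use its default location $defaultLocation, " +
        s"but got $givenLocation")

  // Option 2: throw an explicit exception with a user-facing message.
  def checkWithException(table: String, defaultLocation: String, givenLocation: String): Unit =
    if (defaultLocation != givenLocation) {
      throw new TableLocationMismatchException(
        s"Cannot create managed table $table at $givenLocation: " +
          s"expected the default location $defaultLocation")
    }
}
```

Either way the user sees both locations in the error. The second option additionally lets callers catch a specific type.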



[GitHub] spark pull request #15024: [SPARK-17470][SQL] unify path for data source tab...

2016-09-12 Thread clockfly

Github user clockfly commented on a diff in the pull request:

https://github.com/apache/spark/pull/15024#discussion_r78468230
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/InMemoryCatalog.scala ---
@@ -195,18 +195,31 @@ class InMemoryCatalog(
 throw new TableAlreadyExistsException(db = db, table = table)
   }
 } else {
-  if (tableDefinition.tableType == CatalogTableType.MANAGED) {
-val dir = new Path(catalog(db).db.locationUri, table)
-try {
-  val fs = dir.getFileSystem(hadoopConfig)
-  fs.mkdirs(dir)
-} catch {
-  case e: IOException =>
-throw new SparkException(s"Unable to create table $table as failed " +
-  s"to create its directory $dir", e)
+  val tableWithLocation = if (tableDefinition.tableType == CatalogTableType.MANAGED) {
+val defaultTableLocation = new Path(catalog(db).db.locationUri, table)
+// Ideally we can not create a managed table with location, but due to some limitations in
+// [[CreateDataSourceTableAsSelectCommand]], we have to create the table directory and
+// write out data before we create this table. We should handle this case and allow the
+// table location to be pre-created, as long as it's same with the default table location.
+if (tableDefinition.storage.locationUri.isDefined) {
+  val givenTableLocation = new Path(tableDefinition.storage.locationUri.get).toUri.toString
+  require(defaultTableLocation.toUri.toString == givenTableLocation)
+  tableDefinition
+} else {
+  try {
+val fs = defaultTableLocation.getFileSystem(hadoopConfig)
+fs.mkdirs(defaultTableLocation)
+  } catch {
+case e: IOException =>
+  throw new SparkException(s"Unable to create table $table as failed " +
--- End diff --

Maybe we should use IOException type instead of SparkException?



[GitHub] spark issue #15037: [SPARK-17485] Prevent failed remote reads of cached bloc...

2016-09-12 Thread JoshRosen

Github user JoshRosen commented on the issue:

https://github.com/apache/spark/pull/15037
  
Merging to master and branch-2.0.



[GitHub] spark pull request #15024: [SPARK-17470][SQL] unify path for data source tab...

2016-09-12 Thread clockfly

Github user clockfly commented on a diff in the pull request:

https://github.com/apache/spark/pull/15024#discussion_r78469005
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/InMemoryCatalog.scala ---
@@ -253,6 +266,7 @@ class InMemoryCatalog(
   throw new SparkException(s"Unable to rename table $oldName to $newName as failed " +
 s"to rename its directory $oldDir", e)
   }
+  oldDesc.table = oldDesc.table.withNewStorage(locationUri = Some(newDir.toUri.toString))
--- End diff --

Maybe for a managed table, we should never set the locationUri field.



[GitHub] spark pull request #15037: [SPARK-17485] Prevent failed remote reads of cach...

2016-09-12 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/15037



[GitHub] spark pull request #15027: [SPARK-17475] [STREAMING] Delete CRC files if the...

2016-09-12 Thread jodersky

Github user jodersky commented on a diff in the pull request:

https://github.com/apache/spark/pull/15027#discussion_r78469460
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLog.scala ---
@@ -146,6 +146,11 @@ class HDFSMetadataLog[T: ClassTag](sparkSession: SparkSession, path: String)
   // It will fail if there is an existing file (someone has committed the batch)
   logDebug(s"Attempting to write log #${batchIdToPath(batchId)}")
   fileManager.rename(tempPath, batchIdToPath(batchId))
+
+  // SPARK-17475: HDFSMetadataLog should not leak CRC files
+  // If the underlying filesystem didn't rename the CRC file, delete it.
--- End diff --

Is this specific to streaming, or would other parts of spark benefit if 
this behavior were changed in the file manager?
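
For readers unfamiliar with the leak, the cleanup described in the diff comment can be sketched with plain `java.nio.file` calls instead of Spark's file manager. `CrcCleanup`, `crcPathFor` and `renameAndCleanCrc` are hypothetical names, and the sketch assumes the checksum file sits next to the data file as `.<name>.crc`, which is how Hadoop's checksummed file systems name it.

```scala
import java.nio.file.{Files, Path}

object CrcCleanup {
  // Assumed naming convention: the checksum of "foo" is stored as ".foo.crc"
  // in the same directory.
  def crcPathFor(file: Path): Path =
    file.resolveSibling("." + file.getFileName.toString + ".crc")

  // Renames `temp` to `dest`, then removes the checksum file of `temp`
  // in case the rename left it behind.
  def renameAndCleanCrc(temp: Path, dest: Path): Unit = {
    // Without REPLACE_EXISTING, move fails if `dest` already exists,
    // mirroring the "someone has committed the batch" case.
    Files.move(temp, dest)
    // No-op when the checksum file was already moved or never existed.
    Files.deleteIfExists(crcPathFor(temp))
  }
}
```

If such a helper lived in a shared file manager rather than in the streaming log, other callers would get the same cleanup, which is what the question above is getting at.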



[GitHub] spark pull request #15027: [SPARK-17475] [STREAMING] Delete CRC files if the...

2016-09-12 Thread frreiss

Github user frreiss commented on a diff in the pull request:

https://github.com/apache/spark/pull/15027#discussion_r78469982
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLog.scala ---
@@ -146,6 +146,11 @@ class HDFSMetadataLog[T: ClassTag](sparkSession: SparkSession, path: String)
   // It will fail if there is an existing file (someone has committed the batch)
   logDebug(s"Attempting to write log #${batchIdToPath(batchId)}")
   fileManager.rename(tempPath, batchIdToPath(batchId))
+
+  // SPARK-17475: HDFSMetadataLog should not leak CRC files
+  // If the underlying filesystem didn't rename the CRC file, delete it.
--- End diff --

I believe HDFSMetadataLog is only called from Structured Streaming classes 
currently.



[GitHub] spark pull request #15024: [SPARK-17470][SQL] unify path for data source tab...

2016-09-12 Thread clockfly

Github user clockfly commented on a diff in the pull request:

https://github.com/apache/spark/pull/15024#discussion_r78471137
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/createDataSourceTables.scala ---
@@ -154,13 +149,8 @@ case class CreateDataSourceTableAsSelectCommand(
   return Seq.empty[Row]
 case SaveMode.Append =>
   // Check if the specified data source match the data source of the existing table.
-  val dataSource = DataSource(
-sparkSession = sparkSession,
-userSpecifiedSchema = Some(query.schema.asNullable),
-partitionColumns = table.partitionColumnNames,
-bucketSpec = table.bucketSpec,
-className = provider,
-options = optionsWithPath)
+  val previousProvider =
--- End diff --

Maybe rename it to `existingFileFormat` or `existingProvider`?



[GitHub] spark issue #13513: [SPARK-15698][SQL][Streaming] Add the ability to remove ...

2016-09-12 Thread frreiss

Github user frreiss commented on the issue:

https://github.com/apache/spark/pull/13513
  
You could just move the metadata deletion logic from FileStreamSinkLog into 
CompactibleFileStreamLog. Then FileStreamSource could issue DELETE log records 
for files that are older than `FileStreamSource.lastPurgeTimestamp`.



[GitHub] spark pull request #15024: [SPARK-17470][SQL] unify path for data source tab...

2016-09-12 Thread clockfly

Github user clockfly commented on a diff in the pull request:

https://github.com/apache/spark/pull/15024#discussion_r78471240
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/createDataSourceTables.scala ---
@@ -204,13 +194,21 @@ case class CreateDataSourceTableAsSelectCommand(
   case None => data
 }
 
+val tableWithPath = if (table.tableType == CatalogTableType.MANAGED) {
+  table.withNewStorage(
+locationUri = Some(sessionState.catalog.defaultTablePath(table.identifier)))
--- End diff --

I think we should add a method in CatalogTable to return the location Uri 
by combining database path + table name.
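
A minimal sketch of what such a method might look like, using simplified stand-in types rather than Spark's real `CatalogTable` and `CatalogDatabase`. `Database`, `Table`, `defaultLocation` and `location` are all hypothetical names, and locations are plain strings.

```scala
object DefaultTableLocation {
  // Simplified stand-ins for the catalog classes, for illustration only.
  final case class Database(name: String, locationUri: String)

  final case class Table(name: String, database: Database, explicitLocation: Option[String]) {
    // Default location: database path + "/" + table name.
    def defaultLocation: String =
      database.locationUri.stripSuffix("/") + "/" + name

    // A table without an explicit location falls back to the derived default.
    def location: String = explicitLocation.getOrElse(defaultLocation)
  }
}
```

With a helper like this, a managed table would not need to store its location at all, since callers could always derive it.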




[GitHub] spark issue #15030: [SPARK-17474] [SQL] fix python udf in TakeOrderedAndProj...

2016-09-12 Thread JoshRosen

Github user JoshRosen commented on the issue:

https://github.com/apache/spark/pull/15030
  
LGTM



[GitHub] spark pull request #15024: [SPARK-17470][SQL] unify path for data source tab...

2016-09-12 Thread clockfly

Github user clockfly commented on a diff in the pull request:

https://github.com/apache/spark/pull/15024#discussion_r78471303
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/createDataSourceTables.scala ---
@@ -204,13 +194,21 @@ case class CreateDataSourceTableAsSelectCommand(
   case None => data
 }
 
+val tableWithPath = if (table.tableType == CatalogTableType.MANAGED) {
+  table.withNewStorage(
--- End diff --

I think we should not set the location Uri for a managed table, as the 
location Uri can be deduced from the database path and table name.



[GitHub] spark pull request #15024: [SPARK-17470][SQL] unify path for data source tab...

2016-09-12 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/15024#discussion_r78471404
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/InMemoryCatalog.scala ---
@@ -253,6 +266,7 @@ class InMemoryCatalog(
   throw new SparkException(s"Unable to rename table $oldName to $newName as failed " +
 s"to rename its directory $oldDir", e)
   }
+  oldDesc.table = oldDesc.table.withNewStorage(locationUri = Some(newDir.toUri.toString))
--- End diff --

In Hive, users are allowed to create a Hive managed table with a 
user-specified location. In Spark SQL, we do not allow users to do it. If users 
specify the location, we always convert the type to EXTERNAL.

When we create a managed table without a location, Hive will set it for us. 
Before this PR, sometimes we check `path` and sometimes we check `locationUri`. 
This could hide bugs we are not aware of, especially for 
`CreateDataSourceTableAsSelectCommand`. We assume our generated `path` is 
always identical to the `locationUri` that is populated by Hive. Thus, we 
should explicitly set `locationUri` for Hive managed tables using our generated 
`path`.

Based on my understanding, the change here is to ensure that `InMemoryCatalog` 
and `HiveExternalCatalog` behave the same. 





[GitHub] spark pull request #15024: [SPARK-17470][SQL] unify path for data source tab...

2016-09-12 Thread clockfly

Github user clockfly commented on a diff in the pull request:

https://github.com/apache/spark/pull/15024#discussion_r78471429
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
@@ -665,15 +665,7 @@ case class AlterTableSetLocationCommand(
 catalog.alterPartitions(tableName, Seq(newPart))
   case None =>
 // No partition spec is specified, so we set the location for the table itself
-val newTable =
-  if (DDLUtils.isDatasourceTable(table)) {
-table.withNewStorage(
-  locationUri = Some(location),
-  properties = table.storage.properties ++ Map("path" -> location))
-  } else {
-table.withNewStorage(locationUri = Some(location))
-  }
-catalog.alterTable(newTable)
+catalog.alterTable(table.withNewStorage(locationUri = Some(location)))
--- End diff --

Should we throw an exception if it is a managed table?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #15024: [SPARK-17470][SQL] unify path for data source tab...

2016-09-12 Thread clockfly

Github user clockfly commented on a diff in the pull request:

https://github.com/apache/spark/pull/15024#discussion_r78472272
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala ---
@@ -435,13 +435,13 @@ case class DataSource(
 //  1. Only one output path can be specified on the write path;
 //  2. Output path must be a legal HDFS style file system path;
 //  3. It's OK that the output path doesn't exist yet;
-val caseInsensitiveOptions = new CaseInsensitiveMap(options)
-val outputPath = {
-  val path = new Path(caseInsensitiveOptions.getOrElse("path", {
-throw new IllegalArgumentException("'path' is not specified")
-  }))
+val allPaths = paths ++ new CaseInsensitiveMap(options).get("path")
+val outputPath = if (allPaths.length == 1) {
+  val path = new Path(allPaths.head)
   val fs = path.getFileSystem(sparkSession.sessionState.newHadoopConf())
   path.makeQualified(fs.getUri, fs.getWorkingDirectory)
+} else {
+  throw new IllegalArgumentException("Only one path can be specified on the write path")
--- End diff --

Maybe the error message should be more explicit? For example, it could list 
the current paths and the path given in the options, and say how to fix it.
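
One possible shape for a more explicit message, as a sketch only. `WritePathCheck` and `resolveOutputPath` are hypothetical names; `paths` and `optionPath` stand in for the two values the diff combines into `allPaths`, and the path is returned as a plain string rather than a qualified Hadoop `Path`.

```scala
object WritePathCheck {
  // Returns the single output path, or fails with a message that lists
  // what was actually supplied and how to fix it.
  def resolveOutputPath(paths: Seq[String], optionPath: Option[String]): String = {
    val allPaths = paths ++ optionPath
    if (allPaths.length == 1) {
      allPaths.head
    } else {
      val explicit = paths.mkString("[", ", ", "]")
      val fromOption = optionPath.getOrElse("<not set>")
      throw new IllegalArgumentException(
        s"Expected exactly one output path on the write path, but found ${allPaths.length}. " +
          s"Explicit paths: $explicit; 'path' option: $fromOption. " +
          "Pass the output location either as a single path or through the 'path' option.")
    }
  }
}
```

The same message covers both failure modes: no path at all, and a path given in both places.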



[GitHub] spark pull request #15024: [SPARK-17470][SQL] unify path for data source tab...

2016-09-12 Thread clockfly

Github user clockfly commented on a diff in the pull request:

https://github.com/apache/spark/pull/15024#discussion_r78472224
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala
 ---
@@ -435,13 +435,13 @@ case class DataSource(
 //  1. Only one output path can be specified on the write path;
 //  2. Output path must be a legal HDFS style file system path;
 //  3. It's OK that the output path doesn't exist yet;
-val caseInsensitiveOptions = new CaseInsensitiveMap(options)
-val outputPath = {
-  val path = new Path(caseInsensitiveOptions.getOrElse("path", {
-throw new IllegalArgumentException("'path' is not specified")
-  }))
+val allPaths = paths ++ new CaseInsensitiveMap(options).get("path")
+val outputPath = if (allPaths.length == 1) {
+  val path = new Path(allPaths.head)
   val fs = path.getFileSystem(sparkSession.sessionState.newHadoopConf())
   path.makeQualified(fs.getUri, fs.getWorkingDirectory)
+} else {
+  throw new IllegalArgumentException("Only one path can be specified on the write path")
--- End diff --

Is IllegalArgumentException the proper exception type here? It is a 
RuntimeException typically used when a method is called via reflection with an 
argument of a different type.

```
/**
 * Thrown to indicate that a method has been passed an illegal or
 * inappropriate argument.
 *
 * @author  unascribed
 * @since   JDK1.0
 */
``` 
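For comparison, Scala's standard `require` also reports a bad argument with 
`IllegalArgumentException`, with no reflection involved. A minimal sketch; 
`checkSinglePath` is a hypothetical helper, not code from this PR:

```scala
object ArgCheck {
  // Hypothetical validation helper; only `require` and the exception
  // hierarchy are standard library behaviour.
  def checkSinglePath(allPaths: Seq[String]): String = {
    // require throws IllegalArgumentException("requirement failed: <message>")
    // when the condition is false.
    require(allPaths.length == 1, "Only one path can be specified on the write path")
    allPaths.head
  }
}
```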





[GitHub] spark pull request #15024: [SPARK-17470][SQL] unify path for data source tab...

2016-09-12 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/15024#discussion_r78472391
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
@@ -665,15 +665,7 @@ case class AlterTableSetLocationCommand(
 catalog.alterPartitions(tableName, Seq(newPart))
   case None =>
 // No partition spec is specified, so we set the location for the table itself
-val newTable =
-  if (DDLUtils.isDatasourceTable(table)) {
-table.withNewStorage(
-  locationUri = Some(location),
-  properties = table.storage.properties ++ Map("path" -> location))
-  } else {
-table.withNewStorage(locationUri = Some(location))
-  }
-catalog.alterTable(newTable)
+catalog.alterTable(table.withNewStorage(locationUri = Some(location)))
--- End diff --

It is legal to change the location of Hive managed tables. Why issue an 
exception here?



[GitHub] spark pull request #15024: [SPARK-17470][SQL] unify path for data source tab...

2016-09-12 Thread clockfly

Github user clockfly commented on a diff in the pull request:

https://github.com/apache/spark/pull/15024#discussion_r78472441
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala ---
@@ -262,11 +262,13 @@ class CatalogImpl(sparkSession: SparkSession) extends Catalog {
   throw new AnalysisException("Cannot create hive serde table with createExternalTable API.")
 }
 
+val location = new CaseInsensitiveMap(options).get("path")
+val storageProps = options.filterNot { case (k, _) => k.toLowerCase == "path" }
--- End diff --

I saw these two lines duplicated in at least three places. Maybe we should 
create a util to extract the path from the options?
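Such a util might look like the sketch below. The object name `PathOptions` and 
the method `extractPath` are hypothetical, and the case-insensitive match is 
done by hand instead of through `CaseInsensitiveMap` so that the example is 
self-contained.

```scala
object PathOptions {
  /**
   * Splits the options into the value of the "path" entry (matched
   * case-insensitively, if present) and the remaining storage properties.
   */
  def extractPath(options: Map[String, String]): (Option[String], Map[String, String]) = {
    val (pathEntries, storageProps) =
      options.partition { case (k, _) => k.toLowerCase == "path" }
    (pathEntries.values.headOption, storageProps)
  }
}
```

Call sites would then reduce to 
`val (location, storageProps) = PathOptions.extractPath(options)`.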



[GitHub] spark issue #15053: [Doc] improve python API docstrings

2016-09-12 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/15053
  
Oh, it will run doc tests as far as I know: 
http://www.sphinx-doc.org/en/stable/ext/doctest.html

Maybe I will try running it locally to check for myself.



[GitHub] spark pull request #15024: [SPARK-17470][SQL] unify path for data source tab...

2016-09-12 Thread clockfly

Github user clockfly commented on a diff in the pull request:

https://github.com/apache/spark/pull/15024#discussion_r78473168
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
@@ -410,15 +417,22 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
 
 if (DDLUtils.isDatasourceTable(withStatsProps)) {
   val oldDef = client.getTable(db, withStatsProps.identifier.table)
-  // Sets the `schema`, `partitionColumnNames` and `bucketSpec` from the old table definition,
-  // to retain the spark specific format if it is. Also add old data source properties to table
-  // properties, to retain the data source table format.
-  val oldDataSourceProps = oldDef.properties.filter(_._1.startsWith(DATASOURCE_PREFIX))
+  // get the data source properties from old table definition, and add the new location entry.
+  val dataSourceProps = oldDef.properties.filter(_._1.startsWith(DATASOURCE_PREFIX)) ++
+tableDefinition.storage.locationUri.map { location =>
+  DATASOURCE_LOCATION -> location
+}
   val newDef = withStatsProps.copy(
+// TODO: we may break the hive-compatibility format for location URI here, we should follow
+// `createTable` and try to alter the table with `locationUri` set, if it's failed, then set
--- End diff --

This comment is not very clear to me. What do you mean by "follow 
`createTable` and try to alter the table with `locationUri` set"?



[GitHub] spark issue #15063: [SPARK-17463][Core]Make CollectionAccumulator and SetAcc...

2016-09-12 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15063
  
**[Test build #65281 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65281/consoleFull)**
 for PR 15063 at commit 
[`0da2f9b`](https://github.com/apache/spark/commit/0da2f9b19769c6ff1a93a35ce3bc16b28676c930).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #15063: [SPARK-17463][Core]Make CollectionAccumulator and SetAcc...

2016-09-12 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15063
  
Merged build finished. Test FAILed.



[GitHub] spark issue #15063: [SPARK-17463][Core]Make CollectionAccumulator and SetAcc...

2016-09-12 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15063
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65281/



[GitHub] spark issue #15030: [SPARK-17474] [SQL] fix python udf in TakeOrderedAndProj...

2016-09-12 Thread davies

Github user davies commented on the issue:

https://github.com/apache/spark/pull/15030
  
Merging into 2.0 and master.



[GitHub] spark pull request #15024: [SPARK-17470][SQL] unify path for data source tab...

2016-09-12 Thread clockfly

Github user clockfly commented on a diff in the pull request:

https://github.com/apache/spark/pull/15024#discussion_r78474538
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/InMemoryCatalog.scala
 ---
@@ -253,6 +266,7 @@ class InMemoryCatalog(
   throw new SparkException(s"Unable to rename table $oldName to $newName as failed " +
 s"to rename its directory $oldDir", e)
   }
+  oldDesc.table = oldDesc.table.withNewStorage(locationUri = Some(newDir.toUri.toString))
--- End diff --

Can we hide all the Hive-specific details in HiveExternalCatalog? I see many 
places in the code where we try to add a locationUri for a managed table. 
That doesn't seem necessary if the locationUri can be deduced from the table 
name. 

We can always do the conversion in HiveExternalCatalog, changing the 
CatalogTable into a catalog entry that Hive understands.
 
```
private def saveTableIntoHive(tableDefinition: CatalogTable, ignoreIfExists: Boolean): Unit = {
```
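As a rough sketch of that idea: `TableDesc`, `HiveConversion`, and `toHiveTable` 
are hypothetical, simplified stand-ins for `CatalogTable` and the conversion 
step, and the `<warehouse>/<db>.db/<table>` layout is assumed purely for 
illustration.

```scala
// Hypothetical, simplified stand-in for CatalogTable.
final case class TableDesc(db: String, name: String, locationUri: Option[String])

object HiveConversion {
  // Fills in a location only at the Hive boundary, so the rest of the catalog
  // code can leave locationUri empty for managed tables.
  // Assumption: the default location is "<warehouseDir>/<db>.db/<table>".
  def toHiveTable(table: TableDesc, warehouseDir: String): TableDesc = {
    val location = table.locationUri.getOrElse(s"$warehouseDir/${table.db}.db/${table.name}")
    table.copy(locationUri = Some(location))
  }
}
```

An explicitly set location is kept as-is; only a missing one is derived from 
the table name.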


