[GitHub] spark issue #16309: [SPARK-18896][TESTS] Suppress ScalaCheck warning
Github user jaceklaskowski commented on the issue: https://github.com/apache/spark/pull/16309 Had to introduce changes to the tests given "Expired deprecations" in [ScalaTest 3.0.0](http://www.scalatest.org/release_notes/3.0.0). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16309: [SPARK-18896][TESTS] Suppress ScalaCheck warning
Github user jaceklaskowski commented on the issue: https://github.com/apache/spark/pull/16309 Right. Testing ScalaTest 3.0.1 with the following sbt command: ``` sbt -Phadoop-2.3 -Pmesos -Pkinesis-asl -Pyarn -Phive-thriftserver -Phive test:package streaming-kafka-0-8-assembly/assembly streaming-flume-assembly/assembly streaming-kinesis-asl-assembly/assembly ``` Is this the right command to stay as close to Jenkins as possible?
[GitHub] spark pull request #16309: [SPARK-18896][TESTS] Suppress ScalaCheck warning
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/16309#discussion_r92784014 --- Diff: pom.xml --- @@ -714,7 +714,7 @@ org.scalacheck scalacheck_${scala.binary.version} -1.12.5 +1.13.4 --- End diff -- Yes, I did use maven for the entire local build and sbt for this particular `DAGSchedulerSuite` (which is where it all started, actually). I've got little experience with the Spark test suite and hence I'm relying on Jenkins to do the heavy lifting and see how things may have changed since the comment.
[GitHub] spark pull request #16309: [SPARK-18896][TESTS] Suppress ScalaCheck warning
GitHub user jaceklaskowski opened a pull request: https://github.com/apache/spark/pull/16309 [SPARK-18896][TESTS] Suppress ScalaCheck warning ## What changes were proposed in this pull request? Fixes the ScalaCheck warning by upgrading to the latest 1.13.4 version. ``` Warning: Unknown ScalaCheck args provided: -oDF ``` The warning is due to a bug in ScalaCheck, reported in https://github.com/rickynils/scalacheck/issues/212 and fixed in https://github.com/rickynils/scalacheck/commit/df435a5, which is available in ScalaCheck 1.13.4. Spark uses ScalaCheck 1.12.5, which is behind the latest 1.12.6 released on Nov 1 (not to mention 1.13.4). ## How was this patch tested? Local build. Waiting for Jenkins. You can merge this pull request into a Git repository by running: $ git pull https://github.com/jaceklaskowski/spark SPARK-18896 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16309.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16309 commit 9a18c106f342d39216920b495f94e9855e44fda7 Author: Jacek Laskowski <ja...@japila.pl> Date: 2016-12-16T09:05:29Z [SPARK-18896] Suppress ScalaCheck warning
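The upgrade amounts to a one-line version bump in the root `pom.xml`. A sketch of the resulting dependency entry, reconstructed from the pom.xml diff in this PR (surrounding elements omitted):

```xml
<!-- Sketch of the bumped ScalaCheck dependency, per the pom.xml diff in this PR -->
<dependency>
  <groupId>org.scalacheck</groupId>
  <artifactId>scalacheck_${scala.binary.version}</artifactId>
  <version>1.13.4</version>
</dependency>
```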
[GitHub] spark issue #16250: [CORE][MINOR] Stylistic changes in DAGScheduler (to ease...
Github user jaceklaskowski commented on the issue: https://github.com/apache/spark/pull/16250 Just to have the list of the reasons not to accept the changes: 1. @rxin "just fyi this is going to be slower than the original code." 2. @shivaram "Its hard to profile these things closely enough to figure out how much overhead it adds" 3. @srowen "There is a tiny risk of breaking something or regressing performance." 4. @shivaram "I don't think the improvement in readability is worth the risk in terms of performance regression" I however think the points above are the exact reasons why you could accept this and the following changes to DAGScheduler. We should *not* be afraid of making changes to the core. We've got no evidence the points above hold, and that's what worries me the most. Thanks for your help anyway! It was worth sending the PR to learn from you!
[GitHub] spark pull request #16250: [CORE][MINOR] Stylistic changes in DAGScheduler (...
Github user jaceklaskowski closed the pull request at: https://github.com/apache/spark/pull/16250
[GitHub] spark issue #16250: [CORE][MINOR] Stylistic changes in DAGScheduler (to ease...
Github user jaceklaskowski commented on the issue: https://github.com/apache/spark/pull/16250 Thanks @srowen for the review! I do understand your point and remember you and @rxin have always been telling me that I should not touch code unless there's a need for a change. But the more time I spend with the code, the more I think there'd be more contributions if the code were even slightly more readable. I'm still having trouble getting the gist of it (but am closer). Spark in general is very tricky to get the hang of, and when the code is unnecessarily complex (like doing traversal, filtering and branching in a convoluted way), the more functional bits could certainly help the code, me and the community. That's my hope. That's also why I'm sending very small changes, to get myself going with more ease with the code (and get more comfortable with what's acceptable). All in all, I don't think I'm ready for heavier contributions yet, but I do think I'm closer and will soon be. Your help and patience have helped a lot. Thanks (and don't worry if you have to reject my changes).
[GitHub] spark pull request #16250: [CORE][MINOR] Stylistic changes in DAGScheduler (...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/16250#discussion_r91860470 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -333,16 +333,16 @@ class DAGScheduler( // (so we don't unnecessarily re-compute that data). val serLocs = mapOutputTracker.getSerializedMapOutputStatuses(shuffleDep.shuffleId) val locs = MapOutputTracker.deserializeMapStatuses(serLocs) - (0 until locs.length).foreach { i => -if (locs(i) ne null) { - // locs(i) will be null if missing - stage.addOutputLoc(i, locs(i)) + locs.zipWithIndex +.filter { case (status, _) => status != null } // null if missing +.foreach { case (ms, idx) => + stage.addOutputLoc(idx, ms) } - } } else { // Kind of ugly: need to register RDDs with the cache and map output tracker here // since we can't do it in the RDD constructor because # of partitions is unknown - logInfo("Registering RDD " + rdd.id + " (" + rdd.getCreationSite + ")") + logInfo(s"Registering RDD ${rdd.id} (partitions: ${rdd.partitions.length}, " + --- End diff -- Yes, but it was because I think it's important at this "stage". Do you think I should remove the partition-related addition?
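For readers comparing the two styles in this diff, here is a minimal, self-contained sketch of the same refactoring. The `locs` array and the result maps are illustrative stand-ins; the real code calls `stage.addOutputLoc`, which is replaced here by a map update:

```scala
// Illustrative stand-in for the DAGScheduler snippet: collect non-null
// entries of an array together with their indices.
val locs: Array[String] = Array("loc0", null, "loc2") // null means "missing"

// Original style: explicit index loop that skips nulls.
val viaLoop = scala.collection.mutable.Map.empty[Int, String]
(0 until locs.length).foreach { i =>
  if (locs(i) ne null) viaLoop(i) = locs(i)
}

// Refactored style from the diff: zipWithIndex + filter + foreach.
val viaZip = scala.collection.mutable.Map.empty[Int, String]
locs.zipWithIndex
  .filter { case (status, _) => status != null }
  .foreach { case (ms, idx) => viaZip(idx) = ms }

// Both produce the same index -> value mapping.
assert(viaLoop == viaZip)
```

Note that `zipWithIndex` and `filter` each allocate an intermediate collection, which is the performance concern reviewers raised elsewhere in this thread; the index loop avoids those allocations.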
[GitHub] spark pull request #16250: [CORE][MINOR] Stylistic changes in DAGScheduler (...
GitHub user jaceklaskowski opened a pull request: https://github.com/apache/spark/pull/16250 [CORE][MINOR] Stylistic changes in DAGScheduler (to ease comprehensio… ## What changes were proposed in this pull request? Stylistic changes in `DAGScheduler` to ease comprehension thereof. ## How was this patch tested? Local build. Awaiting Jenkins to run the entire test suite. …n thereof) You can merge this pull request into a Git repository by running: $ git pull https://github.com/jaceklaskowski/spark dagscheduler-minors Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16250.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16250 commit bc39671cf6868f033b69179cbdc6e6c373904149 Author: Jacek Laskowski <ja...@japila.pl> Date: 2016-12-11T15:00:52Z [CORE][MINOR] Stylistic changes in DAGScheduler (to ease comprehension thereof)
[GitHub] spark issue #16145: [MINOR][CORE][SQL] Remove explicit RDD and Partition ove...
Github user jaceklaskowski commented on the issue: https://github.com/apache/spark/pull/16145 @rxin @srowen Please re-review and act accordingly. Thanks a lot for your help so far!
[GitHub] spark pull request #16145: [MINOR][CORE][SQL] Remove explicit RDD and Partit...
GitHub user jaceklaskowski opened a pull request: https://github.com/apache/spark/pull/16145 [MINOR][CORE][SQL] Remove explicit RDD and Partition overrides ## What changes were proposed in this pull request? I **believe** that I _only_ removed duplicated code (that adds nothing but noise). I'm gonna remove the comment after Jenkins has built the changes with no issues and Spark devs have agreed to include the changes. Remove explicit `RDD` and `Partition` overrides (which turn out to be code duplication) ## How was this patch tested? Local build. Awaiting Jenkins. …cation) You can merge this pull request into a Git repository by running: $ git pull https://github.com/jaceklaskowski/spark rdd-overrides-removed Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16145.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16145 commit a33b6ecc20f5255a199ec0ab27ebd55868fcde96 Author: Jacek Laskowski <ja...@japila.pl> Date: 2016-12-05T09:58:07Z [MINOR][CORE][SQL] Remove explicit RDD and Partition overrides (duplication)
[GitHub] spark pull request #16144: [MINOR][CORE][SQL][DOCS] Typo fixes
GitHub user jaceklaskowski opened a pull request: https://github.com/apache/spark/pull/16144 [MINOR][CORE][SQL][DOCS] Typo fixes ## What changes were proposed in this pull request? Typo fixes ## How was this patch tested? Local build. Awaiting the official build. You can merge this pull request into a Git repository by running: $ git pull https://github.com/jaceklaskowski/spark typo-fixes Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16144.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16144 commit 49418e425ce6338d23e379b879ce4daf991b38dd Author: Jacek Laskowski <ja...@japila.pl> Date: 2016-12-05T09:46:20Z [MINOR][CORE][SQL][DOCS] Typo fixes
[GitHub] spark issue #15758: [MINOR] Remove calculation of bucket spec when not expec...
Github user jaceklaskowski commented on the issue: https://github.com/apache/spark/pull/15758 Thanks @rxin for looking into it. While reviewing that code I noticed the call and thought I'd push it for review here since...`getBucketSpec` is superfluous given `assertNotBucketed("save")` at the very beginning. Correct me if I'm wrong (that will further help me understand the code). I don't want to push the change for this one single line and wouldn't mind if you could merge it with some other more important change(s). Do whatever is best for the code. Thanks!
[GitHub] spark pull request #15758: [MINOR] Remove calculation of bucket spec when no...
GitHub user jaceklaskowski opened a pull request: https://github.com/apache/spark/pull/15758 [MINOR] Remove calculation of bucket spec when not expected ## What changes were proposed in this pull request? Remove calculation of bucket spec when asserted not to have one earlier. ## How was this patch tested? Local build. You can merge this pull request into a Git repository by running: $ git pull https://github.com/jaceklaskowski/spark remove-getBucketSpec Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/15758.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #15758 commit 2162bfbf28a844273fa29de95580eec6b709be8b Author: Jacek Laskowski <ja...@japila.pl> Date: 2016-11-03T20:21:30Z [MINOR] Remove calculation of bucket spec when not expected
[GitHub] spark pull request #15688: [SPARK-18173][SQL] data source tables should supp...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/15688#discussion_r85665836 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala --- @@ -1636,15 +1635,23 @@ class DDLSuite extends QueryTest with SharedSQLContext with BeforeAndAfterEach { } withTable("rectangles", "rectangles2") { --- End diff -- Mind changing `rectangles2` to `rectPartitioned` or similar? I think the test would be easier to read (later).
[GitHub] spark issue #15603: [WEBUI][MINOR] Remove unused appUIAddress in SparkUI + c...
Github user jaceklaskowski commented on the issue: https://github.com/apache/spark/pull/15603 Anything else you'd change/remove/add, Mr @srowen?
[GitHub] spark issue #15603: [WEBUI][MINOR] Remove unused appUIAddress in SparkUI + c...
Github user jaceklaskowski commented on the issue: https://github.com/apache/spark/pull/15603 The test failure doesn't seem to be due to my change.
[GitHub] spark pull request #15603: [WEBUI][MINOR] Return types in methods + cleanup
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/15603#discussion_r85638393 --- Diff: core/src/test/scala/org/apache/spark/ui/UISuite.scala --- @@ -179,16 +179,15 @@ class UISuite extends SparkFunSuite { test("verify appUIAddress contains the scheme") { --- End diff -- What do you think about renaming `spark.driver.appUIAddress` to `spark.driver.webUrl`? Not necessarily important, but merely for the sake of consistency...
[GitHub] spark pull request #15603: [WEBUI][MINOR] Return types in methods + cleanup
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/15603#discussion_r85638347 --- Diff: core/src/test/scala/org/apache/spark/ui/UISuite.scala --- @@ -179,16 +179,15 @@ class UISuite extends SparkFunSuite { test("verify appUIAddress contains the scheme") { --- End diff -- Thanks Sean! How do you manage to find so much time to help people like me?!?! You're certainly not a robot -- I do know it since I met you a few times and can confirm ;-)
[GitHub] spark pull request #15603: [WEBUI][MINOR] Return types in methods + cleanup
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/15603#discussion_r85516045 --- Diff: core/src/main/scala/org/apache/spark/ui/SparkUI.scala --- @@ -135,7 +133,7 @@ private[spark] class SparkUI private ( private[spark] abstract class SparkUITab(parent: SparkUI, prefix: String) extends WebUITab(parent, prefix) { - def appName: String = parent.getAppName + def appName: String = parent.appName --- End diff -- I might have missed why we could remove it. Mind refreshing my memory, @srowen? (Sorry, getting too old :))
[GitHub] spark pull request #15603: [WEBUI][MINOR] Return types in methods + cleanup
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/15603#discussion_r84653954 --- Diff: core/src/main/scala/org/apache/spark/ui/SparkUI.scala --- @@ -135,7 +133,7 @@ private[spark] class SparkUI private ( private[spark] abstract class SparkUITab(parent: SparkUI, prefix: String) extends WebUITab(parent, prefix) { - def appName: String = parent.getAppName + def appName: String = parent.appName --- End diff -- Yes! That's how I imagined the conversation would go. Thanks Sean!
[GitHub] spark pull request #15603: [WEBUI][MINOR] Return types in methods + cleanup
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/15603#discussion_r84653856 --- Diff: core/src/main/scala/org/apache/spark/ui/SparkUI.scala --- @@ -93,15 +91,15 @@ private[spark] class SparkUI private ( /** Stop the server behind this web interface. Only valid after bind(). */ override def stop() { super.stop() -logInfo("Stopped Spark web UI at %s".format(appUIAddress)) +logInfo(s"Stopped Spark web UI at $appUIAddress") } /** * Return the application UI host:port. This does not include the scheme (http://). */ private[spark] def appUIHostPort = publicHostName + ":" + boundPort - private[spark] def appUIAddress = s"http://$appUIHostPort" + private[spark] def appUIAddress = webUrl --- End diff -- Thanks Sean for confirming my thoughts! That's exactly the feedback I expected to get from you (or other Spark committers).
[GitHub] spark issue #15603: [WEBUI][MINOR] Return types in methods + cleanup
Github user jaceklaskowski commented on the issue: https://github.com/apache/spark/pull/15603 The build has just proved that my thinking was correct. I think I'll propose a few other (more aggressive) changes to clean the code up a little bit more. I'd appreciate any comments (or acceptance).
[GitHub] spark pull request #15603: [WEBUI][MINOR] Return types in methods + cleanup
GitHub user jaceklaskowski opened a pull request: https://github.com/apache/spark/pull/15603 [WEBUI][MINOR] Return types in methods + cleanup ## What changes were proposed in this pull request? The main purpose of the change is to discuss the purpose of the `SparkUITab` class and the `appUIAddress` attribute, since they seem to introduce nothing new. Once we agree they're not needed, I'm going to remove the comment. Code cleanup ## How was this patch tested? Local build You can merge this pull request into a Git repository by running: $ git pull https://github.com/jaceklaskowski/spark sparkui-fixes Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/15603.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #15603 commit 6fcd247f546c2cfe15f6909bf8a9d75dc822ef15 Author: Jacek Laskowski <ja...@japila.pl> Date: 2016-10-23T19:45:21Z [WEBUI][MINOR] Return types in methods + cleanup
[GitHub] spark issue #15353: [SPARK-17724][WebUI][Streaming] Unevaluated new lines in...
Github user jaceklaskowski commented on the issue: https://github.com/apache/spark/pull/15353 LGTM
[GitHub] spark pull request #15257: [SPARK-17683][SQL] Support ArrayType in Literal.a...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/15257#discussion_r80858277 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala --- @@ -52,13 +53,51 @@ object Literal { case t: Timestamp => Literal(DateTimeUtils.fromJavaTimestamp(t), TimestampType) case d: Date => Literal(DateTimeUtils.fromJavaDate(d), DateType) case a: Array[Byte] => Literal(a, BinaryType) +case a: Array[_] => + val elementType = componentTypeToDataType(a.getClass.getComponentType()) + val dataType = ArrayType(elementType) + val convert = CatalystTypeConverters.createToCatalystConverter(dataType) + Literal(convert(a), dataType) case i: CalendarInterval => Literal(i, CalendarIntervalType) case null => Literal(null, NullType) case v: Literal => v case _ => throw new RuntimeException("Unsupported literal type " + v.getClass + " " + v) } + private def componentTypeToDataType(clz: Class[_]): DataType = clz match { --- End diff -- I can't find either, but thought I'd ask.
[GitHub] spark issue #15268: [SPARK-17598] [SQL] [Web UI] User-friendly name for Spar...
Github user jaceklaskowski commented on the issue: https://github.com/apache/spark/pull/15268 LGTM
[GitHub] spark pull request #15257: [SPARK-17683][SQL] Support ArrayType in Literal.a...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/15257#discussion_r80663732 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala --- @@ -52,13 +53,51 @@ object Literal { case t: Timestamp => Literal(DateTimeUtils.fromJavaTimestamp(t), TimestampType) case d: Date => Literal(DateTimeUtils.fromJavaDate(d), DateType) case a: Array[Byte] => Literal(a, BinaryType) +case a: Array[_] => + val elementType = componentTypeToDataType(a.getClass.getComponentType()) + val dataType = ArrayType(elementType) + val convert = CatalystTypeConverters.createToCatalystConverter(dataType) + Literal(convert(a), dataType) case i: CalendarInterval => Literal(i, CalendarIntervalType) case null => Literal(null, NullType) case v: Literal => v case _ => throw new RuntimeException("Unsupported literal type " + v.getClass + " " + v) } + private def componentTypeToDataType(clz: Class[_]): DataType = clz match { --- End diff -- It looks so similar to the other cases where Spark has to map from Scala types to Spark SQL's, like https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala#L65. When comparing them I noticed that `CalendarInterval` is not included in your list. Why?
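A minimal, self-contained sketch of the kind of component-class-to-type mapping discussed in this diff. The function name and the string type tags here are hypothetical simplifications; Spark's actual `componentTypeToDataType` returns `DataType` instances:

```scala
// Hypothetical, simplified analogue of componentTypeToDataType: map the
// runtime component Class of an array to a type-name tag.
def componentTypeName(clz: Class[_]): String = clz match {
  case c if c == classOf[Int]    => "IntegerType" // primitive int[]
  case c if c == classOf[Long]   => "LongType"    // primitive long[]
  case c if c == classOf[String] => "StringType"
  case other => throw new RuntimeException("Unsupported component type " + other)
}

// getComponentType exposes the element class of an array at runtime,
// which is what the Array[_] case in Literal.apply dispatches on.
val intTag = componentTypeName(Array(1, 2, 3).getClass.getComponentType)
val strTag = componentTypeName(Array("a", "b").getClass.getComponentType)
```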
[GitHub] spark issue #14808: [SPARK-17156][ML][EXAMPLE] Add multiclass logistic regre...
Github user jaceklaskowski commented on the issue: https://github.com/apache/spark/pull/14808 @sethah I seem to have missed the other comment with the notes on what has to be done to make this PR an example for the change. Sorry. Since I'm very new to this and the only way to learn it better is to help with the PR here or the one that's coming, I'd like to be engaged one way or the other. I'm going to close this PR (to keep it consistent JIRA-wise) and open another with the changes you've mentioned, so the example is aligned with the changes and the requirements. Thanks @sethah for your help! See you in the other PR...
[GitHub] spark pull request #14808: [SPARK-17156][ML][EXAMPLE] Add multiclass logisti...
Github user jaceklaskowski closed the pull request at: https://github.com/apache/spark/pull/14808
[GitHub] spark issue #14808: [SPARK-17156][ML][EXAMPLE] Add multiclass logistic regre...
Github user jaceklaskowski commented on the issue: https://github.com/apache/spark/pull/14808 @sethah Anything to improve the example. I'm open to suggestions and happy to improve the example (to learn that part better). Thanks!
[GitHub] spark issue #14808: [SPARK-17156][ML][EXAMPLE] Add multiclass logistic regre...
Github user jaceklaskowski commented on the issue: https://github.com/apache/spark/pull/14808 Thanks @MLnick. What would you suggest with the PR then (after the comments from @sethah)? Please guide.
[GitHub] spark issue #14958: [SPARK-17378] [BUILD] Upgrade snappy-java to 1.1.2.6
Github user jaceklaskowski commented on the issue: https://github.com/apache/spark/pull/14958 Fair enough. It was just an idea that would make the upgrade even more clearly needed. The issues that snappy has fixed could manifest in various ways in Spark, and verifying that could uncover others. Yes, again, it could be a project of its own. Sorry for the noise.
[GitHub] spark issue #14958: [SPARK-17378] [BUILD] Upgrade snappy-java to 1.1.2.6
Github user jaceklaskowski commented on the issue: https://github.com/apache/spark/pull/14958 Would be nice to have a test in Spark to see if the changes to snappy made any difference to Spark (not that I'm against the upgrade -- just as a safety measure to confirm it did improve things, if possible).
[GitHub] spark issue #14891: [SQL][DOC][MINOR] Add (Scala-specific) and (Java-specifi...
Github user jaceklaskowski commented on the issue: https://github.com/apache/spark/pull/14891 @srowen, anyone you'd recommend to accept the PR (after you accepted)?
[GitHub] spark issue #14808: [SPARK-17156][ML][EXAMPLE] Add multiclass logistic regre...
Github user jaceklaskowski commented on the issue: https://github.com/apache/spark/pull/14808 Any news on that @sethah? I've seen some discussions about the changes to unify the interfaces, but am wondering how close the other PRs are so I could help with that one myself.
[GitHub] spark issue #14891: [SQL][DOC][MINOR] Add (Scala-specific) and (Java-specifi...
Github user jaceklaskowski commented on the issue: https://github.com/apache/spark/pull/14891 Frankly, I'd not have bothered with the changes if I had not seen them elsewhere. See http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.Dataset and look for `foreach` and `foreachPartition` for (Java-specific), while http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.Encoders$ has `javaSerialization` / `kryo` for (Scala-specific). I have seen examples with both in one source, like http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.RelationalGroupedDataset. That was the main reason; the other was to understand why there are two different variants of the same method (and it eventually clicked, but only after I'd seen the other examples).
[GitHub] spark pull request #14891: [SQL][DOC][MINOR] Add (Scala-specific) and (Java-...
GitHub user jaceklaskowski opened a pull request: https://github.com/apache/spark/pull/14891 [SQL][DOC][MINOR] Add (Scala-specific) and (Java-specific) ## What changes were proposed in this pull request? Adds (Scala-specific) and (Java-specific) to Scaladoc. ## How was this patch tested? local build You can merge this pull request into a Git repository by running: $ git pull https://github.com/jaceklaskowski/spark scala-specifics Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14891.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14891 commit a927c83888e93742ab41db435df8db8f0065037e Author: Jacek Laskowski <ja...@japila.pl> Date: 2016-08-31T07:26:15Z [SQL][DOC][MINOR] Add (Scala-specific) and (Java-specific)
[GitHub] spark issue #14808: [SPARK-17156][ML][EXAMPLE] Add multiclass logistic regre...
Github user jaceklaskowski commented on the issue: https://github.com/apache/spark/pull/14808 I don't mind adapting the example to the upcoming changes and I can work on it. But SPARK-17163 has no Fix Version/s field assigned, so I'm reading it that it's not clear whether the change gets into the next release or not. Given 2.0.1 might be out in a couple of weeks, I believe accepting the example now and changing it later (once SPARK-17163 gets merged) is a viable strategy. What do you think? /cc @srowen @wangmiao1981
[GitHub] spark pull request #14808: [SPARK-17156][ML][EXAMPLE] Add multiclass logisti...
GitHub user jaceklaskowski opened a pull request: https://github.com/apache/spark/pull/14808 [SPARK-17156][ML][EXAMPLE] Add multiclass logistic regression Scala Example ## What changes were proposed in this pull request? New example for `MultinomialLogisticRegression` ML algorithm. ## How was this patch tested? local build + run-example You can merge this pull request into a Git repository by running: $ git pull https://github.com/jaceklaskowski/spark SPARK-17156-multiclass-logreg-example Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14808.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14808 commit ba5a4e2cc14e253ea3465d887c3cf5a2a9d82a80 Author: Jacek Laskowski <ja...@japila.pl> Date: 2016-08-25T14:47:18Z [SPARK-17156][ML][EXAMPLE] Add multiclass logistic regression Scala Example
[GitHub] spark pull request #14771: [SPARK-17199] Use CatalystConf.resolver for case-...
GitHub user jaceklaskowski opened a pull request: https://github.com/apache/spark/pull/14771 [SPARK-17199] Use CatalystConf.resolver for case-sensitivity comparison ## What changes were proposed in this pull request? Use `CatalystConf.resolver` consistently for case-sensitivity comparison (removed dups). ## How was this patch tested? Local build. Waiting for Jenkins to ensure clean build and test. You can merge this pull request into a Git repository by running: $ git pull https://github.com/jaceklaskowski/spark 17199-catalystconf-resolver Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14771.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14771 commit 7d86c578848cc4597924a14956ef5bae0b81fef2 Author: Jacek Laskowski <ja...@japila.pl> Date: 2016-08-23T08:15:10Z [SPARK-17199] Use CatalystConf.resolver for case-sensitivity comparison
[GitHub] spark issue #14722: [SPARK-13286] [SQL] add the next expression of SQLExcept...
Github user jaceklaskowski commented on the issue: https://github.com/apache/spark/pull/14722 LGTM
[GitHub] spark pull request #14720: SPARK-12868: Allow Add jar to add jars from hdfs/...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/14720#discussion_r75601795 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveQuerySuite.scala ---

```scala
@@ -865,6 +865,16 @@ class HiveQuerySuite extends HiveComparisonTest with BeforeAndAfter {
     sql("DROP TABLE alter1")
   }

+  test("SPARK-12868 ADD JAR FROM HDFS") {
+    val testJar = "hdfs://nn:8020/foo.jar"
+    // This should fail with unknown host, as it's just testing the URL parsing;
+    // before SPARK-12868 it was failing with a malformed URI
+    val e = intercept[RuntimeException] {
+      sql(s"ADD JAR $testJar")
+    }
+    assert(e.getMessage.contains("java.net.UnknownHostException: nn1"))
```

--- End diff -- Are you sure it's `nn1` (with a `1`)? The URL above uses host `nn`.
[GitHub] spark pull request #14720: SPARK-12868: Allow Add jar to add jars from hdfs/...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/14720#discussion_r75601779 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala ---

```scala
@@ -87,6 +88,9 @@ private[hive] class HiveClientImpl(
   // Circular buffer to hold what hive prints to STDOUT and ERR. Only printed when failures occur.
   private val outputBuffer = new CircularBuffer()

+  // An object lock to ensure URL factory is registered exactly once.
+  object URLFactoryRegistrationLock{}
```

--- End diff -- Why do you need the `{}`? An `object` declaration with an empty body can omit the braces.
[GitHub] spark pull request #14637: [WIP] [SPARK-16967] move mesos to module
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/14637#discussion_r75574292 --- Diff: project/SparkBuild.scala ---

```scala
@@ -56,9 +56,9 @@ object BuildCommons {
     "tags", "sketch"
   ).map(ProjectRef(buildLocation, _)) ++ sqlProjects ++ streamingProjects

-  val optionallyEnabledProjects@Seq(yarn, java8Tests, sparkGangliaLgpl,
+  val optionallyEnabledProjects@Seq(mesos, yarn, java8Tests, sparkGangliaLgpl,
```

--- End diff -- This is Scala's pattern matching on assignment: the line creates a single `optionallyEnabledProjects` value for the entire `Seq` of projects, with the individual projects bound to their own values (`mesos`, etc.).
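The pattern-matching-on-assignment form described above can be sketched in isolation; the project names here are placeholders, not SparkBuild's actual list.

```scala
// A single assignment binds the whole Seq to one name (via `@`) and each
// element to its own name -- the same shape SparkBuild.scala uses for
// optionallyEnabledProjects.
object PatternAssignment {
  val optionallyEnabled @ Seq(mesos, yarn, ganglia) =
    Seq("mesos", "yarn", "ganglia-lgpl")

  def main(args: Array[String]): Unit = {
    println(optionallyEnabled.size) // 3
    println(mesos)                  // mesos
  }
}
```

The compiler may warn that the match is not exhaustive; a `Seq` of a different length would throw a `MatchError` when the object is initialized, which is acceptable in a build definition where the list is fixed.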
[GitHub] spark issue #14680: [SPARK-17101][SQL] Provide consistent format identifiers...
Github user jaceklaskowski commented on the issue: https://github.com/apache/spark/pull/14680 Thanks @HyukjinKwon You're helping me a lot! I'll work on the unit test.
[GitHub] spark issue #14680: [SPARK-17101][SQL] Provide consistent format identifiers...
Github user jaceklaskowski commented on the issue: https://github.com/apache/spark/pull/14680 How about now @HyukjinKwon ? The more I look at it, the more I think it should be calculated automatically out of the class name when the constructor is called. It's of little to no value to a FileFormat developer.
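One way the "calculated automatically out of the class name" idea could look -- a hedged sketch, not Spark's actual implementation; `FormatName`, the sample class, and the suffix convention are assumptions for illustration:

```scala
// Derive a short format identifier from the class name by stripping a
// conventional suffix and lowercasing. This relies on getSimpleName,
// which behaves differently for anonymous or nested classes.
class TextFileFormat

object FormatName {
  def shortName(clz: Class[_]): String =
    clz.getSimpleName.stripSuffix("FileFormat").toLowerCase

  def main(args: Array[String]): Unit =
    println(shortName(classOf[TextFileFormat])) // text
}
```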
[GitHub] spark issue #14680: [SPARK-17101][SQL] Provide consistent format identifiers...
Github user jaceklaskowski commented on the issue: https://github.com/apache/spark/pull/14680 @rxin @HyukjinKwon Mind reviewing it again and letting me know what you think? I know it's minor but would greatly appreciate having it merged at your earliest convenience. Thanks.
[GitHub] spark pull request #14680: [SPARK-17101][SQL] Provide format identifier for ...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/14680#discussion_r75182485 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/text/TextFileFormat.scala ---

```scala
@@ -40,6 +40,8 @@ class TextFileFormat extends TextBasedFileFormat with DataSourceRegister {

   override def shortName(): String = "text"

+  override def toString: String = shortName.toUpperCase
```

--- End diff -- Let me propose a change for Parquet. Thanks for spotting it and for your review!
[GitHub] spark pull request #14681: [SPARK-17038][Streaming] fix metrics retrieval so...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/14681#discussion_r75074428 --- Diff: streaming/src/test/scala/org/apache/spark/streaming/ui/StreamingJobProgressListenerSuite.scala ---

```scala
@@ -68,6 +68,7 @@ class StreamingJobProgressListenerSuite extends TestSuiteBase with Matchers {
     listener.waitingBatches should be (List(BatchUIData(batchInfoSubmitted)))
     listener.runningBatches should be (Nil)
     listener.retainedCompletedBatches should be (Nil)
+    listener.lastReceivedBatch should be (Some(BatchUIData(batchInfoSubmitted)))
```

--- End diff -- What do you think about `should contain (BatchUIData(batchInfoSubmitted))` instead? See http://www.scalatest.org/user_guide/using_matchers#workingWithContainers.
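The reason `should contain` works on an `Option` is that ScalaTest treats it as a container holding at most one element. A minimal sketch of the same distinction with plain assertions (no ScalaTest dependency; the batch value is a placeholder string, not a real `BatchUIData`):

```scala
object OptionAsContainer {
  def main(args: Array[String]): Unit = {
    val lastReceivedBatch: Option[String] = Some("batchInfoSubmitted")

    // Equality check against the wrapped value -- the style in the diff:
    assert(lastReceivedBatch == Some("batchInfoSubmitted"))

    // Container-style check -- what `should contain (...)` expresses:
    assert(lastReceivedBatch.contains("batchInfoSubmitted"))
  }
}
```

The container style reads as "the option contains this element" and gives matcher-style failure messages in ScalaTest, rather than a raw equality mismatch.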
[GitHub] spark pull request #14680: [SPARK-17101][SQL] Provide format identifier for ...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/14680#discussion_r75073488 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/text/TextFileFormat.scala ---

```scala
@@ -40,6 +40,8 @@ class TextFileFormat extends TextBasedFileFormat with DataSourceRegister {

   override def shortName(): String = "text"

+  override def toString: String = shortName.toUpperCase
```

--- End diff -- I did see it, but I'd do the opposite and rather fix JSON and CSV than repeat what's in `shortName`. I even thought about defining the `toString` in a supertype, but could find nothing that would be acceptable.
[GitHub] spark pull request #14680: [SPARK-17101][SQL] Provide format identifier for ...
GitHub user jaceklaskowski opened a pull request: https://github.com/apache/spark/pull/14680 [SPARK-17101][SQL] Provide format identifier for TextFileFormat

## What changes were proposed in this pull request?

Define the format identifier that is used in the Optimized Logical Plan in `explain` for the text file format (following the CSV and JSON formats).

```
scala> spark.read.text("people.csv").cache.explain(extended = true)
== Parsed Logical Plan ==
Relation[value#0] text

== Analyzed Logical Plan ==
value: string
Relation[value#0] text

== Optimized Logical Plan ==
InMemoryRelation [value#0], true, 1, StorageLevel(disk, memory, deserialized, 1 replicas)
+- *FileScan text [value#0] Batched: false, Format: TEXT, InputPaths: file:/Users/jacek/dev/oss/spark/people.csv, PartitionFilters: [], PushedFilters: [], ReadSchema: struct

== Physical Plan ==
InMemoryTableScan [value#0]
+- InMemoryRelation [value#0], true, 1, StorageLevel(disk, memory, deserialized, 1 replicas)
   +- *FileScan text [value#0] Batched: false, Format: TEXT, InputPaths: file:/Users/jacek/dev/oss/spark/people.csv, PartitionFilters: [], PushedFilters: [], ReadSchema: struct
```

## How was this patch tested?

Local build.

You can merge this pull request into a Git repository by running: $ git pull https://github.com/jaceklaskowski/spark SPARK-17101 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14680.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14680 commit 133e5deff497ce1be92497344bb7e0d4e7d57c21 Author: Jacek Laskowski <ja...@japila.pl> Date: 2016-08-17T06:43:34Z [SPARK-17101][SQL] Provide format identifier for TextFileFormat
[GitHub] spark issue #14557: [SPARK-16709][CORE] Kill the running task if stage faile...
Github user jaceklaskowski commented on the issue: https://github.com/apache/spark/pull/14557 LGTM
[GitHub] spark pull request #14637: [WIP] [SPARK-16967] move mesos to module
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/14637#discussion_r74706568 --- Diff: mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterManager.scala ---

```scala
@@ -0,0 +1,61 @@
+/* ... Apache License header ... */
+
+package org.apache.spark.scheduler.cluster
+
+import org.apache.spark.{SparkContext, SparkException}
+import org.apache.spark.scheduler.{ExternalClusterManager, SchedulerBackend, TaskScheduler, TaskSchedulerImpl}
+import org.apache.spark.scheduler.cluster.mesos.{MesosCoarseGrainedSchedulerBackend, MesosFineGrainedSchedulerBackend}
+
+/**
+ * Cluster Manager for creation of Yarn scheduler and backend
+ */
+private[spark] class MesosClusterManager extends ExternalClusterManager {
+  private val MESOS_REGEX = """mesos://(.*)""".r
+
+  override def canCreate(masterURL: String): Boolean = {
+    masterURL.startsWith("mesos")
+  }
+
+  override def createTaskScheduler(sc: SparkContext, masterURL: String): TaskScheduler = {
+    new TaskSchedulerImpl(sc)
+  }
+
+  override def createSchedulerBackend(sc: SparkContext,
+                                      masterURL: String,
+                                      scheduler: TaskScheduler): SchedulerBackend = {
+    val mesosUrl = MESOS_REGEX.findFirstMatchIn(masterURL).get.group(1)
+    val coarse = sc.conf.getBoolean("spark.mesos.coarse", defaultValue = true)
```

--- End diff -- Oh, I thought Spark supported coarse-grained scheduling only (even for Mesos, which had a fine-grained mode too). Will this stay, or is it a temporary thing?
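The `MESOS_REGEX.findFirstMatchIn(masterURL).get.group(1)` line in the diff above can be sketched standalone. Returning an `Option` via `map` instead of calling `.get` is an assumption made here for safety, so a non-mesos URL yields `None` rather than an exception:

```scala
// Extract the part after "mesos://" from a master URL with a regex
// that has a single capture group.
object MasterUrlSketch {
  private val MESOS_REGEX = """mesos://(.*)""".r

  def mesosUrl(masterURL: String): Option[String] =
    MESOS_REGEX.findFirstMatchIn(masterURL).map(_.group(1))

  def main(args: Array[String]): Unit = {
    println(mesosUrl("mesos://zk://host:2181/mesos")) // Some(zk://host:2181/mesos)
    println(mesosUrl("yarn"))                         // None
  }
}
```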
[GitHub] spark pull request #14637: [WIP] [SPARK-16967] move mesos to module
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/14637#discussion_r74706552 --- Diff: mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterManager.scala ---

```scala
+  override def createSchedulerBackend(sc: SparkContext,
+                                      masterURL: String,
+                                      scheduler: TaskScheduler): SchedulerBackend = {
```

--- End diff -- I think the convention is to indent continuation parameters by 4 spaces only, not to align them with the first parameter.
[GitHub] spark pull request #14637: [WIP] [SPARK-16967] move mesos to module
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/14637#discussion_r74706517 --- Diff: mesos/pom.xml ---
[GitHub] spark pull request #14637: [WIP] [SPARK-16967] move mesos to module
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/14637#discussion_r74706522 --- Diff: mesos/pom.xml --- @@ -0,0 +1,167 @@
+<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
+  <modelVersion>4.0.0</modelVersion>
+  <parent>
+    <groupId>org.apache.spark</groupId>
+    <artifactId>spark-parent_2.11</artifactId>
+    <version>2.1.0-SNAPSHOT</version>
+    <relativePath>../pom.xml</relativePath>
+  </parent>
+
+  <artifactId>spark-mesos_2.11</artifactId>
+  <packaging>jar</packaging>
+  <name>Spark Project Mesos</name>
+  <properties>
+    <sbt.project.name>mesos</sbt.project.name>
+    <mesos.version>1.0.0</mesos.version>
+    <mesos.classifier>shaded-protobuf</mesos.classifier>
+  </properties>
+
+  <dependencies>
+    <dependency>
+      <groupId>org.apache.spark</groupId>
+      <artifactId>spark-core_${scala.binary.version}</artifactId>
+      <version>${project.version}</version>
+    </dependency>
+    <dependency>
+      <groupId>org.apache.spark</groupId>
+      <artifactId>spark-core_${scala.binary.version}</artifactId>
+      <version>${project.version}</version>
+      <type>test-jar</type>
+      <scope>test</scope>
+    </dependency>
+    <dependency>
+      <groupId>org.apache.mesos</groupId>
+      <artifactId>mesos</artifactId>
+      <version>${mesos.version}</version>
+      <classifier>${mesos.classifier}</classifier>
+      <exclusions>
+        <exclusion>
+          <groupId>com.google.protobuf</groupId>
+          <artifactId>protobuf-java</artifactId>
+        </exclusion>
+      </exclusions>
+    </dependency>
+    <dependency>
+      <groupId>org.mockito</groupId>
+      <artifactId>mockito-core</artifactId>
+      <scope>test</scope>
+    </dependency>
[GitHub] spark pull request #14637: [WIP] [SPARK-16967] move mesos to module
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/14637#discussion_r74706502 --- Diff: mesos/pom.xml --- @@ -0,0 +1,167 @@ +<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd"> --- End diff -- Isn't the line too long?
[GitHub] spark pull request #14637: [WIP] [SPARK-16967] move mesos to module
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/14637#discussion_r74706494 --- Diff: mesos/pom.xml --- @@ -0,0 +1,167 @@ + +
[GitHub] spark pull request #14576: [SPARK-16391][SQL] ReduceAggregator and partial a...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/14576#discussion_r74706380 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/expressions/ReduceAggregator.scala --- @@ -0,0 +1,79 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.expressions + +import org.apache.spark.annotation.Experimental +import org.apache.spark.sql.Encoder +import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder + +/** + * :: Experimental :: + * An aggregator that uses a single associative and commutative reduce function. This reduce + * function can be used to go through all input values and reduces them to a single value. + * If there is no input, a null value is returned. + * + * @since 2.1.0 + */ +@Experimental +abstract class ReduceAggregator[T] extends Aggregator[T, (Boolean, T), T] { + + // Question 1: Should func and encoder be parameters rather than abstract methods? + // rxin: abstract method has better java compatibility and forces naming the concrete impl, + // whereas parameter has better type inference (infer encoders via context bounds). + // Question 2: Should finish throw an exception or return null if there is no input? 
+ // rxin: null might be more "SQL" like, whereas exception is more Scala like. + + /** + * A associative and commutative reduce function. + * @since 2.1.0 + */ + def func(a: T, b: T): T + + /** + * Encoder for type T. + * @since 2.1.0 + */ + def encoder: ExpressionEncoder[T] + + override def zero: (Boolean, T) = (false, null.asInstanceOf[T]) + + override def bufferEncoder: Encoder[(Boolean, T)] = +ExpressionEncoder.tuple(ExpressionEncoder[Boolean](), encoder) --- End diff -- `Encoders.scalaBoolean`? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
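To make the buffer design in the quoted `ReduceAggregator` concrete: the `Boolean` half of the `(Boolean, T)` buffer tracks whether any input has been seen, so `zero` can safely carry a `null` placeholder. A plain-Scala sketch of those semantics (no Spark dependency; the names below are illustrative, not Spark's API):

```scala
// Sketch of the (Boolean, T) buffer semantics: the flag records whether
// any input was seen, so zero's null placeholder is never exposed.
object ReduceBufferDemo {
  def zero[T]: (Boolean, T) = (false, null.asInstanceOf[T])

  // Fold one input value into the buffer.
  def reduce[T](b: (Boolean, T), a: T, func: (T, T) => T): (Boolean, T) =
    if (b._1) (true, func(b._2, a)) else (true, a)

  // Merge two partial buffers (mirrors Aggregator.merge).
  def merge[T](b1: (Boolean, T), b2: (Boolean, T), func: (T, T) => T): (Boolean, T) =
    (b1._1, b2._1) match {
      case (true, true)  => (true, func(b1._2, b2._2))
      case (true, false) => b1
      case _             => b2
    }

  // Finish: this sketch throws when no input was seen ("Question 2"
  // in the diff leaves open whether to throw or return null).
  def finish[T](b: (Boolean, T)): T = {
    require(b._1, "requires at least one input row")
    b._2
  }
}
```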
[GitHub] spark issue #14636: [SPARK-17053][SQL] Support `hive.exec.drop.ignorenonexis...
Github user jaceklaskowski commented on the issue: https://github.com/apache/spark/pull/14636 The JIRA issue (https://issues.apache.org/jira/browse/SPARK-17053) is closed as Won't Fix. Should the PR be closed too?
[GitHub] spark issue #14561: [SPARK-16972][CORE] Move DriverEndpoint out of CoarseGra...
Github user jaceklaskowski commented on the issue: https://github.com/apache/spark/pull/14561 LGTM
[GitHub] spark issue #14561: [SPARK-16972][CORE] Move DriverEndpoint out of CoarseGra...
Github user jaceklaskowski commented on the issue: https://github.com/apache/spark/pull/14561 Finally someone has taken responsibility for clearing this important and critical path in Spark Core. Whenever I see the code, I feel what @lshmouse felt -- it has to be refactored to invite further contributions. Kudos @lshmouse!
[GitHub] spark pull request #14557: [SPARK-16709][CORE] Kill the running task if stag...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/14557#discussion_r74688576 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -798,6 +798,19 @@ private[spark] class TaskSetManager( } } maybeFinishTaskSet() + +// kill running task if stage failed +if(reason.isInstanceOf[FetchFailed]) { + killTasks(runningTasksSet, taskInfos) +} + } + + def killTasks(tasks: HashSet[Long], taskInfo: HashMap[Long, TaskInfo]): Boolean = { +tasks.foreach { task => + val executorId = taskInfo(task).executorId + sched.sc.schedulerBackend.killTask(task, executorId, true) --- End diff -- I think `true` would read better with the parameter name spelled out.
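The suggestion is about Scala named arguments: a bare `true` at a call site says nothing, while naming the parameter documents the intent. A sketch with an assumed signature (the real `killTask` lives on `SchedulerBackend`; the parameter name here is an assumption for illustration):

```scala
// Toy backend whose killTask takes a trailing Boolean, mirroring the
// shape of the call in the diff (names are illustrative).
class Backend {
  def killTask(taskId: Long, executorId: String, interruptThread: Boolean): String =
    s"kill $taskId on $executorId (interrupt=$interruptThread)"
}

val backend = new Backend
// Unclear at the call site: what does `true` mean?
backend.killTask(42L, "exec-1", true)
// Clear: the intent is spelled out.
val msg = backend.killTask(42L, "exec-1", interruptThread = true)
// msg == "kill 42 on exec-1 (interrupt=true)"
```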
[GitHub] spark pull request #14557: [SPARK-16709][CORE] Kill the running task if stag...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/14557#discussion_r74688557 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -798,6 +798,19 @@ private[spark] class TaskSetManager( } } maybeFinishTaskSet() + +// kill running task if stage failed +if(reason.isInstanceOf[FetchFailed]) { + killTasks(runningTasksSet, taskInfos) +} + } + + def killTasks(tasks: HashSet[Long], taskInfo: HashMap[Long, TaskInfo]): Boolean = { +tasks.foreach { task => + val executorId = taskInfo(task).executorId + sched.sc.schedulerBackend.killTask(task, executorId, true) +} +true --- End diff -- Why are you returning `true` if you don't use it?
[GitHub] spark pull request #14557: [SPARK-16709][CORE] Kill the running task if stag...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/14557#discussion_r74688550 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -798,6 +798,19 @@ private[spark] class TaskSetManager( } } maybeFinishTaskSet() + +// kill running task if stage failed +if(reason.isInstanceOf[FetchFailed]) { --- End diff -- Missing space between `if` and `(`.
[GitHub] spark pull request #14524: [SPARK-16832] [ML] [WIP] CrossValidator and Train...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/14524#discussion_r74688310 --- Diff: mllib/src/main/scala/org/apache/spark/ml/param/shared/SharedParamsCodeGen.scala --- @@ -67,7 +67,7 @@ private[shared] object SharedParamsCodeGen { isValid = "ParamValidators.inArray(Array(\"skip\", \"error\"))"), ParamDesc[Boolean]("standardization", "whether to standardize the training features" + " before fitting the model", Some("true")), - ParamDesc[Long]("seed", "random seed", Some("this.getClass.getName.hashCode.toLong")), + ParamDesc[Long]("seed", "random seed", Some("new java.util.Random().nextLong()")), --- End diff -- Ah, learnt this today. Thanks!
[GitHub] spark pull request #14524: [SPARK-16832] [ML] [WIP] CrossValidator and Train...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/14524#discussion_r74680486 --- Diff: mllib/src/main/scala/org/apache/spark/ml/param/shared/SharedParamsCodeGen.scala --- @@ -67,7 +67,7 @@ private[shared] object SharedParamsCodeGen { isValid = "ParamValidators.inArray(Array(\"skip\", \"error\"))"), ParamDesc[Boolean]("standardization", "whether to standardize the training features" + " before fitting the model", Some("true")), - ParamDesc[Long]("seed", "random seed", Some("this.getClass.getName.hashCode.toLong")), + ParamDesc[Long]("seed", "random seed", Some("new java.util.Random().nextLong()")), --- End diff -- Is this a string on purpose?
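Context for the answer above: `SharedParamsCodeGen` is a source-code generator, so the default is a string of Scala code that gets spliced verbatim into the generated file, not a value evaluated at generation time. A toy sketch of that splicing (the generator below is hypothetical, not Spark's actual template):

```scala
// Hypothetical generator: the default expression is code text, spliced
// into generated source; it is only evaluated when that source is
// later compiled and run.
def genSeedParam(defaultExpr: String): String =
  s"""final val seed: LongParam = new LongParam(this, "seed", "random seed")
     |setDefault(seed, $defaultExpr)""".stripMargin

val generated = genSeedParam("new java.util.Random().nextLong()")
// `generated` contains the expression itself, unevaluated.
println(generated)
```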
[GitHub] spark issue #14533: [SPARK-16606] [CORE] Misleading warning for SparkContext...
Github user jaceklaskowski commented on the issue: https://github.com/apache/spark/pull/14533 LGTM
[GitHub] spark pull request #14450: [SPARK-16847][SQL] Prevent to potentially read co...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/14450#discussion_r73511032 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/SpecificParquetRecordReaderBase.java --- @@ -204,7 +205,8 @@ protected void initialize(String path, List columns) throws IOException } } this.sparkSchema = new ParquetSchemaConverter(config).convert(requestedSchema); -this.reader = new ParquetFileReader(config, file, blocks, requestedSchema.getColumns()); +this.reader = new ParquetFileReader( +config, footer.getFileMetaData(), file, blocks, requestedSchema.getColumns()); for (BlockMetaData block : blocks) { --- End diff -- Same here. BTW, code duplication?
[GitHub] spark pull request #14450: [SPARK-16847][SQL] Prevent to potentially read co...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/14450#discussion_r73510962 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/SpecificParquetRecordReaderBase.java --- @@ -140,7 +140,8 @@ public void initialize(InputSplit inputSplit, TaskAttemptContext taskAttemptCont String sparkRequestedSchemaString = configuration.get(ParquetReadSupport$.MODULE$.SPARK_ROW_REQUESTED_SCHEMA()); this.sparkSchema = StructType$.MODULE$.fromString(sparkRequestedSchemaString); -this.reader = new ParquetFileReader(configuration, file, blocks, requestedSchema.getColumns()); +this.reader = new ParquetFileReader( +configuration, footer.getFileMetaData(), file, blocks, requestedSchema.getColumns()); for (BlockMetaData block : blocks) { --- End diff -- While we're at it, are the spaces around `:` necessary? (it's been a while since I developed in Java so I might be wrong).
[GitHub] spark pull request #14451: [SPARK-16848][SQL] Make jdbc() and read.format("j...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/14451#discussion_r73339896 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala --- @@ -228,6 +228,7 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging { table: String, parts: Array[Partition], connectionProperties: Properties): DataFrame = { +require(userSpecifiedSchema.isEmpty, "jdbc does not allow user-specified schemas.") --- End diff -- Just `schema` (singular not plural)
[GitHub] spark pull request #14444: [SPARK-16839] [SQL] redundant aliases after clean...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/1#discussion_r73337483 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisSuite.scala --- @@ -218,9 +221,44 @@ class AnalysisSuite extends AnalysisTest { // CreateStruct is a special case that we should not trim Alias for it. plan = testRelation.select(CreateStruct(Seq(a, (a + 1).as("a+1"))).as("col")) -checkAnalysis(plan, plan) +expected = testRelation.select(CreateNamedStruct(Seq( + Literal(a.name), a, + Literal("a+1"),(a + 1))).as("col")) +checkAnalysis(plan, expected) plan = testRelation.select(CreateStructUnsafe(Seq(a, (a + 1).as("a+1"))).as("col")) -checkAnalysis(plan, plan) +expected = testRelation.select(CreateNamedStructUnsafe(Seq( + Literal(a.name), a, + Literal("a+1"),(a + 1))).as("col")) +checkAnalysis(plan, expected) + } + + test("Analysis may leave unnecassary aliases") { +val att1 = testRelation.output.head +var plan = testRelation.select( + CreateStructUnsafe(Seq(att1, ((att1 as "aa") + 1).as("a_plus_1"))).as("col"), + att1 +) +val prevPlan = getAnalyzer(true).execute(plan) +plan = prevPlan.select(CreateArray(Seq( + CreateStructUnsafe(Seq(att1, (att1 + 1).as("a_plus_1"))), + /** alias should be eliminated by [[CleanupAliases]] */ + "col".attr as "col2" +)) as "arr") +plan = getAnalyzer(true).execute(plan) + +plan should be (a[Project]) +val Project( projectExpressions, _) = plan +projectExpressions should have (size(1)) +val Seq(expr1) = projectExpressions +expr1 should be (a[Alias]) +val Alias(expr2, arrAlias) = expr1 +arrAlias shouldBe "arr" +expr2 should be (a[CreateArray]) +val CreateArray(arrElements) = expr2 +arrElements should have (size(2)) +val Seq(el1, el2) = arrElements +el1 should not be a[Alias] --- End diff -- Why no `(` since you've been using them in `should be`'s earlier? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. 
[GitHub] spark pull request #14444: [SPARK-16839] [SQL] redundant aliases after clean...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/1#discussion_r73337339 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisSuite.scala --- @@ -218,9 +221,44 @@ class AnalysisSuite extends AnalysisTest { // CreateStruct is a special case that we should not trim Alias for it. plan = testRelation.select(CreateStruct(Seq(a, (a + 1).as("a+1"))).as("col")) -checkAnalysis(plan, plan) +expected = testRelation.select(CreateNamedStruct(Seq( + Literal(a.name), a, + Literal("a+1"),(a + 1))).as("col")) +checkAnalysis(plan, expected) plan = testRelation.select(CreateStructUnsafe(Seq(a, (a + 1).as("a+1"))).as("col")) -checkAnalysis(plan, plan) +expected = testRelation.select(CreateNamedStructUnsafe(Seq( + Literal(a.name), a, + Literal("a+1"),(a + 1))).as("col")) +checkAnalysis(plan, expected) + } + + test("Analysis may leave unnecassary aliases") { +val att1 = testRelation.output.head +var plan = testRelation.select( + CreateStructUnsafe(Seq(att1, ((att1 as "aa") + 1).as("a_plus_1"))).as("col"), + att1 +) +val prevPlan = getAnalyzer(true).execute(plan) +plan = prevPlan.select(CreateArray(Seq( + CreateStructUnsafe(Seq(att1, (att1 + 1).as("a_plus_1"))), + /** alias should be eliminated by [[CleanupAliases]] */ + "col".attr as "col2" +)) as "arr") +plan = getAnalyzer(true).execute(plan) + +plan should be (a[Project]) +val Project( projectExpressions, _) = plan +projectExpressions should have (size(1)) +val Seq(expr1) = projectExpressions +expr1 should be (a[Alias]) --- End diff -- Don't test it here since it would break in the line below. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. 
[GitHub] spark pull request #14444: [SPARK-16839] [SQL] redundant aliases after clean...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/1#discussion_r73337237 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisSuite.scala --- @@ -218,9 +221,44 @@ class AnalysisSuite extends AnalysisTest { // CreateStruct is a special case that we should not trim Alias for it. plan = testRelation.select(CreateStruct(Seq(a, (a + 1).as("a+1"))).as("col")) -checkAnalysis(plan, plan) +expected = testRelation.select(CreateNamedStruct(Seq( + Literal(a.name), a, + Literal("a+1"),(a + 1))).as("col")) +checkAnalysis(plan, expected) plan = testRelation.select(CreateStructUnsafe(Seq(a, (a + 1).as("a+1"))).as("col")) -checkAnalysis(plan, plan) +expected = testRelation.select(CreateNamedStructUnsafe(Seq( + Literal(a.name), a, + Literal("a+1"),(a + 1))).as("col")) +checkAnalysis(plan, expected) + } + + test("Analysis may leave unnecassary aliases") { +val att1 = testRelation.output.head +var plan = testRelation.select( + CreateStructUnsafe(Seq(att1, ((att1 as "aa") + 1).as("a_plus_1"))).as("col"), + att1 +) +val prevPlan = getAnalyzer(true).execute(plan) +plan = prevPlan.select(CreateArray(Seq( + CreateStructUnsafe(Seq(att1, (att1 + 1).as("a_plus_1"))), + /** alias should be eliminated by [[CleanupAliases]] */ + "col".attr as "col2" +)) as "arr") +plan = getAnalyzer(true).execute(plan) + +plan should be (a[Project]) +val Project( projectExpressions, _) = plan --- End diff -- Space? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14444: [SPARK-16839] [SQL] redundant aliases after clean...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/1#discussion_r73337186 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisSuite.scala --- @@ -218,9 +221,44 @@ class AnalysisSuite extends AnalysisTest { // CreateStruct is a special case that we should not trim Alias for it. plan = testRelation.select(CreateStruct(Seq(a, (a + 1).as("a+1"))).as("col")) -checkAnalysis(plan, plan) +expected = testRelation.select(CreateNamedStruct(Seq( + Literal(a.name), a, + Literal("a+1"),(a + 1))).as("col")) --- End diff -- I think, in this context, ` as "col"` would be acceptable. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14444: [SPARK-16839] [SQL] redundant aliases after clean...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/1#discussion_r73337136 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisSuite.scala --- @@ -218,9 +221,44 @@ class AnalysisSuite extends AnalysisTest { // CreateStruct is a special case that we should not trim Alias for it. plan = testRelation.select(CreateStruct(Seq(a, (a + 1).as("a+1"))).as("col")) -checkAnalysis(plan, plan) +expected = testRelation.select(CreateNamedStruct(Seq( + Literal(a.name), a, + Literal("a+1"),(a + 1))).as("col")) +checkAnalysis(plan, expected) plan = testRelation.select(CreateStructUnsafe(Seq(a, (a + 1).as("a+1"))).as("col")) -checkAnalysis(plan, plan) +expected = testRelation.select(CreateNamedStructUnsafe(Seq( + Literal(a.name), a, + Literal("a+1"),(a + 1))).as("col")) --- End diff -- Missing space. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14444: [SPARK-16839] [SQL] redundant aliases after clean...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/14444#discussion_r73337027 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -2109,15 +2119,12 @@ object CleanupAliases extends Rule[LogicalPlan] { case a: AppendColumns => a case other => - var stop = false other transformExpressionsDown { -case c: CreateStruct if !stop => - stop = true - c.copy(children = c.children.map(trimNonTopLevelAliases)) -case c: CreateStructUnsafe if !stop => - stop = true - c.copy(children = c.children.map(trimNonTopLevelAliases)) -case Alias(child, _) if !stop => child +case c @ CreateStruct(children) => + CreateNamedStruct( mkNamedStructArgs(c.dataType, children)) +case c @ CreateStructUnsafe(children) => + CreateNamedStructUnsafe( mkNamedStructArgs(c.dataType, children)) --- End diff -- And here, too
[GitHub] spark pull request #14444: [SPARK-16839] [SQL] redundant aliases after clean...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/14444#discussion_r73337012 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -2109,15 +2119,12 @@ object CleanupAliases extends Rule[LogicalPlan] { case a: AppendColumns => a case other => - var stop = false other transformExpressionsDown { -case c: CreateStruct if !stop => - stop = true - c.copy(children = c.children.map(trimNonTopLevelAliases)) -case c: CreateStructUnsafe if !stop => - stop = true - c.copy(children = c.children.map(trimNonTopLevelAliases)) -case Alias(child, _) if !stop => child +case c @ CreateStruct(children) => + CreateNamedStruct( mkNamedStructArgs(c.dataType, children)) --- End diff -- Mind the space after `(`.
[GitHub] spark pull request #14444: [SPARK-16839] [SQL] redundant aliases after clean...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/14444#discussion_r73337074 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisSuite.scala --- @@ -25,7 +27,8 @@ import org.apache.spark.sql.catalyst.plans.Inner import org.apache.spark.sql.catalyst.plans.logical._ import org.apache.spark.sql.types._ -class AnalysisSuite extends AnalysisTest { + +class AnalysisSuite extends AnalysisTest with ShouldMatchers{ --- End diff -- Missing space before `{`.
[GitHub] spark pull request #14444: [SPARK-16839] [SQL] redundant aliases after clean...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/1#discussion_r73336967

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---

```
@@ -2065,18 +2065,28 @@ object EliminateUnions extends Rule[LogicalPlan] {
  */
 object CleanupAliases extends Rule[LogicalPlan] {
   private def trimAliases(e: Expression): Expression = {
-    var stop = false
     e.transformDown {
-      // CreateStruct is a special case, we need to retain its top level Aliases as they decide the
-      // name of StructField. We also need to stop transform down this expression, or the Aliases
-      // under CreateStruct will be mistakenly trimmed.
-      case c: CreateStruct if !stop =>
-        stop = true
-        c.copy(children = c.children.map(trimNonTopLevelAliases))
-      case c: CreateStructUnsafe if !stop =>
-        stop = true
-        c.copy(children = c.children.map(trimNonTopLevelAliases))
-      case Alias(child, _) if !stop => child
+      /**
+       * [[CreateStruct]] is a special case as it uses its top level Aliases to decide the
+       * name of StructField.
+       * we tackle this by replacing [[CreateStruct]] with [[CreateNamedStruct]]
+       * which encodes the field names as child literals.
+       */
+      case c @ CreateStruct(children) =>
+        CreateNamedStruct( mkNamedStructArgs(c.dataType, children))
+      case c @ CreateStructUnsafe(children) =>
+        CreateNamedStructUnsafe( mkNamedStructArgs(c.dataType, children))
+      case Alias(child, _) => child
     }
   }
+
+  private def mkNamedStructArgs( structType : StructType, attributeExpressions: Seq[Expression]) = {
+    for {
+      (name, expression) <- structType.fieldNames.zip(attributeExpressions)
+      nameLiteral = Literal(name)
+      newChild <- Seq(nameLiteral, expression)
+    } yield{
+      newChild
```

--- End diff --

Do we really need `{` around `newChild`?
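The braces question above can be seen in a runnable approximation. This is a hypothetical, simplified stand-in for `mkNamedStructArgs` (plain strings replace Catalyst `Expression`s and `Literal`s, so none of Spark's types are needed); the point is that `yield` needs no braces when its body is a single expression:

```scala
// Simplified sketch of mkNamedStructArgs: each (fieldName, expression) pair
// expands to two children: a name "literal" followed by the expression itself.
def mkNamedStructArgs(fieldNames: Seq[String], exprs: Seq[String]): Seq[String] =
  for {
    (name, expression) <- fieldNames.zip(exprs)
    nameLiteral = s"Literal($name)"
    newChild <- Seq(nameLiteral, expression)
  } yield newChild // no braces needed around a single-expression body

println(mkNamedStructArgs(Seq("a", "b"), Seq("e1", "e2")))
// List(Literal(a), e1, Literal(b), e2)
```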
[GitHub] spark pull request #14435: [SPARK-16756][SQL][WIP] Add `sql` function to Log...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/14435#discussion_r73334242

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala ---

```
@@ -739,6 +931,15 @@ case class Sample(
 case class Distinct(child: LogicalPlan) extends UnaryNode {
   override def maxRows: Option[Long] = child.maxRows
   override def output: Seq[Attribute] = child.output
+
+  override def sql: String = child match {
+    case Union(children) =>
+      val childrenSql = children.map(c => s"(${c.sql})")
+      childrenSql.mkString(" UNION DISTINCT ")
+
+    case _: Project =>
```

--- End diff --

Replace `_` with `p` and use it instead.
[GitHub] spark pull request #14435: [SPARK-16756][SQL][WIP] Add `sql` function to Log...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/14435#discussion_r73334167

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala ---

```
@@ -731,6 +909,20 @@ case class Sample(
   }

   override protected def otherCopyArgs: Seq[AnyRef] = isTableSample :: Nil
+
+  override def sql: String = child match {
+    case SubqueryAlias(alias, _: NonSQLPlan) =>
+      val repeatable = if (withReplacement) s" REPEATABLE ($seed)" else ""
+      s"$alias TABLESAMPLE(${upperBound * 100} PERCENT)$repeatable"
+
+    case SubqueryAlias(alias, grandChild) =>
+      val repeatable = if (withReplacement) s" REPEATABLE ($seed)" else ""
+      s"${grandChild.sql} TABLESAMPLE(${upperBound * 100} PERCENT)$repeatable $alias"
+
+    case _ =>
+      val repeatable = if (withReplacement) s" REPEATABLE ($seed)" else ""
```

--- End diff --

`repeatable` is repeated three times (and I don't think it's on purpose despite the name :))
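The deduplication being suggested can be sketched in plain Scala (strings stand in for `LogicalPlan`; the parameter names mirror `Sample`'s but the function itself is hypothetical). The repeated `repeatable` binding is computed once, above the match:

```scala
// Hoisting the shared bindings out of the match removes the triplication.
def sampleSql(childSql: String, withReplacement: Boolean, seed: Long,
              upperBound: Double, alias: Option[String]): String = {
  val repeatable = if (withReplacement) s" REPEATABLE ($seed)" else ""
  val tablesample = s"TABLESAMPLE(${upperBound * 100} PERCENT)$repeatable"
  alias match {
    case Some(a) => s"$childSql $tablesample $a"
    case None    => s"$childSql $tablesample"
  }
}

println(sampleSql("t", withReplacement = true, seed = 42L, upperBound = 0.5, alias = None))
// t TABLESAMPLE(50.0 PERCENT) REPEATABLE (42)
```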
[GitHub] spark pull request #14435: [SPARK-16756][SQL][WIP] Add `sql` function to Log...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/14435#discussion_r73334011

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala ---

```
@@ -692,11 +864,17 @@ case class LocalLimit(limitExpr: Expression, child: LogicalPlan) extends UnaryNo
     }
     child.statistics.copy(sizeInBytes = sizeInBytes)
   }
+
+  override def sql: String = child.sql
 }

 case class SubqueryAlias(alias: String, child: LogicalPlan) extends UnaryNode {
   override def output: Seq[Attribute] = child.output.map(_.withQualifier(Some(alias)))
+
+  override def sql: String = child match {
+    case _ => if (child.sql.equals(alias)) child.sql else s"(${child.sql}) AS $alias"
```

--- End diff --

Why do you pattern match here?
[GitHub] spark pull request #14435: [SPARK-16756][SQL][WIP] Add `sql` function to Log...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/14435#discussion_r7845

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala ---

```
@@ -495,6 +573,92 @@ case class Aggregate(
       super.statistics
     }
   }
+
+  private def sameOutput(output1: Seq[Attribute], output2: Seq[Attribute]): Boolean =
+    output1.size == output2.size &&
+      output1.zip(output2).forall(pair => pair._1.semanticEquals(pair._2))
+
+  private def isGroupingSet(e: Expand, p: Project) = {
```

--- End diff --

Ah, so you are using `{` to wrap boolean one-liners :)
[GitHub] spark pull request #14435: [SPARK-16756][SQL][WIP] Add `sql` function to Log...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/14435#discussion_r7741

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala ---

```
@@ -495,6 +573,92 @@ case class Aggregate(
       super.statistics
     }
   }
+
+  private def sameOutput(output1: Seq[Attribute], output2: Seq[Attribute]): Boolean =
+    output1.size == output2.size &&
+      output1.zip(output2).forall(pair => pair._1.semanticEquals(pair._2))
```

--- End diff --

Think `forall { case (left, right) => left semanticEquals right }` (or with dots) could be more readable. WDYT?
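The suggested spelling compiles and behaves the same; here is a runnable approximation where a tiny `Attr` case class and plain `==` stand in for Catalyst's `Attribute` and `semanticEquals`:

```scala
// The tuple accessors pair._1 / pair._2 are replaced by a case pattern,
// which names both sides of each zipped pair.
case class Attr(name: String)

def sameOutput(output1: Seq[Attr], output2: Seq[Attr]): Boolean =
  output1.size == output2.size &&
    output1.zip(output2).forall { case (left, right) => left == right }

println(sameOutput(Seq(Attr("a"), Attr("b")), Seq(Attr("a"), Attr("b")))) // true
```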
[GitHub] spark pull request #14435: [SPARK-16756][SQL][WIP] Add `sql` function to Log...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/14435#discussion_r7585

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala ---

```
@@ -495,6 +573,92 @@ case class Aggregate(
       super.statistics
     }
   }
+
+  private def sameOutput(output1: Seq[Attribute], output2: Seq[Attribute]): Boolean =
```

--- End diff --

Wrap it in `{` and `}`.
[GitHub] spark pull request #14435: [SPARK-16756][SQL][WIP] Add `sql` function to Log...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/14435#discussion_r7461

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala ---

```
@@ -167,6 +212,8 @@ case class Intersect(left: LogicalPlan, right: LogicalPlan) extends SetOperation
     Statistics(sizeInBytes = sizeInBytes, isBroadcastable = isBroadcastable)
   }
+
+  override def sql: String = s"(${left.sql}) INTERSECT (${right.sql})"
```

--- End diff --

I think `INTERSECT` et al should be a value to later limit typos (I think there are tests that test that the output contains `INTERSET`, right?)
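The suggestion above can be sketched as follows (the `SetOpKeywords` object and its members are illustrative names, not anything in Spark). With the keyword in a constant, a typo like `INTERSET` fails to compile instead of leaking into generated SQL and the tests that check it:

```scala
// Hoisting SQL keywords into constants so every call site shares one spelling.
object SetOpKeywords {
  val Intersect = "INTERSECT"
  val Except = "EXCEPT"
}

def intersectSql(leftSql: String, rightSql: String): String =
  s"($leftSql) ${SetOpKeywords.Intersect} ($rightSql)"

println(intersectSql("SELECT 1", "SELECT 2"))
// (SELECT 1) INTERSECT (SELECT 2)
```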
[GitHub] spark pull request #14435: [SPARK-16756][SQL][WIP] Add `sql` function to Log...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/14435#discussion_r7286

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala ---

```
@@ -112,6 +152,11 @@ case class Filter(condition: Expression, child: LogicalPlan)
       .filterNot(SubqueryExpression.hasCorrelatedSubquery)
     child.constraints.union(predicates.toSet)
   }
+
+  override def sql: String = child match {
+    case _: Aggregate => s"${child.sql} HAVING ${condition.sql}"
+    case _ => s"${child.sql} WHERE ${condition.sql}"
```

--- End diff --

The only difference is `HAVING` vs `WHERE`, right? Mind extracting the small difference out and doing the following instead:

```
val havingOrWhere = ???
s"${child.sql} $havingOrWhere ${condition.sql}"
```
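A runnable version of that extraction, with a boolean flag standing in for the `_: Aggregate` pattern match (the function and parameter names are illustrative, not Spark's):

```scala
// Only the keyword differs between the two branches, so compute it once.
def filterSql(childIsAggregate: Boolean, childSql: String, conditionSql: String): String = {
  val havingOrWhere = if (childIsAggregate) "HAVING" else "WHERE"
  s"$childSql $havingOrWhere $conditionSql"
}

println(filterSql(childIsAggregate = true, "SELECT a FROM t GROUP BY a", "count(1) > 1"))
// SELECT a FROM t GROUP BY a HAVING count(1) > 1
```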
[GitHub] spark pull request #14435: [SPARK-16756][SQL][WIP] Add `sql` function to Log...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/14435#discussion_r73332988

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala ---

```
@@ -99,6 +133,12 @@ case class Generate(
     if (join) child.output ++ qualified else qualified
   }
+
+  override def sql: String = {
+    val columnAliases = generatorOutput.map(_.sql).mkString(", ")
+    s"${child.sql} LATERAL VIEW ${if (outer) "OUTER" else ""} " +
```

--- End diff --

Could you move `${if...` outside the interpolated string?
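A sketch of the move (strings stand in for the plan and generator; the names are hypothetical). Binding the conditional to a `val` keeps the interpolated string flat:

```scala
// The if-expression gets a name instead of living inside the interpolation.
def generateSql(childSql: String, outer: Boolean,
                generatorSql: String, columnAliases: String): String = {
  val outerClause = if (outer) "OUTER " else ""
  s"$childSql LATERAL VIEW $outerClause$generatorSql AS $columnAliases"
}

println(generateSql("SELECT * FROM t", outer = true, "explode(xs) tbl", "c1"))
// SELECT * FROM t LATERAL VIEW OUTER explode(xs) tbl AS c1
```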
[GitHub] spark pull request #14435: [SPARK-16756][SQL][WIP] Add `sql` function to Log...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/14435#discussion_r73332573

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala ---

```
@@ -53,6 +53,40 @@ case class Project(projectList: Seq[NamedExpression], child: LogicalPlan) extend
   override def validConstraints: Set[Expression] =
     child.constraints.union(getAliasedConstraints(projectList))
+
+  override def sql: String = {
+    if (projectList.exists(expr => expr.find(e => e.isInstanceOf[NonSQLExpression]).isDefined)) {
+      throw new UnsupportedOperationException("NonSQLExpression")
```

--- End diff --

Would `assert` be applicable here?
[GitHub] spark pull request #14435: [SPARK-16756][SQL][WIP] Add `sql` function to Log...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/14435#discussion_r73332522

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala ---

```
@@ -53,6 +53,40 @@ case class Project(projectList: Seq[NamedExpression], child: LogicalPlan) extend
   override def validConstraints: Set[Expression] =
     child.constraints.union(getAliasedConstraints(projectList))
+
+  override def sql: String = {
+    if (projectList.exists(expr => expr.find(e => e.isInstanceOf[NonSQLExpression]).isDefined)) {
```

--- End diff --

This `if` is overly complex and forces a reader to read the expression inside-out (not left to right). I can't propose anything better than to beg for a boolean val with the entire condition on a separate line. WDYT?
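The proposed shape can be sketched in plain Scala (a marker trait stands in for `NonSQLExpression`, `Any` for `Expression`, and the nested `find` collapses to a flat `exists`; all names here are illustrative):

```scala
// Naming the condition lets the if read left to right.
trait NonSQL // stand-in marker for NonSQLExpression

def projectSql(projectList: Seq[Any]): String = {
  val hasNonSqlExpression = projectList.exists(_.isInstanceOf[NonSQL])
  if (hasNonSqlExpression) {
    throw new UnsupportedOperationException("NonSQLExpression")
  }
  "SELECT ..." // placeholder for the real SQL generation
}

println(projectSql(Seq("a", "b")))
// SELECT ...
```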
[GitHub] spark pull request #14410: [SPARK-16803] [SQL] SaveAsTable does not work whe...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/14410#discussion_r73051607

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala ---

```
@@ -1457,6 +1457,59 @@ class SQLQuerySuite extends QueryTest with SQLTestUtils with TestHiveSingleton {
     }
   }

+  test("saveAsTable(CTAS) when the source DataFrame is built on a Hive table") {
+    val tableName = "tab1"
+    withTable(tableName) {
+      sql(s"CREATE TABLE $tableName stored as SEQUENCEFILE as select 1 as key, 'abc' as value")
+      val df = sql(s"select key, value as value from $tableName")
+
+      val message = intercept[AnalysisException] {
+        df.write.mode("append").saveAsTable(tableName)
+      }.getMessage
+      assert(message.contains(
+        "saveAsTable() for Hive tables is not supported yet. " +
+          "Please use insertInto() as an alternative"))
+    }
+  }
+
+  test("saveAsTable(CTAS) and insertInto when the source DataFrame is built on Data Source") {
+    val tableName = "tab1"
+    withTable(tableName) {
+      val schema = StructType(
+        StructField("key", IntegerType, nullable = false) ::
+          StructField("value", IntegerType, nullable = true) :: Nil)
+      val row = Row(3, 4)
+      val df = spark.createDataFrame(sparkContext.parallelize(row :: Nil), schema)
+
+      df.write.format("json").mode("overwrite").saveAsTable(tableName)
+      df.write.format("json").mode("append").saveAsTable(tableName)
+      checkAnswer(
+        sql(s"SELECT key, value FROM $tableName"),
+        Row(3, 4) :: Row(3, 4) :: Nil
+      )
+
+      (1 to 2).map { i => (i, i) }.toDF("key", "value").write.insertInto(tableName)
+      checkAnswer(
+        sql(s"SELECT key, value FROM $tableName"),
+        Row(1, 1) :: Row(2, 2) :: Row(3, 4) :: Row(3, 4) :: Nil
+      )
+    }
+  }
+
+  test("insertInto when the source DataFrame is built on a Hive table") {
+    val tableName = "tab1"
+    withTable(tableName) {
+      sql(s"CREATE TABLE $tableName stored as SEQUENCEFILE as select 1 as key, 'abc' as value")
+      val df = sql(s"select key, value as value from $tableName")
```

--- End diff --

Is there a reason why you're using the lowercase variant of the query in `sql` while `CREATE TABLE` above is uppercase? It seems inconsistent.
[GitHub] spark pull request #14410: [SPARK-16803] [SQL] SaveAsTable does not work whe...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/14410#discussion_r73035297

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala ---

```
@@ -1457,6 +1457,59 @@ class SQLQuerySuite extends QueryTest with SQLTestUtils with TestHiveSingleton {
     }
   }

+  test("saveAsTable(CTAS) when the source DataFrame is built on a Hive table") {
+    val tableName = "tab1"
+    withTable(tableName) {
+      sql(s"CREATE TABLE $tableName stored as SEQUENCEFILE as select 1 as key, 'abc' as value")
+      val df = sql(s"select key, value as value from $tableName")
+
+      val message = intercept[AnalysisException] {
+        df.write.mode("append").saveAsTable(tableName)
```

--- End diff --

and here too?
[GitHub] spark pull request #14410: [SPARK-16803] [SQL] SaveAsTable does not work whe...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/14410#discussion_r73035235

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/test/DataFrameReaderWriterSuite.scala ---

```
@@ -449,6 +449,22 @@ class DataFrameReaderWriterSuite extends QueryTest with SharedSQLContext with Be
     }
   }

+  test("saveAsTable(CTAS) when the source DataFrame is built on a SimpleCatalog table") {
+    import testImplicits._
+
+    val tableName = "tab1"
+    withTable(tableName) {
+      sql(s"CREATE TABLE $tableName(key int, value int) USING parquet")
+      (1 to 2).map { i => (i, i) }.toDF("key", "value").write.insertInto(tableName)
+      val df = sql(s"select key, value from $tableName")
+      df.write.mode("append").saveAsTable(tableName)
```

--- End diff --

What do you think about using `SaveMode.Append` instead?
[GitHub] spark pull request #14410: [SPARK-16803] [SQL] SaveAsTable does not work whe...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/14410#discussion_r73034863

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/createDataSourceTables.scala ---

```
@@ -236,6 +236,11 @@ case class CreateDataSourceTableAsSelectCommand(
       existingSchema = Some(l.schema)
     case s: SimpleCatalogRelation if DDLUtils.isDatasourceTable(s.metadata) =>
       existingSchema = Some(DDLUtils.getSchemaFromTableProperties(s.metadata))
+    // When the source table is a Hive table, the `saveAsTable` API of DataFrameWriter
+    // does not work. Instead, use the `insertInto` API.
+    case o if o.getClass.getName == "org.apache.spark.sql.hive.MetastoreRelation" =>
+      throw new AnalysisException("saveAsTable() for Hive tables is not supported yet. " +
```

--- End diff --

Remove "yet" or point to a JIRA issue where it's going to be introduced.
[GitHub] spark pull request #14410: [SPARK-16803] [SQL] SaveAsTable does not work whe...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/14410#discussion_r73034796

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/createDataSourceTables.scala ---

```
@@ -236,6 +236,11 @@ case class CreateDataSourceTableAsSelectCommand(
       existingSchema = Some(l.schema)
     case s: SimpleCatalogRelation if DDLUtils.isDatasourceTable(s.metadata) =>
       existingSchema = Some(DDLUtils.getSchemaFromTableProperties(s.metadata))
+    // When the source table is a Hive table, the `saveAsTable` API of DataFrameWriter
+    // does not work. Instead, use the `insertInto` API.
+    case o if o.getClass.getName == "org.apache.spark.sql.hive.MetastoreRelation" =>
```

--- End diff --

`case o: MetastoreRelation` doesn't work?
[GitHub] spark pull request #14414: [SPARK-16809] enable history server links in disp...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/14414#discussion_r73033311

--- Diff: docs/running-on-mesos.md ---

```
@@ -468,6 +468,17 @@ See the [configuration page](configuration.html) for information on Spark config
     If unset it will point to Spark's internal web UI.
   </td>
 </tr>
+<tr>
+  <td><code>spark.mesos.dispatcher.historyServer.url</code></td>
+  <td><code>(none)</code></td>
+  <td>
+    Set the URL of the <a href="http://spark.apache.org/docs/latest/monitoring.html#viewing-after-the-fact">history
+    server>. The dispatcher will then link each driver to its entry
```

--- End diff --

Looks like the ending tag is mistyped.
[GitHub] spark pull request #14414: [SPARK-16809] enable history server links in disp...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/14414#discussion_r73033008

--- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosCoarseGrainedSchedulerBackend.scala ---

```
@@ -152,8 +152,13 @@ private[spark] class MesosCoarseGrainedSchedulerBackend(
       sc.sparkUser,
       sc.appName,
       sc.conf,
-      sc.conf.getOption("spark.mesos.driver.webui.url").orElse(sc.ui.map(_.appUIAddress))
+      sc.conf.getOption("spark.mesos.driver.webui.url").orElse(sc.ui.map(_.appUIAddress)),
+      Option.empty,
+      Option.empty,
+      sc.conf.getOption("spark.mesos.driver.frameworkId")
```

--- End diff --

Mind introducing a constant for `spark.mesos.driver.frameworkId`?
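A minimal sketch of what such a constant could look like (the `MesosConfigKeys` object and a plain `Map` standing in for `SparkConf` are both illustrative, not Spark's actual API):

```scala
// One shared constant per key: call sites can't drift apart or typo the string.
object MesosConfigKeys {
  val DriverFrameworkId = "spark.mesos.driver.frameworkId"
  val DriverWebuiUrl = "spark.mesos.driver.webui.url"
}

// Stand-in for sc.conf.getOption(...)
def getOption(conf: Map[String, String], key: String): Option[String] = conf.get(key)

val conf = Map(MesosConfigKeys.DriverFrameworkId -> "fw-1")
println(getOption(conf, MesosConfigKeys.DriverFrameworkId))
// Some(fw-1)
```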
[GitHub] spark pull request #14414: [SPARK-16809] enable history server links in disp...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/14414#discussion_r73032952

--- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosCoarseGrainedSchedulerBackend.scala ---

```
@@ -152,8 +152,13 @@ private[spark] class MesosCoarseGrainedSchedulerBackend(
       sc.sparkUser,
       sc.appName,
       sc.conf,
-      sc.conf.getOption("spark.mesos.driver.webui.url").orElse(sc.ui.map(_.appUIAddress))
+      sc.conf.getOption("spark.mesos.driver.webui.url").orElse(sc.ui.map(_.appUIAddress)),
+      Option.empty,
```

--- End diff --

`None`?