[GitHub] spark issue #16309: [SPARK-18896][TESTS] Suppress ScalaCheck warning
Github user jaceklaskowski commented on the issue: https://github.com/apache/spark/pull/16309 Had to introduce changes to the tests given "Expired deprecations" in [ScalaTest 3.0.0](http://www.scalatest.org/release_notes/3.0.0). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16309: [SPARK-18896][TESTS] Suppress ScalaCheck warning
Github user jaceklaskowski commented on the issue: https://github.com/apache/spark/pull/16309 Right. Testing ScalaTest 3.0.1 with the following sbt command: ``` sbt -Phadoop-2.3 -Pmesos -Pkinesis-asl -Pyarn -Phive-thriftserver -Phive test:package streaming-kafka-0-8-assembly/assembly streaming-flume-assembly/assembly streaming-kinesis-asl-assembly/assembly ``` Is this the right command to stay as close to Jenkins as possible?
[GitHub] spark pull request #16309: [SPARK-18896][TESTS] Suppress ScalaCheck warning
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/16309#discussion_r92784014 --- Diff: pom.xml --- @@ -714,7 +714,7 @@ org.scalacheck scalacheck_${scala.binary.version} -1.12.5 +1.13.4 --- End diff -- Yes, I did use maven for the entire local build and sbt for this particular `DAGSchedulerSuite` (which is where it all started, actually). I've got little experience with the Spark test suite and hence I'm relying on Jenkins to do the heavy lifting and see how things may have changed since the comment.
[GitHub] spark pull request #16309: [SPARK-18896][TESTS] Suppress ScalaCheck warning
GitHub user jaceklaskowski opened a pull request: https://github.com/apache/spark/pull/16309 [SPARK-18896][TESTS] Suppress ScalaCheck warning ## What changes were proposed in this pull request? Fixes the ScalaCheck warning by upgrading to the latest 1.13.4 version. ``` Warning: Unknown ScalaCheck args provided: -oDF ``` The warning is due to a bug in ScalaCheck, reported in https://github.com/rickynils/scalacheck/issues/212 and fixed in https://github.com/rickynils/scalacheck/commit/df435a5, which is available in ScalaCheck 1.13.4. Spark uses ScalaCheck 1.12.5, which is behind the latest 1.12.6 released on Nov 1 (not to mention 1.13.4). ## How was this patch tested? Local build. Waiting for Jenkins. You can merge this pull request into a Git repository by running: $ git pull https://github.com/jaceklaskowski/spark SPARK-18896 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16309.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16309 commit 9a18c106f342d39216920b495f94e9855e44fda7 Author: Jacek Laskowski <ja...@japila.pl> Date: 2016-12-16T09:05:29Z [SPARK-18896] Suppress ScalaCheck warning
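The upgrade amounts to a one-line version bump in the root `pom.xml`. A sketch of the resulting dependency entry, reconstructed from the pom.xml diff in this PR (surrounding elements omitted):

```xml
<!-- Sketch of the bumped ScalaCheck dependency, per the pom.xml diff in this PR -->
<dependency>
  <groupId>org.scalacheck</groupId>
  <artifactId>scalacheck_${scala.binary.version}</artifactId>
  <version>1.13.4</version>
</dependency>
```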
[GitHub] spark issue #16250: [CORE][MINOR] Stylistic changes in DAGScheduler (to ease...
Github user jaceklaskowski commented on the issue: https://github.com/apache/spark/pull/16250 Just to have the list of the reasons not to accept the changes: 1. @rxin "just fyi this is going to be slower than the original code." 2. @shivaram "Its hard to profile these things closely enough to figure out how much overhead it adds" 3. @srowen "There is a tiny risk of breaking something or regressing performance." 4. @shivaram "I don't think the improvement in readability is worth the risk in terms of performance regression" I however think the points above are the exact reasons why you could accept this and the following changes to DAGScheduler. We should *not* be afraid of making changes to the core. We've got no evidence the points above hold, and that's what worries me the most. Thanks for your help anyway! It was worth sending the PR to learn from you!
[GitHub] spark pull request #16250: [CORE][MINOR] Stylistic changes in DAGScheduler (...
Github user jaceklaskowski closed the pull request at: https://github.com/apache/spark/pull/16250
[GitHub] spark issue #16250: [CORE][MINOR] Stylistic changes in DAGScheduler (to ease...
Github user jaceklaskowski commented on the issue: https://github.com/apache/spark/pull/16250 Thanks @srowen for the review! I do understand your point and remember you and @rxin have always been telling me that I should not touch code unless there's a need for a change. But the more time I spend with the code, the more I think there'd be more contributions if the code were even slightly more readable. I'm still having trouble getting the gist of it (but am closer). Spark in general is very tricky to get the hang of, and when the code is unnecessarily complex (like doing traversal, filtering and branching in a convoluted way), the more functional bits could certainly help the code, me and the community. That's my hope. That's also why I'm sending very small changes, to get myself going with more ease with the code (and get more comfortable with what's acceptable). All in all, I don't think I'm ready for heavier contributions yet, but I do think I'm closer and will soon be. Your help and patience have helped a lot. Thanks (and don't worry if you have to reject my changes).
[GitHub] spark pull request #16250: [CORE][MINOR] Stylistic changes in DAGScheduler (...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/16250#discussion_r91860470 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -333,16 +333,16 @@ class DAGScheduler( // (so we don't unnecessarily re-compute that data). val serLocs = mapOutputTracker.getSerializedMapOutputStatuses(shuffleDep.shuffleId) val locs = MapOutputTracker.deserializeMapStatuses(serLocs) - (0 until locs.length).foreach { i => -if (locs(i) ne null) { - // locs(i) will be null if missing - stage.addOutputLoc(i, locs(i)) + locs.zipWithIndex +.filter { case (status, _) => status != null } // null if missing +.foreach { case (ms, idx) => + stage.addOutputLoc(idx, ms) } - } } else { // Kind of ugly: need to register RDDs with the cache and map output tracker here // since we can't do it in the RDD constructor because # of partitions is unknown - logInfo("Registering RDD " + rdd.id + " (" + rdd.getCreationSite + ")") + logInfo(s"Registering RDD ${rdd.id} (partitions: ${rdd.partitions.length}, " + --- End diff -- Yes, but it was because I think it's important at this "stage". Do you think I should remove the partition-related addition?
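For readers comparing the two styles in this diff, here is a minimal, self-contained sketch of the same refactoring. The `locs` array and the result maps are illustrative stand-ins; the real code calls `stage.addOutputLoc`, which is replaced here by a map update:

```scala
// Illustrative stand-in for the DAGScheduler snippet: collect non-null
// entries of an array together with their indices.
val locs: Array[String] = Array("loc0", null, "loc2") // null means "missing"

// Original style: explicit index loop that skips nulls.
val viaLoop = scala.collection.mutable.Map.empty[Int, String]
(0 until locs.length).foreach { i =>
  if (locs(i) ne null) viaLoop(i) = locs(i)
}

// Refactored style from the diff: zipWithIndex + filter + foreach.
val viaZip = scala.collection.mutable.Map.empty[Int, String]
locs.zipWithIndex
  .filter { case (status, _) => status != null }
  .foreach { case (ms, idx) => viaZip(idx) = ms }

// Both produce the same index -> value mapping.
assert(viaLoop == viaZip)
```

Note that `zipWithIndex` and `filter` each allocate an intermediate collection, which is the performance concern reviewers raised elsewhere in this thread; the index loop avoids those allocations.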
[GitHub] spark pull request #16250: [CORE][MINOR] Stylistic changes in DAGScheduler (...
GitHub user jaceklaskowski opened a pull request: https://github.com/apache/spark/pull/16250 [CORE][MINOR] Stylistic changes in DAGScheduler (to ease comprehensio… ## What changes were proposed in this pull request? Stylistic changes in `DAGScheduler` to ease comprehension thereof. ## How was this patch tested? Local build. Awaiting Jenkins to run the entire test suite. …n thereof) You can merge this pull request into a Git repository by running: $ git pull https://github.com/jaceklaskowski/spark dagscheduler-minors Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16250.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16250 commit bc39671cf6868f033b69179cbdc6e6c373904149 Author: Jacek Laskowski <ja...@japila.pl> Date: 2016-12-11T15:00:52Z [CORE][MINOR] Stylistic changes in DAGScheduler (to ease comprehension thereof)
[GitHub] spark issue #16145: [MINOR][CORE][SQL] Remove explicit RDD and Partition ove...
Github user jaceklaskowski commented on the issue: https://github.com/apache/spark/pull/16145 @rxin @srowen Please re-review and act accordingly. Thanks a lot for your help so far!
[GitHub] spark pull request #16145: [MINOR][CORE][SQL] Remove explicit RDD and Partit...
GitHub user jaceklaskowski opened a pull request: https://github.com/apache/spark/pull/16145 [MINOR][CORE][SQL] Remove explicit RDD and Partition overrides ## What changes were proposed in this pull request? I **believe** that I _only_ removed duplicated code (that adds nothing but noise). I'm gonna remove the comment after Jenkins has built the changes with no issues and Spark devs have agreed to include the changes. Remove explicit `RDD` and `Partition` overrides (which turn out to be code duplication) ## How was this patch tested? Local build. Awaiting Jenkins. …cation) You can merge this pull request into a Git repository by running: $ git pull https://github.com/jaceklaskowski/spark rdd-overrides-removed Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16145.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16145 commit a33b6ecc20f5255a199ec0ab27ebd55868fcde96 Author: Jacek Laskowski <ja...@japila.pl> Date: 2016-12-05T09:58:07Z [MINOR][CORE][SQL] Remove explicit RDD and Partition overrides (duplication)
[GitHub] spark pull request #16144: [MINOR][CORE][SQL][DOCS] Typo fixes
GitHub user jaceklaskowski opened a pull request: https://github.com/apache/spark/pull/16144 [MINOR][CORE][SQL][DOCS] Typo fixes ## What changes were proposed in this pull request? Typo fixes ## How was this patch tested? Local build. Awaiting the official build. You can merge this pull request into a Git repository by running: $ git pull https://github.com/jaceklaskowski/spark typo-fixes Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16144.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16144 commit 49418e425ce6338d23e379b879ce4daf991b38dd Author: Jacek Laskowski <ja...@japila.pl> Date: 2016-12-05T09:46:20Z [MINOR][CORE][SQL][DOCS] Typo fixes
[GitHub] spark issue #15758: [MINOR] Remove calculation of bucket spec when not expec...
Github user jaceklaskowski commented on the issue: https://github.com/apache/spark/pull/15758 Thanks @rxin for looking into it. While reviewing that code I noticed the call and thought I'd push it for review here since...`getBucketSpec` is superfluous given `assertNotBucketed("save")` at the very beginning. Correct me if I'm wrong (that will further help me understand the code). I don't want to push the change for this one single line and wouldn't mind if you could merge it with some other more important change(s). Do whatever is best for the code. Thanks!
[GitHub] spark pull request #15758: [MINOR] Remove calculation of bucket spec when no...
GitHub user jaceklaskowski opened a pull request: https://github.com/apache/spark/pull/15758 [MINOR] Remove calculation of bucket spec when not expected ## What changes were proposed in this pull request? Remove calculation of bucket spec when asserted not to have one earlier. ## How was this patch tested? Local build. You can merge this pull request into a Git repository by running: $ git pull https://github.com/jaceklaskowski/spark remove-getBucketSpec Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/15758.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #15758 commit 2162bfbf28a844273fa29de95580eec6b709be8b Author: Jacek Laskowski <ja...@japila.pl> Date: 2016-11-03T20:21:30Z [MINOR] Remove calculation of bucket spec when not expected
[GitHub] spark pull request #15688: [SPARK-18173][SQL] data source tables should supp...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/15688#discussion_r85665836 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala --- @@ -1636,15 +1635,23 @@ class DDLSuite extends QueryTest with SharedSQLContext with BeforeAndAfterEach { } withTable("rectangles", "rectangles2") { --- End diff -- Mind changing `rectangles2` to `rectPartitioned` or similar? I think the test would be easier to read (later).
[GitHub] spark issue #15603: [WEBUI][MINOR] Remove unused appUIAddress in SparkUI + c...
Github user jaceklaskowski commented on the issue: https://github.com/apache/spark/pull/15603 Anything else you'd change/remove/add, Mr @srowen?
[GitHub] spark issue #15603: [WEBUI][MINOR] Remove unused appUIAddress in SparkUI + c...
Github user jaceklaskowski commented on the issue: https://github.com/apache/spark/pull/15603 The test failure doesn't seem to be due to my change.
[GitHub] spark pull request #15603: [WEBUI][MINOR] Return types in methods + cleanup
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/15603#discussion_r85638393 --- Diff: core/src/test/scala/org/apache/spark/ui/UISuite.scala --- @@ -179,16 +179,15 @@ class UISuite extends SparkFunSuite { test("verify appUIAddress contains the scheme") { --- End diff -- What do you think about renaming `spark.driver.appUIAddress` to `spark.driver.webUrl`? Not necessarily important, but merely for the sake of consistency...
[GitHub] spark pull request #15603: [WEBUI][MINOR] Return types in methods + cleanup
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/15603#discussion_r85638347 --- Diff: core/src/test/scala/org/apache/spark/ui/UISuite.scala --- @@ -179,16 +179,15 @@ class UISuite extends SparkFunSuite { test("verify appUIAddress contains the scheme") { --- End diff -- Thanks Sean! How do you manage to find so much time to help people like me?!?! You're certainly not a robot -- I do know it since I met you a few times and can confirm ;-)
[GitHub] spark pull request #15603: [WEBUI][MINOR] Return types in methods + cleanup
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/15603#discussion_r85516045 --- Diff: core/src/main/scala/org/apache/spark/ui/SparkUI.scala --- @@ -135,7 +133,7 @@ private[spark] class SparkUI private ( private[spark] abstract class SparkUITab(parent: SparkUI, prefix: String) extends WebUITab(parent, prefix) { - def appName: String = parent.getAppName + def appName: String = parent.appName --- End diff -- I might have missed why we could remove it. Mind refreshing my memory, @srowen? (Sorry, getting too old :))
[GitHub] spark pull request #15603: [WEBUI][MINOR] Return types in methods + cleanup
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/15603#discussion_r84653954 --- Diff: core/src/main/scala/org/apache/spark/ui/SparkUI.scala --- @@ -135,7 +133,7 @@ private[spark] class SparkUI private ( private[spark] abstract class SparkUITab(parent: SparkUI, prefix: String) extends WebUITab(parent, prefix) { - def appName: String = parent.getAppName + def appName: String = parent.appName --- End diff -- Yes! That's how I imagined the conversation would go. Thanks Sean!
[GitHub] spark pull request #15603: [WEBUI][MINOR] Return types in methods + cleanup
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/15603#discussion_r84653856 --- Diff: core/src/main/scala/org/apache/spark/ui/SparkUI.scala --- @@ -93,15 +91,15 @@ private[spark] class SparkUI private ( /** Stop the server behind this web interface. Only valid after bind(). */ override def stop() { super.stop() -logInfo("Stopped Spark web UI at %s".format(appUIAddress)) +logInfo(s"Stopped Spark web UI at $appUIAddress") } /** * Return the application UI host:port. This does not include the scheme (http://). */ private[spark] def appUIHostPort = publicHostName + ":" + boundPort - private[spark] def appUIAddress = s"http://$appUIHostPort" + private[spark] def appUIAddress = webUrl --- End diff -- Thanks Sean for confirming my thoughts! That's exactly the feedback I expected to get from you (or other Spark committers).
[GitHub] spark issue #15603: [WEBUI][MINOR] Return types in methods + cleanup
Github user jaceklaskowski commented on the issue: https://github.com/apache/spark/pull/15603 The build has just proved that my thinking was correct. I think I'll propose a few other (more aggressive) changes to clean the code up a little bit more. I'd appreciate any comments (or acceptance).
[GitHub] spark pull request #15603: [WEBUI][MINOR] Return types in methods + cleanup
GitHub user jaceklaskowski opened a pull request: https://github.com/apache/spark/pull/15603 [WEBUI][MINOR] Return types in methods + cleanup ## What changes were proposed in this pull request? The main purpose of the change is to discuss the purpose of the `SparkUITab` class and the `appUIAddress` attribute, since they seem to introduce nothing new. Once we agree they're not needed, I'm going to remove the comment. Code cleanup ## How was this patch tested? Local build You can merge this pull request into a Git repository by running: $ git pull https://github.com/jaceklaskowski/spark sparkui-fixes Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/15603.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #15603 commit 6fcd247f546c2cfe15f6909bf8a9d75dc822ef15 Author: Jacek Laskowski <ja...@japila.pl> Date: 2016-10-23T19:45:21Z [WEBUI][MINOR] Return types in methods + cleanup
[GitHub] spark issue #15353: [SPARK-17724][WebUI][Streaming] Unevaluated new lines in...
Github user jaceklaskowski commented on the issue: https://github.com/apache/spark/pull/15353 LGTM
[GitHub] spark pull request #15257: [SPARK-17683][SQL] Support ArrayType in Literal.a...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/15257#discussion_r80858277 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala --- @@ -52,13 +53,51 @@ object Literal { case t: Timestamp => Literal(DateTimeUtils.fromJavaTimestamp(t), TimestampType) case d: Date => Literal(DateTimeUtils.fromJavaDate(d), DateType) case a: Array[Byte] => Literal(a, BinaryType) +case a: Array[_] => + val elementType = componentTypeToDataType(a.getClass.getComponentType()) + val dataType = ArrayType(elementType) + val convert = CatalystTypeConverters.createToCatalystConverter(dataType) + Literal(convert(a), dataType) case i: CalendarInterval => Literal(i, CalendarIntervalType) case null => Literal(null, NullType) case v: Literal => v case _ => throw new RuntimeException("Unsupported literal type " + v.getClass + " " + v) } + private def componentTypeToDataType(clz: Class[_]): DataType = clz match { --- End diff -- I can't find either, but thought I'd ask.
[GitHub] spark issue #15268: [SPARK-17598] [SQL] [Web UI] User-friendly name for Spar...
Github user jaceklaskowski commented on the issue: https://github.com/apache/spark/pull/15268 LGTM
[GitHub] spark pull request #15257: [SPARK-17683][SQL] Support ArrayType in Literal.a...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/15257#discussion_r80663732 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala --- @@ -52,13 +53,51 @@ object Literal { case t: Timestamp => Literal(DateTimeUtils.fromJavaTimestamp(t), TimestampType) case d: Date => Literal(DateTimeUtils.fromJavaDate(d), DateType) case a: Array[Byte] => Literal(a, BinaryType) +case a: Array[_] => + val elementType = componentTypeToDataType(a.getClass.getComponentType()) + val dataType = ArrayType(elementType) + val convert = CatalystTypeConverters.createToCatalystConverter(dataType) + Literal(convert(a), dataType) case i: CalendarInterval => Literal(i, CalendarIntervalType) case null => Literal(null, NullType) case v: Literal => v case _ => throw new RuntimeException("Unsupported literal type " + v.getClass + " " + v) } + private def componentTypeToDataType(clz: Class[_]): DataType = clz match { --- End diff -- It looks so similar to the other cases where Spark has to map from Scala types to Spark SQL's, like https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala#L65. When comparing them I noticed that `CalendarInterval` is not included in your list. Why?
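A minimal, self-contained sketch of the kind of component-class-to-type mapping discussed in this diff. The function name and the string type tags here are hypothetical simplifications; Spark's actual `componentTypeToDataType` returns `DataType` instances:

```scala
// Hypothetical, simplified analogue of componentTypeToDataType: map the
// runtime component Class of an array to a type-name tag.
def componentTypeName(clz: Class[_]): String = clz match {
  case c if c == classOf[Int]    => "IntegerType" // primitive int[]
  case c if c == classOf[Long]   => "LongType"    // primitive long[]
  case c if c == classOf[String] => "StringType"
  case other => throw new RuntimeException("Unsupported component type " + other)
}

// getComponentType exposes the element class of an array at runtime,
// which is what the Array[_] case in Literal.apply dispatches on.
val intTag = componentTypeName(Array(1, 2, 3).getClass.getComponentType)
val strTag = componentTypeName(Array("a", "b").getClass.getComponentType)
```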
[GitHub] spark issue #14808: [SPARK-17156][ML][EXAMPLE] Add multiclass logistic regre...
Github user jaceklaskowski commented on the issue: https://github.com/apache/spark/pull/14808 @sethah I seem to have missed the other comment with the notes on what has to be done to make this PR an example for the change. Sorry. Since I'm very new to this and the only way to learn it better is to help with the PR here or the one that's coming, I'd like to be engaged one way or the other. I'm going to close this PR (to keep it consistent JIRA-wise) and open another with the changes you've mentioned, so the example is aligned with the changes and the requirements. Thanks @sethah for your help! See you in the other PR...
[GitHub] spark pull request #14808: [SPARK-17156][ML][EXAMPLE] Add multiclass logisti...
Github user jaceklaskowski closed the pull request at: https://github.com/apache/spark/pull/14808
[GitHub] spark issue #14808: [SPARK-17156][ML][EXAMPLE] Add multiclass logistic regre...
Github user jaceklaskowski commented on the issue: https://github.com/apache/spark/pull/14808 @sethah Anything to improve the example. I'm open to suggestions and happy to improve the example (to learn that part better). Thanks!
[GitHub] spark issue #14808: [SPARK-17156][ML][EXAMPLE] Add multiclass logistic regre...
Github user jaceklaskowski commented on the issue: https://github.com/apache/spark/pull/14808 Thanks @MLnick. What would you suggest with the PR then (after the comments from @sethah)? Please guide.
[GitHub] spark issue #14958: [SPARK-17378] [BUILD] Upgrade snappy-java to 1.1.2.6
Github user jaceklaskowski commented on the issue: https://github.com/apache/spark/pull/14958 Fair enough. It was just an idea that would make the upgrade even more clearly needed. The issues that snappy has fixed could manifest in various ways in Spark, and verifying that could uncover others. Yes, again, it could be a project of its own. Sorry for the noise.
[GitHub] spark issue #14958: [SPARK-17378] [BUILD] Upgrade snappy-java to 1.1.2.6
Github user jaceklaskowski commented on the issue: https://github.com/apache/spark/pull/14958 Would be nice to have a test in Spark to see if the changes to snappy made any difference to Spark (not that I'm against the upgrade -- just as a safety measure to confirm it did improve things, if possible).
[GitHub] spark issue #14891: [SQL][DOC][MINOR] Add (Scala-specific) and (Java-specifi...
Github user jaceklaskowski commented on the issue: https://github.com/apache/spark/pull/14891 @srowen, anyone you'd recommend to accept the PR (after you accepted)?
[GitHub] spark issue #14808: [SPARK-17156][ML][EXAMPLE] Add multiclass logistic regre...
Github user jaceklaskowski commented on the issue: https://github.com/apache/spark/pull/14808 Any news on that @sethah? I've seen some discussions about the changes to unify the interfaces, but am wondering how close the other PRs are so I could help with that one myself.
[GitHub] spark issue #14891: [SQL][DOC][MINOR] Add (Scala-specific) and (Java-specifi...
Github user jaceklaskowski commented on the issue: https://github.com/apache/spark/pull/14891 Frankly, I'd not have bothered with the changes if I had not seen them elsewhere. See http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.Dataset and look for `foreach` and `foreachPartition` for (Java-specific), while http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.Encoders$ has `javaSerialization` / `kryo` for (Scala-specific). I have seen examples with both in one source, like http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.RelationalGroupedDataset. That was the main reason; the other was to understand why there are two different variants of the same method (and it eventually clicked, but only after I'd seen the other examples).
[GitHub] spark pull request #14891: [SQL][DOC][MINOR] Add (Scala-specific) and (Java-...
GitHub user jaceklaskowski opened a pull request: https://github.com/apache/spark/pull/14891 [SQL][DOC][MINOR] Add (Scala-specific) and (Java-specific) ## What changes were proposed in this pull request? Adds (Scala-specific) and (Java-specific) to Scaladoc. ## How was this patch tested? local build You can merge this pull request into a Git repository by running: $ git pull https://github.com/jaceklaskowski/spark scala-specifics Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14891.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14891 commit a927c83888e93742ab41db435df8db8f0065037e Author: Jacek Laskowski <ja...@japila.pl> Date: 2016-08-31T07:26:15Z [SQL][DOC][MINOR] Add (Scala-specific) and (Java-specific)
[GitHub] spark issue #14808: [SPARK-17156][ML][EXAMPLE] Add multiclass logistic regre...
Github user jaceklaskowski commented on the issue: https://github.com/apache/spark/pull/14808 I don't mind adapting the example to the upcoming changes and I can work on it. But SPARK-17163 has no Fix Version/s field assigned, so I'm reading it that it's not clear whether the change gets into the next release or not. Given 2.0.1 might be out in a couple of weeks, I believe accepting the example now and changing it later (once SPARK-17163 gets merged) is a viable strategy. What do you think? /cc @srowen @wangmiao1981
[GitHub] spark pull request #14808: [SPARK-17156][ML][EXAMPLE] Add multiclass logisti...
GitHub user jaceklaskowski opened a pull request: https://github.com/apache/spark/pull/14808 [SPARK-17156][ML][EXAMPLE] Add multiclass logistic regression Scala Example ## What changes were proposed in this pull request? New example for `MultinomialLogisticRegression` ML algorithm. ## How was this patch tested? local build + run-example You can merge this pull request into a Git repository by running: $ git pull https://github.com/jaceklaskowski/spark SPARK-17156-multiclass-logreg-example Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14808.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14808 commit ba5a4e2cc14e253ea3465d887c3cf5a2a9d82a80 Author: Jacek Laskowski <ja...@japila.pl> Date: 2016-08-25T14:47:18Z [SPARK-17156][ML][EXAMPLE] Add multiclass logistic regression Scala Example
[GitHub] spark pull request #14771: [SPARK-17199] Use CatalystConf.resolver for case-...
GitHub user jaceklaskowski opened a pull request: https://github.com/apache/spark/pull/14771 [SPARK-17199] Use CatalystConf.resolver for case-sensitivity comparison ## What changes were proposed in this pull request? Use `CatalystConf.resolver` consistently for case-sensitivity comparison (removed dups). ## How was this patch tested? Local build. Waiting for Jenkins to ensure clean build and test. You can merge this pull request into a Git repository by running: $ git pull https://github.com/jaceklaskowski/spark 17199-catalystconf-resolver Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14771.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14771 commit 7d86c578848cc4597924a14956ef5bae0b81fef2 Author: Jacek Laskowski <ja...@japila.pl> Date: 2016-08-23T08:15:10Z [SPARK-17199] Use CatalystConf.resolver for case-sensitivity comparison
[GitHub] spark issue #14722: [SPARK-13286] [SQL] add the next expression of SQLExcept...
Github user jaceklaskowski commented on the issue: https://github.com/apache/spark/pull/14722 LGTM
[GitHub] spark pull request #14720: SPARK-12868: Allow Add jar to add jars from hdfs/...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/14720#discussion_r75601795 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveQuerySuite.scala ---

```scala
@@ -865,6 +865,16 @@ class HiveQuerySuite extends HiveComparisonTest with BeforeAndAfter {
     sql("DROP TABLE alter1")
   }

+  test("SPARK-12868 ADD JAR FROM HDFS") {
+    val testJar = "hdfs://nn:8020/foo.jar"
+    // This should fail with unknown host, as it's just testing the URL parsing;
+    // before SPARK-12868 it was failing with a malformed URI
+    val e = intercept[RuntimeException] {
+      sql(s"ADD JAR $testJar")
+    }
+    assert(e.getMessage.contains("java.net.UnknownHostException: nn1"))
```

--- End diff -- Are you sure it's `nn1` (with a `1`)? The URL above uses host `nn`.
[GitHub] spark pull request #14720: SPARK-12868: Allow Add jar to add jars from hdfs/...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/14720#discussion_r75601779 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala ---

```scala
@@ -87,6 +88,9 @@ private[hive] class HiveClientImpl(
   // Circular buffer to hold what hive prints to STDOUT and ERR. Only printed when failures occur.
   private val outputBuffer = new CircularBuffer()

+  // An object lock to ensure URL factory is registered exactly once.
+  object URLFactoryRegistrationLock{}
```

--- End diff -- Why do you need the `{}`? An `object` declaration with an empty body can omit the braces.
[GitHub] spark pull request #14637: [WIP] [SPARK-16967] move mesos to module
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/14637#discussion_r75574292 --- Diff: project/SparkBuild.scala ---

```scala
@@ -56,9 +56,9 @@ object BuildCommons {
     "tags", "sketch"
   ).map(ProjectRef(buildLocation, _)) ++ sqlProjects ++ streamingProjects

-  val optionallyEnabledProjects@Seq(yarn, java8Tests, sparkGangliaLgpl,
+  val optionallyEnabledProjects@Seq(mesos, yarn, java8Tests, sparkGangliaLgpl,
```

--- End diff -- This is Scala's pattern matching on assignment: the line creates a single `optionallyEnabledProjects` value for the entire `Seq` of projects, with the individual projects bound to their own values (`mesos`, etc.).
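The pattern-matching-on-assignment form described above can be sketched in isolation; the project names here are placeholders, not SparkBuild's actual list.

```scala
// A single assignment binds the whole Seq to one name (via `@`) and each
// element to its own name -- the same shape SparkBuild.scala uses for
// optionallyEnabledProjects.
object PatternAssignment {
  val optionallyEnabled @ Seq(mesos, yarn, ganglia) =
    Seq("mesos", "yarn", "ganglia-lgpl")

  def main(args: Array[String]): Unit = {
    println(optionallyEnabled.size) // 3
    println(mesos)                  // mesos
  }
}
```

The compiler may warn that the match is not exhaustive; a `Seq` of a different length would throw a `MatchError` when the object is initialized, which is acceptable in a build definition where the list is fixed.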
[GitHub] spark issue #14680: [SPARK-17101][SQL] Provide consistent format identifiers...
Github user jaceklaskowski commented on the issue: https://github.com/apache/spark/pull/14680 Thanks @HyukjinKwon You're helping me a lot! I'll work on the unit test.
[GitHub] spark issue #14680: [SPARK-17101][SQL] Provide consistent format identifiers...
Github user jaceklaskowski commented on the issue: https://github.com/apache/spark/pull/14680 How about now @HyukjinKwon ? The more I look at it, the more I think it should be calculated automatically out of the class name when the constructor is called. It's of little to no value to a FileFormat developer.
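One way the "calculated automatically out of the class name" idea could look -- a hedged sketch, not Spark's actual implementation; `FormatName`, the sample class, and the suffix convention are assumptions for illustration:

```scala
// Derive a short format identifier from the class name by stripping a
// conventional suffix and lowercasing. This relies on getSimpleName,
// which behaves differently for anonymous or nested classes.
class TextFileFormat

object FormatName {
  def shortName(clz: Class[_]): String =
    clz.getSimpleName.stripSuffix("FileFormat").toLowerCase

  def main(args: Array[String]): Unit =
    println(shortName(classOf[TextFileFormat])) // text
}
```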
[GitHub] spark issue #14680: [SPARK-17101][SQL] Provide consistent format identifiers...
Github user jaceklaskowski commented on the issue: https://github.com/apache/spark/pull/14680 @rxin @HyukjinKwon Mind reviewing it again and letting me know what you think? I know it's minor but would greatly appreciate having it merged at your earliest convenience. Thanks.
[GitHub] spark pull request #14680: [SPARK-17101][SQL] Provide format identifier for ...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/14680#discussion_r75182485 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/text/TextFileFormat.scala ---

```scala
@@ -40,6 +40,8 @@ class TextFileFormat extends TextBasedFileFormat with DataSourceRegister {

   override def shortName(): String = "text"

+  override def toString: String = shortName.toUpperCase
```

--- End diff -- Let me propose a change for Parquet. Thanks for spotting it and for your review!
[GitHub] spark pull request #14681: [SPARK-17038][Streaming] fix metrics retrieval so...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/14681#discussion_r75074428 --- Diff: streaming/src/test/scala/org/apache/spark/streaming/ui/StreamingJobProgressListenerSuite.scala ---

```scala
@@ -68,6 +68,7 @@ class StreamingJobProgressListenerSuite extends TestSuiteBase with Matchers {
     listener.waitingBatches should be (List(BatchUIData(batchInfoSubmitted)))
     listener.runningBatches should be (Nil)
     listener.retainedCompletedBatches should be (Nil)
+    listener.lastReceivedBatch should be (Some(BatchUIData(batchInfoSubmitted)))
```

--- End diff -- What do you think about `should contain (BatchUIData(batchInfoSubmitted))` instead? See http://www.scalatest.org/user_guide/using_matchers#workingWithContainers.
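The reason `should contain` works on an `Option` is that ScalaTest treats it as a container holding at most one element. A minimal sketch of the same distinction with plain assertions (no ScalaTest dependency; the batch value is a placeholder string, not a real `BatchUIData`):

```scala
object OptionAsContainer {
  def main(args: Array[String]): Unit = {
    val lastReceivedBatch: Option[String] = Some("batchInfoSubmitted")

    // Equality check against the wrapped value -- the style in the diff:
    assert(lastReceivedBatch == Some("batchInfoSubmitted"))

    // Container-style check -- what `should contain (...)` expresses:
    assert(lastReceivedBatch.contains("batchInfoSubmitted"))
  }
}
```

The container style reads as "the option contains this element" and gives matcher-style failure messages in ScalaTest, rather than a raw equality mismatch.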
[GitHub] spark pull request #14680: [SPARK-17101][SQL] Provide format identifier for ...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/14680#discussion_r75073488 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/text/TextFileFormat.scala ---

```scala
@@ -40,6 +40,8 @@ class TextFileFormat extends TextBasedFileFormat with DataSourceRegister {

   override def shortName(): String = "text"

+  override def toString: String = shortName.toUpperCase
```

--- End diff -- I did see it, but I'd do the opposite and rather fix JSON and CSV than repeat what's in `shortName`. I even thought about defining the `toString` in a supertype, but could find nothing that would be acceptable.
[GitHub] spark pull request #14680: [SPARK-17101][SQL] Provide format identifier for ...
GitHub user jaceklaskowski opened a pull request: https://github.com/apache/spark/pull/14680 [SPARK-17101][SQL] Provide format identifier for TextFileFormat

## What changes were proposed in this pull request?

Define the format identifier that is used in the Optimized Logical Plan in `explain` for the text file format (following the CSV and JSON formats).

```
scala> spark.read.text("people.csv").cache.explain(extended = true)
== Parsed Logical Plan ==
Relation[value#0] text

== Analyzed Logical Plan ==
value: string
Relation[value#0] text

== Optimized Logical Plan ==
InMemoryRelation [value#0], true, 1, StorageLevel(disk, memory, deserialized, 1 replicas)
+- *FileScan text [value#0] Batched: false, Format: TEXT, InputPaths: file:/Users/jacek/dev/oss/spark/people.csv, PartitionFilters: [], PushedFilters: [], ReadSchema: struct

== Physical Plan ==
InMemoryTableScan [value#0]
+- InMemoryRelation [value#0], true, 1, StorageLevel(disk, memory, deserialized, 1 replicas)
   +- *FileScan text [value#0] Batched: false, Format: TEXT, InputPaths: file:/Users/jacek/dev/oss/spark/people.csv, PartitionFilters: [], PushedFilters: [], ReadSchema: struct
```

## How was this patch tested?

Local build.

You can merge this pull request into a Git repository by running: $ git pull https://github.com/jaceklaskowski/spark SPARK-17101 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14680.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14680 commit 133e5deff497ce1be92497344bb7e0d4e7d57c21 Author: Jacek Laskowski <ja...@japila.pl> Date: 2016-08-17T06:43:34Z [SPARK-17101][SQL] Provide format identifier for TextFileFormat
[GitHub] spark issue #14557: [SPARK-16709][CORE] Kill the running task if stage faile...
Github user jaceklaskowski commented on the issue: https://github.com/apache/spark/pull/14557 LGTM
[GitHub] spark pull request #14637: [WIP] [SPARK-16967] move mesos to module
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/14637#discussion_r74706568 --- Diff: mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterManager.scala ---

```scala
@@ -0,0 +1,61 @@
+/* ... Apache License header ... */
+
+package org.apache.spark.scheduler.cluster
+
+import org.apache.spark.{SparkContext, SparkException}
+import org.apache.spark.scheduler.{ExternalClusterManager, SchedulerBackend, TaskScheduler, TaskSchedulerImpl}
+import org.apache.spark.scheduler.cluster.mesos.{MesosCoarseGrainedSchedulerBackend, MesosFineGrainedSchedulerBackend}
+
+/**
+ * Cluster Manager for creation of Yarn scheduler and backend
+ */
+private[spark] class MesosClusterManager extends ExternalClusterManager {
+  private val MESOS_REGEX = """mesos://(.*)""".r
+
+  override def canCreate(masterURL: String): Boolean = {
+    masterURL.startsWith("mesos")
+  }
+
+  override def createTaskScheduler(sc: SparkContext, masterURL: String): TaskScheduler = {
+    new TaskSchedulerImpl(sc)
+  }
+
+  override def createSchedulerBackend(sc: SparkContext,
+                                      masterURL: String,
+                                      scheduler: TaskScheduler): SchedulerBackend = {
+    val mesosUrl = MESOS_REGEX.findFirstMatchIn(masterURL).get.group(1)
+    val coarse = sc.conf.getBoolean("spark.mesos.coarse", defaultValue = true)
```

--- End diff -- Oh, I thought Spark supported coarse-grained scheduling only (even for Mesos, which had a fine-grained mode too). Will this stay, or is it a temporary thing?
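The `MESOS_REGEX.findFirstMatchIn(masterURL).get.group(1)` line in the diff above can be sketched standalone. Returning an `Option` via `map` instead of calling `.get` is an assumption made here for safety, so a non-mesos URL yields `None` rather than an exception:

```scala
// Extract the part after "mesos://" from a master URL with a regex
// that has a single capture group.
object MasterUrlSketch {
  private val MESOS_REGEX = """mesos://(.*)""".r

  def mesosUrl(masterURL: String): Option[String] =
    MESOS_REGEX.findFirstMatchIn(masterURL).map(_.group(1))

  def main(args: Array[String]): Unit = {
    println(mesosUrl("mesos://zk://host:2181/mesos")) // Some(zk://host:2181/mesos)
    println(mesosUrl("yarn"))                         // None
  }
}
```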
[GitHub] spark pull request #14637: [WIP] [SPARK-16967] move mesos to module
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/14637#discussion_r74706552 --- Diff: mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterManager.scala ---

```scala
+  override def createSchedulerBackend(sc: SparkContext,
+                                      masterURL: String,
+                                      scheduler: TaskScheduler): SchedulerBackend = {
```

--- End diff -- I think the convention is to indent continuation parameters by 4 spaces only, not to align them with the first parameter.
[GitHub] spark pull request #14637: [WIP] [SPARK-16967] move mesos to module
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/14637#discussion_r74706517 --- Diff: mesos/pom.xml ---
[GitHub] spark pull request #14637: [WIP] [SPARK-16967] move mesos to module
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/14637#discussion_r74706522 --- Diff: mesos/pom.xml --- @@ -0,0 +1,167 @@
+<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
+  <modelVersion>4.0.0</modelVersion>
+  <parent>
+    <groupId>org.apache.spark</groupId>
+    <artifactId>spark-parent_2.11</artifactId>
+    <version>2.1.0-SNAPSHOT</version>
+    <relativePath>../pom.xml</relativePath>
+  </parent>
+
+  <artifactId>spark-mesos_2.11</artifactId>
+  <packaging>jar</packaging>
+  <name>Spark Project Mesos</name>
+  <properties>
+    <sbt.project.name>mesos</sbt.project.name>
+    <mesos.version>1.0.0</mesos.version>
+    <mesos.classifier>shaded-protobuf</mesos.classifier>
+  </properties>
+
+  <dependencies>
+    <dependency>
+      <groupId>org.apache.spark</groupId>
+      <artifactId>spark-core_${scala.binary.version}</artifactId>
+      <version>${project.version}</version>
+    </dependency>
+    <dependency>
+      <groupId>org.apache.spark</groupId>
+      <artifactId>spark-core_${scala.binary.version}</artifactId>
+      <version>${project.version}</version>
+      <type>test-jar</type>
+      <scope>test</scope>
+    </dependency>
+    <dependency>
+      <groupId>org.apache.mesos</groupId>
+      <artifactId>mesos</artifactId>
+      <version>${mesos.version}</version>
+      <classifier>${mesos.classifier}</classifier>
+      <exclusions>
+        <exclusion>
+          <groupId>com.google.protobuf</groupId>
+          <artifactId>protobuf-java</artifactId>
+        </exclusion>
+      </exclusions>
+    </dependency>
+    <dependency>
+      <groupId>org.mockito</groupId>
+      <artifactId>mockito-core</artifactId>
+      <scope>test</scope>
+    </dependency>
[GitHub] spark pull request #14637: [WIP] [SPARK-16967] move mesos to module
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/14637#discussion_r74706502 --- Diff: mesos/pom.xml --- @@ -0,0 +1,167 @@ +<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd"> --- End diff -- Isn't the line too long?
[GitHub] spark pull request #14637: [WIP] [SPARK-16967] move mesos to module
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/14637#discussion_r74706494 --- Diff: mesos/pom.xml --- @@ -0,0 +1,167 @@ + +
[GitHub] spark pull request #14576: [SPARK-16391][SQL] ReduceAggregator and partial a...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/14576#discussion_r74706380 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/expressions/ReduceAggregator.scala --- @@ -0,0 +1,79 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.expressions + +import org.apache.spark.annotation.Experimental +import org.apache.spark.sql.Encoder +import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder + +/** + * :: Experimental :: + * An aggregator that uses a single associative and commutative reduce function. This reduce + * function can be used to go through all input values and reduces them to a single value. + * If there is no input, a null value is returned. + * + * @since 2.1.0 + */ +@Experimental +abstract class ReduceAggregator[T] extends Aggregator[T, (Boolean, T), T] { + + // Question 1: Should func and encoder be parameters rather than abstract methods? + // rxin: abstract method has better java compatibility and forces naming the concrete impl, + // whereas parameter has better type inference (infer encoders via context bounds). + // Question 2: Should finish throw an exception or return null if there is no input? 
+ // rxin: null might be more "SQL" like, whereas exception is more Scala like. + + /** + * A associative and commutative reduce function. + * @since 2.1.0 + */ + def func(a: T, b: T): T + + /** + * Encoder for type T. + * @since 2.1.0 + */ + def encoder: ExpressionEncoder[T] + + override def zero: (Boolean, T) = (false, null.asInstanceOf[T]) + + override def bufferEncoder: Encoder[(Boolean, T)] = +ExpressionEncoder.tuple(ExpressionEncoder[Boolean](), encoder) --- End diff -- `Encoders.scalaBoolean`? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
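To make the buffer design in the quoted `ReduceAggregator` concrete: the `Boolean` half of the `(Boolean, T)` buffer tracks whether any input has been seen, so `zero` can safely carry a `null` placeholder. A plain-Scala sketch of those semantics (no Spark dependency; the names below are illustrative, not Spark's API):

```scala
// Sketch of the (Boolean, T) buffer semantics: the flag records whether
// any input was seen, so zero's null placeholder is never exposed.
object ReduceBufferDemo {
  def zero[T]: (Boolean, T) = (false, null.asInstanceOf[T])

  // Fold one input value into the buffer.
  def reduce[T](b: (Boolean, T), a: T, func: (T, T) => T): (Boolean, T) =
    if (b._1) (true, func(b._2, a)) else (true, a)

  // Merge two partial buffers (mirrors Aggregator.merge).
  def merge[T](b1: (Boolean, T), b2: (Boolean, T), func: (T, T) => T): (Boolean, T) =
    (b1._1, b2._1) match {
      case (true, true)  => (true, func(b1._2, b2._2))
      case (true, false) => b1
      case _             => b2
    }

  // Finish: this sketch throws when no input was seen ("Question 2"
  // in the diff leaves open whether to throw or return null).
  def finish[T](b: (Boolean, T)): T = {
    require(b._1, "requires at least one input row")
    b._2
  }
}
```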
[GitHub] spark issue #14636: [SPARK-17053][SQL] Support `hive.exec.drop.ignorenonexis...
Github user jaceklaskowski commented on the issue: https://github.com/apache/spark/pull/14636 The JIRA issue (https://issues.apache.org/jira/browse/SPARK-17053) is closed as Won't Fix. Should the PR be closed too?
[GitHub] spark issue #14561: [SPARK-16972][CORE] Move DriverEndpoint out of CoarseGra...
Github user jaceklaskowski commented on the issue: https://github.com/apache/spark/pull/14561 LGTM
[GitHub] spark issue #14561: [SPARK-16972][CORE] Move DriverEndpoint out of CoarseGra...
Github user jaceklaskowski commented on the issue: https://github.com/apache/spark/pull/14561 Finally someone has taken responsibility for clearing this important and critical path in Spark Core. Whenever I see the code, I feel what @lshmouse felt -- it has to be refactored to invite further contributions. Kudos @lshmouse!
[GitHub] spark pull request #14557: [SPARK-16709][CORE] Kill the running task if stag...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/14557#discussion_r74688576 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -798,6 +798,19 @@ private[spark] class TaskSetManager( } } maybeFinishTaskSet() + +// kill running task if stage failed +if(reason.isInstanceOf[FetchFailed]) { + killTasks(runningTasksSet, taskInfos) +} + } + + def killTasks(tasks: HashSet[Long], taskInfo: HashMap[Long, TaskInfo]): Boolean = { +tasks.foreach { task => + val executorId = taskInfo(task).executorId + sched.sc.schedulerBackend.killTask(task, executorId, true) --- End diff -- I think `true` would read better with the parameter name spelled out.
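The suggestion is about Scala named arguments: a bare `true` at a call site says nothing, while naming the parameter documents the intent. A sketch with an assumed signature (the real `killTask` lives on `SchedulerBackend`; the parameter name here is an assumption for illustration):

```scala
// Toy backend whose killTask takes a trailing Boolean, mirroring the
// shape of the call in the diff (names are illustrative).
class Backend {
  def killTask(taskId: Long, executorId: String, interruptThread: Boolean): String =
    s"kill $taskId on $executorId (interrupt=$interruptThread)"
}

val backend = new Backend
// Unclear at the call site: what does `true` mean?
backend.killTask(42L, "exec-1", true)
// Clear: the intent is spelled out.
val msg = backend.killTask(42L, "exec-1", interruptThread = true)
// msg == "kill 42 on exec-1 (interrupt=true)"
```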
[GitHub] spark pull request #14557: [SPARK-16709][CORE] Kill the running task if stag...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/14557#discussion_r74688557 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -798,6 +798,19 @@ private[spark] class TaskSetManager( } } maybeFinishTaskSet() + +// kill running task if stage failed +if(reason.isInstanceOf[FetchFailed]) { + killTasks(runningTasksSet, taskInfos) +} + } + + def killTasks(tasks: HashSet[Long], taskInfo: HashMap[Long, TaskInfo]): Boolean = { +tasks.foreach { task => + val executorId = taskInfo(task).executorId + sched.sc.schedulerBackend.killTask(task, executorId, true) +} +true --- End diff -- Why are you returning `true` if you don't use it?
[GitHub] spark pull request #14557: [SPARK-16709][CORE] Kill the running task if stag...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/14557#discussion_r74688550 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -798,6 +798,19 @@ private[spark] class TaskSetManager( } } maybeFinishTaskSet() + +// kill running task if stage failed +if(reason.isInstanceOf[FetchFailed]) { --- End diff -- Missing space between `if` and `(`.
[GitHub] spark pull request #14524: [SPARK-16832] [ML] [WIP] CrossValidator and Train...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/14524#discussion_r74688310 --- Diff: mllib/src/main/scala/org/apache/spark/ml/param/shared/SharedParamsCodeGen.scala --- @@ -67,7 +67,7 @@ private[shared] object SharedParamsCodeGen { isValid = "ParamValidators.inArray(Array(\"skip\", \"error\"))"), ParamDesc[Boolean]("standardization", "whether to standardize the training features" + " before fitting the model", Some("true")), - ParamDesc[Long]("seed", "random seed", Some("this.getClass.getName.hashCode.toLong")), + ParamDesc[Long]("seed", "random seed", Some("new java.util.Random().nextLong()")), --- End diff -- Ah, learnt this today. Thanks!
[GitHub] spark pull request #14524: [SPARK-16832] [ML] [WIP] CrossValidator and Train...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/14524#discussion_r74680486 --- Diff: mllib/src/main/scala/org/apache/spark/ml/param/shared/SharedParamsCodeGen.scala --- @@ -67,7 +67,7 @@ private[shared] object SharedParamsCodeGen { isValid = "ParamValidators.inArray(Array(\"skip\", \"error\"))"), ParamDesc[Boolean]("standardization", "whether to standardize the training features" + " before fitting the model", Some("true")), - ParamDesc[Long]("seed", "random seed", Some("this.getClass.getName.hashCode.toLong")), + ParamDesc[Long]("seed", "random seed", Some("new java.util.Random().nextLong()")), --- End diff -- Is this a string on purpose?
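Context for the answer above: `SharedParamsCodeGen` is a source-code generator, so the default is a string of Scala code that gets spliced verbatim into the generated file, not a value evaluated at generation time. A toy sketch of that splicing (the generator below is hypothetical, not Spark's actual template):

```scala
// Hypothetical generator: the default expression is code text, spliced
// into generated source; it is only evaluated when that source is
// later compiled and run.
def genSeedParam(defaultExpr: String): String =
  s"""final val seed: LongParam = new LongParam(this, "seed", "random seed")
     |setDefault(seed, $defaultExpr)""".stripMargin

val generated = genSeedParam("new java.util.Random().nextLong()")
// `generated` contains the expression itself, unevaluated.
println(generated)
```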
[GitHub] spark issue #14533: [SPARK-16606] [CORE] Misleading warning for SparkContext...
Github user jaceklaskowski commented on the issue: https://github.com/apache/spark/pull/14533 LGTM
[GitHub] spark pull request #14450: [SPARK-16847][SQL] Prevent to potentially read co...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/14450#discussion_r73511032 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/SpecificParquetRecordReaderBase.java --- @@ -204,7 +205,8 @@ protected void initialize(String path, List columns) throws IOException } } this.sparkSchema = new ParquetSchemaConverter(config).convert(requestedSchema); -this.reader = new ParquetFileReader(config, file, blocks, requestedSchema.getColumns()); +this.reader = new ParquetFileReader( +config, footer.getFileMetaData(), file, blocks, requestedSchema.getColumns()); for (BlockMetaData block : blocks) { --- End diff -- Same here. BTW, code duplication?
[GitHub] spark pull request #14450: [SPARK-16847][SQL] Prevent to potentially read co...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/14450#discussion_r73510962 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/SpecificParquetRecordReaderBase.java --- @@ -140,7 +140,8 @@ public void initialize(InputSplit inputSplit, TaskAttemptContext taskAttemptCont String sparkRequestedSchemaString = configuration.get(ParquetReadSupport$.MODULE$.SPARK_ROW_REQUESTED_SCHEMA()); this.sparkSchema = StructType$.MODULE$.fromString(sparkRequestedSchemaString); -this.reader = new ParquetFileReader(configuration, file, blocks, requestedSchema.getColumns()); +this.reader = new ParquetFileReader( +configuration, footer.getFileMetaData(), file, blocks, requestedSchema.getColumns()); for (BlockMetaData block : blocks) { --- End diff -- While we're at it, are the spaces around `:` necessary? (it's been a while since I developed in Java so I might be wrong).
[GitHub] spark pull request #14451: [SPARK-16848][SQL] Make jdbc() and read.format("j...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/14451#discussion_r73339896 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala --- @@ -228,6 +228,7 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging { table: String, parts: Array[Partition], connectionProperties: Properties): DataFrame = { +require(userSpecifiedSchema.isEmpty, "jdbc does not allow user-specified schemas.") --- End diff -- Just `schema` (singular not plural)
[GitHub] spark pull request #14444: [SPARK-16839] [SQL] redundant aliases after clean...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/1#discussion_r73337483 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisSuite.scala --- @@ -218,9 +221,44 @@ class AnalysisSuite extends AnalysisTest { // CreateStruct is a special case that we should not trim Alias for it. plan = testRelation.select(CreateStruct(Seq(a, (a + 1).as("a+1"))).as("col")) -checkAnalysis(plan, plan) +expected = testRelation.select(CreateNamedStruct(Seq( + Literal(a.name), a, + Literal("a+1"),(a + 1))).as("col")) +checkAnalysis(plan, expected) plan = testRelation.select(CreateStructUnsafe(Seq(a, (a + 1).as("a+1"))).as("col")) -checkAnalysis(plan, plan) +expected = testRelation.select(CreateNamedStructUnsafe(Seq( + Literal(a.name), a, + Literal("a+1"),(a + 1))).as("col")) +checkAnalysis(plan, expected) + } + + test("Analysis may leave unnecassary aliases") { +val att1 = testRelation.output.head +var plan = testRelation.select( + CreateStructUnsafe(Seq(att1, ((att1 as "aa") + 1).as("a_plus_1"))).as("col"), + att1 +) +val prevPlan = getAnalyzer(true).execute(plan) +plan = prevPlan.select(CreateArray(Seq( + CreateStructUnsafe(Seq(att1, (att1 + 1).as("a_plus_1"))), + /** alias should be eliminated by [[CleanupAliases]] */ + "col".attr as "col2" +)) as "arr") +plan = getAnalyzer(true).execute(plan) + +plan should be (a[Project]) +val Project( projectExpressions, _) = plan +projectExpressions should have (size(1)) +val Seq(expr1) = projectExpressions +expr1 should be (a[Alias]) +val Alias(expr2, arrAlias) = expr1 +arrAlias shouldBe "arr" +expr2 should be (a[CreateArray]) +val CreateArray(arrElements) = expr2 +arrElements should have (size(2)) +val Seq(el1, el2) = arrElements +el1 should not be a[Alias] --- End diff -- Why no `(` since you've been using them in `should be`'s earlier? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. 
[GitHub] spark pull request #14444: [SPARK-16839] [SQL] redundant aliases after clean...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/1#discussion_r73337339 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisSuite.scala --- @@ -218,9 +221,44 @@ class AnalysisSuite extends AnalysisTest { // CreateStruct is a special case that we should not trim Alias for it. plan = testRelation.select(CreateStruct(Seq(a, (a + 1).as("a+1"))).as("col")) -checkAnalysis(plan, plan) +expected = testRelation.select(CreateNamedStruct(Seq( + Literal(a.name), a, + Literal("a+1"),(a + 1))).as("col")) +checkAnalysis(plan, expected) plan = testRelation.select(CreateStructUnsafe(Seq(a, (a + 1).as("a+1"))).as("col")) -checkAnalysis(plan, plan) +expected = testRelation.select(CreateNamedStructUnsafe(Seq( + Literal(a.name), a, + Literal("a+1"),(a + 1))).as("col")) +checkAnalysis(plan, expected) + } + + test("Analysis may leave unnecassary aliases") { +val att1 = testRelation.output.head +var plan = testRelation.select( + CreateStructUnsafe(Seq(att1, ((att1 as "aa") + 1).as("a_plus_1"))).as("col"), + att1 +) +val prevPlan = getAnalyzer(true).execute(plan) +plan = prevPlan.select(CreateArray(Seq( + CreateStructUnsafe(Seq(att1, (att1 + 1).as("a_plus_1"))), + /** alias should be eliminated by [[CleanupAliases]] */ + "col".attr as "col2" +)) as "arr") +plan = getAnalyzer(true).execute(plan) + +plan should be (a[Project]) +val Project( projectExpressions, _) = plan +projectExpressions should have (size(1)) +val Seq(expr1) = projectExpressions +expr1 should be (a[Alias]) --- End diff -- Don't test it here since it would break in the line below. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. 
[GitHub] spark pull request #14444: [SPARK-16839] [SQL] redundant aliases after clean...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/1#discussion_r73337237 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisSuite.scala --- @@ -218,9 +221,44 @@ class AnalysisSuite extends AnalysisTest { // CreateStruct is a special case that we should not trim Alias for it. plan = testRelation.select(CreateStruct(Seq(a, (a + 1).as("a+1"))).as("col")) -checkAnalysis(plan, plan) +expected = testRelation.select(CreateNamedStruct(Seq( + Literal(a.name), a, + Literal("a+1"),(a + 1))).as("col")) +checkAnalysis(plan, expected) plan = testRelation.select(CreateStructUnsafe(Seq(a, (a + 1).as("a+1"))).as("col")) -checkAnalysis(plan, plan) +expected = testRelation.select(CreateNamedStructUnsafe(Seq( + Literal(a.name), a, + Literal("a+1"),(a + 1))).as("col")) +checkAnalysis(plan, expected) + } + + test("Analysis may leave unnecassary aliases") { +val att1 = testRelation.output.head +var plan = testRelation.select( + CreateStructUnsafe(Seq(att1, ((att1 as "aa") + 1).as("a_plus_1"))).as("col"), + att1 +) +val prevPlan = getAnalyzer(true).execute(plan) +plan = prevPlan.select(CreateArray(Seq( + CreateStructUnsafe(Seq(att1, (att1 + 1).as("a_plus_1"))), + /** alias should be eliminated by [[CleanupAliases]] */ + "col".attr as "col2" +)) as "arr") +plan = getAnalyzer(true).execute(plan) + +plan should be (a[Project]) +val Project( projectExpressions, _) = plan --- End diff -- Space? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14444: [SPARK-16839] [SQL] redundant aliases after clean...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/1#discussion_r73337186 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisSuite.scala --- @@ -218,9 +221,44 @@ class AnalysisSuite extends AnalysisTest { // CreateStruct is a special case that we should not trim Alias for it. plan = testRelation.select(CreateStruct(Seq(a, (a + 1).as("a+1"))).as("col")) -checkAnalysis(plan, plan) +expected = testRelation.select(CreateNamedStruct(Seq( + Literal(a.name), a, + Literal("a+1"),(a + 1))).as("col")) --- End diff -- I think, in this context, ` as "col"` would be acceptable. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14444: [SPARK-16839] [SQL] redundant aliases after clean...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/1#discussion_r73337136 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisSuite.scala --- @@ -218,9 +221,44 @@ class AnalysisSuite extends AnalysisTest { // CreateStruct is a special case that we should not trim Alias for it. plan = testRelation.select(CreateStruct(Seq(a, (a + 1).as("a+1"))).as("col")) -checkAnalysis(plan, plan) +expected = testRelation.select(CreateNamedStruct(Seq( + Literal(a.name), a, + Literal("a+1"),(a + 1))).as("col")) +checkAnalysis(plan, expected) plan = testRelation.select(CreateStructUnsafe(Seq(a, (a + 1).as("a+1"))).as("col")) -checkAnalysis(plan, plan) +expected = testRelation.select(CreateNamedStructUnsafe(Seq( + Literal(a.name), a, + Literal("a+1"),(a + 1))).as("col")) --- End diff -- Missing space. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14444: [SPARK-16839] [SQL] redundant aliases after clean...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/14444#discussion_r73337027 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -2109,15 +2119,12 @@ object CleanupAliases extends Rule[LogicalPlan] { case a: AppendColumns => a case other => - var stop = false other transformExpressionsDown { -case c: CreateStruct if !stop => - stop = true - c.copy(children = c.children.map(trimNonTopLevelAliases)) -case c: CreateStructUnsafe if !stop => - stop = true - c.copy(children = c.children.map(trimNonTopLevelAliases)) -case Alias(child, _) if !stop => child +case c @ CreateStruct(children) => + CreateNamedStruct( mkNamedStructArgs(c.dataType, children)) +case c @ CreateStructUnsafe(children) => + CreateNamedStructUnsafe( mkNamedStructArgs(c.dataType, children)) --- End diff -- And here, too
[GitHub] spark pull request #14444: [SPARK-16839] [SQL] redundant aliases after clean...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/14444#discussion_r73337012 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -2109,15 +2119,12 @@ object CleanupAliases extends Rule[LogicalPlan] { case a: AppendColumns => a case other => - var stop = false other transformExpressionsDown { -case c: CreateStruct if !stop => - stop = true - c.copy(children = c.children.map(trimNonTopLevelAliases)) -case c: CreateStructUnsafe if !stop => - stop = true - c.copy(children = c.children.map(trimNonTopLevelAliases)) -case Alias(child, _) if !stop => child +case c @ CreateStruct(children) => + CreateNamedStruct( mkNamedStructArgs(c.dataType, children)) --- End diff -- Mind the space after `(`.
[GitHub] spark pull request #14444: [SPARK-16839] [SQL] redundant aliases after clean...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/14444#discussion_r73337074 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisSuite.scala --- @@ -25,7 +27,8 @@ import org.apache.spark.sql.catalyst.plans.Inner import org.apache.spark.sql.catalyst.plans.logical._ import org.apache.spark.sql.types._ -class AnalysisSuite extends AnalysisTest { + +class AnalysisSuite extends AnalysisTest with ShouldMatchers{ --- End diff -- Missing space before `{`.
[GitHub] spark pull request #14444: [SPARK-16839] [SQL] redundant aliases after clean...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/1#discussion_r73336967

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---

```
@@ -2065,18 +2065,28 @@ object EliminateUnions extends Rule[LogicalPlan] {
  */
 object CleanupAliases extends Rule[LogicalPlan] {
   private def trimAliases(e: Expression): Expression = {
-    var stop = false
     e.transformDown {
-      // CreateStruct is a special case, we need to retain its top level Aliases as they decide the
-      // name of StructField. We also need to stop transform down this expression, or the Aliases
-      // under CreateStruct will be mistakenly trimmed.
-      case c: CreateStruct if !stop =>
-        stop = true
-        c.copy(children = c.children.map(trimNonTopLevelAliases))
-      case c: CreateStructUnsafe if !stop =>
-        stop = true
-        c.copy(children = c.children.map(trimNonTopLevelAliases))
-      case Alias(child, _) if !stop => child
+      /**
+       * [[CreateStruct]] is a special case as it uses its top level Aliases to decide the
+       * name of StructField.
+       * we tackle this by replacing [[CreateStruct]] with [[CreateNamedStruct]]
+       * which encodes the field names as child literals.
+       */
+      case c @ CreateStruct(children) =>
+        CreateNamedStruct( mkNamedStructArgs(c.dataType, children))
+      case c @ CreateStructUnsafe(children) =>
+        CreateNamedStructUnsafe( mkNamedStructArgs(c.dataType, children))
+      case Alias(child, _) => child
     }
   }
+
+  private def mkNamedStructArgs( structType : StructType, attributeExpressions: Seq[Expression]) = {
+    for {
+      (name, expression) <- structType.fieldNames.zip(attributeExpressions)
+      nameLiteral = Literal(name)
+      newChild <- Seq(nameLiteral, expression)
+    } yield{
+      newChild
```

--- End diff --

Do we really need `{` around `newChild`?
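The braces question above can be seen in a runnable approximation. This is a hypothetical, simplified stand-in for `mkNamedStructArgs` (plain strings replace Catalyst `Expression`s and `Literal`s, so none of Spark's types are needed); the point is that `yield` needs no braces when its body is a single expression:

```scala
// Simplified sketch of mkNamedStructArgs: each (fieldName, expression) pair
// expands to two children: a name "literal" followed by the expression itself.
def mkNamedStructArgs(fieldNames: Seq[String], exprs: Seq[String]): Seq[String] =
  for {
    (name, expression) <- fieldNames.zip(exprs)
    nameLiteral = s"Literal($name)"
    newChild <- Seq(nameLiteral, expression)
  } yield newChild // no braces needed around a single-expression body

println(mkNamedStructArgs(Seq("a", "b"), Seq("e1", "e2")))
// List(Literal(a), e1, Literal(b), e2)
```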
[GitHub] spark pull request #14435: [SPARK-16756][SQL][WIP] Add `sql` function to Log...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/14435#discussion_r73334242

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala ---

```
@@ -739,6 +931,15 @@ case class Sample(
 case class Distinct(child: LogicalPlan) extends UnaryNode {
   override def maxRows: Option[Long] = child.maxRows
   override def output: Seq[Attribute] = child.output
+
+  override def sql: String = child match {
+    case Union(children) =>
+      val childrenSql = children.map(c => s"(${c.sql})")
+      childrenSql.mkString(" UNION DISTINCT ")
+
+    case _: Project =>
```

--- End diff --

Replace `_` with `p` and use it instead.
[GitHub] spark pull request #14435: [SPARK-16756][SQL][WIP] Add `sql` function to Log...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/14435#discussion_r73334167

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala ---

```
@@ -731,6 +909,20 @@ case class Sample(
   }

   override protected def otherCopyArgs: Seq[AnyRef] = isTableSample :: Nil
+
+  override def sql: String = child match {
+    case SubqueryAlias(alias, _: NonSQLPlan) =>
+      val repeatable = if (withReplacement) s" REPEATABLE ($seed)" else ""
+      s"$alias TABLESAMPLE(${upperBound * 100} PERCENT)$repeatable"
+
+    case SubqueryAlias(alias, grandChild) =>
+      val repeatable = if (withReplacement) s" REPEATABLE ($seed)" else ""
+      s"${grandChild.sql} TABLESAMPLE(${upperBound * 100} PERCENT)$repeatable $alias"
+
+    case _ =>
+      val repeatable = if (withReplacement) s" REPEATABLE ($seed)" else ""
```

--- End diff --

`repeatable` is repeated three times (and I don't think it's on purpose despite the name :))
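The deduplication being suggested can be sketched in plain Scala (strings stand in for `LogicalPlan`; the parameter names mirror `Sample`'s but the function itself is hypothetical). The repeated `repeatable` binding is computed once, above the match:

```scala
// Hoisting the shared bindings out of the match removes the triplication.
def sampleSql(childSql: String, withReplacement: Boolean, seed: Long,
              upperBound: Double, alias: Option[String]): String = {
  val repeatable = if (withReplacement) s" REPEATABLE ($seed)" else ""
  val tablesample = s"TABLESAMPLE(${upperBound * 100} PERCENT)$repeatable"
  alias match {
    case Some(a) => s"$childSql $tablesample $a"
    case None    => s"$childSql $tablesample"
  }
}

println(sampleSql("t", withReplacement = true, seed = 42L, upperBound = 0.5, alias = None))
// t TABLESAMPLE(50.0 PERCENT) REPEATABLE (42)
```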
[GitHub] spark pull request #14435: [SPARK-16756][SQL][WIP] Add `sql` function to Log...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/14435#discussion_r73334011

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala ---

```
@@ -692,11 +864,17 @@ case class LocalLimit(limitExpr: Expression, child: LogicalPlan) extends UnaryNo
     }
     child.statistics.copy(sizeInBytes = sizeInBytes)
   }
+
+  override def sql: String = child.sql
 }

 case class SubqueryAlias(alias: String, child: LogicalPlan) extends UnaryNode {
   override def output: Seq[Attribute] = child.output.map(_.withQualifier(Some(alias)))
+
+  override def sql: String = child match {
+    case _ => if (child.sql.equals(alias)) child.sql else s"(${child.sql}) AS $alias"
```

--- End diff --

Why do you pattern match here?
[GitHub] spark pull request #14435: [SPARK-16756][SQL][WIP] Add `sql` function to Log...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/14435#discussion_r7845

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala ---

```
@@ -495,6 +573,92 @@ case class Aggregate(
       super.statistics
     }
   }
+
+  private def sameOutput(output1: Seq[Attribute], output2: Seq[Attribute]): Boolean =
+    output1.size == output2.size &&
+      output1.zip(output2).forall(pair => pair._1.semanticEquals(pair._2))
+
+  private def isGroupingSet(e: Expand, p: Project) = {
```

--- End diff --

Ah, so you are using `{` to wrap boolean one-liners :)
[GitHub] spark pull request #14435: [SPARK-16756][SQL][WIP] Add `sql` function to Log...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/14435#discussion_r7741

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala ---

```
@@ -495,6 +573,92 @@ case class Aggregate(
       super.statistics
     }
   }
+
+  private def sameOutput(output1: Seq[Attribute], output2: Seq[Attribute]): Boolean =
+    output1.size == output2.size &&
+      output1.zip(output2).forall(pair => pair._1.semanticEquals(pair._2))
```

--- End diff --

Think `forall { case (left, right) => left semanticEquals right }` (or with dots) could be more readable. WDYT?
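The suggested spelling compiles and behaves the same; here is a runnable approximation where a tiny `Attr` case class and plain `==` stand in for Catalyst's `Attribute` and `semanticEquals`:

```scala
// The tuple accessors pair._1 / pair._2 are replaced by a case pattern,
// which names both sides of each zipped pair.
case class Attr(name: String)

def sameOutput(output1: Seq[Attr], output2: Seq[Attr]): Boolean =
  output1.size == output2.size &&
    output1.zip(output2).forall { case (left, right) => left == right }

println(sameOutput(Seq(Attr("a"), Attr("b")), Seq(Attr("a"), Attr("b")))) // true
```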
[GitHub] spark pull request #14435: [SPARK-16756][SQL][WIP] Add `sql` function to Log...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/14435#discussion_r7585

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala ---

```
@@ -495,6 +573,92 @@ case class Aggregate(
       super.statistics
     }
   }
+
+  private def sameOutput(output1: Seq[Attribute], output2: Seq[Attribute]): Boolean =
```

--- End diff --

Wrap it in `{` and `}`.
[GitHub] spark pull request #14435: [SPARK-16756][SQL][WIP] Add `sql` function to Log...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/14435#discussion_r7461

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala ---

```
@@ -167,6 +212,8 @@ case class Intersect(left: LogicalPlan, right: LogicalPlan) extends SetOperation
     Statistics(sizeInBytes = sizeInBytes, isBroadcastable = isBroadcastable)
   }
+
+  override def sql: String = s"(${left.sql}) INTERSECT (${right.sql})"
```

--- End diff --

I think `INTERSECT` et al should be a value to later limit typos (I think there are tests that test that the output contains `INTERSET`, right?)
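The suggestion above can be sketched as follows (the `SetOpKeywords` object and its members are illustrative names, not anything in Spark). With the keyword in a constant, a typo like `INTERSET` fails to compile instead of leaking into generated SQL and the tests that check it:

```scala
// Hoisting SQL keywords into constants so every call site shares one spelling.
object SetOpKeywords {
  val Intersect = "INTERSECT"
  val Except = "EXCEPT"
}

def intersectSql(leftSql: String, rightSql: String): String =
  s"($leftSql) ${SetOpKeywords.Intersect} ($rightSql)"

println(intersectSql("SELECT 1", "SELECT 2"))
// (SELECT 1) INTERSECT (SELECT 2)
```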
[GitHub] spark pull request #14435: [SPARK-16756][SQL][WIP] Add `sql` function to Log...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/14435#discussion_r7286

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala ---

```
@@ -112,6 +152,11 @@ case class Filter(condition: Expression, child: LogicalPlan)
       .filterNot(SubqueryExpression.hasCorrelatedSubquery)
     child.constraints.union(predicates.toSet)
   }
+
+  override def sql: String = child match {
+    case _: Aggregate => s"${child.sql} HAVING ${condition.sql}"
+    case _ => s"${child.sql} WHERE ${condition.sql}"
```

--- End diff --

The only difference is `HAVING` vs `WHERE`, right? Mind extracting the small difference out and doing the following instead:

```
val havingOrWhere = ???
s"${child.sql} $havingOrWhere ${condition.sql}"
```
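A runnable version of that extraction, with a boolean flag standing in for the `_: Aggregate` pattern match (the function and parameter names are illustrative, not Spark's):

```scala
// Only the keyword differs between the two branches, so compute it once.
def filterSql(childIsAggregate: Boolean, childSql: String, conditionSql: String): String = {
  val havingOrWhere = if (childIsAggregate) "HAVING" else "WHERE"
  s"$childSql $havingOrWhere $conditionSql"
}

println(filterSql(childIsAggregate = true, "SELECT a FROM t GROUP BY a", "count(1) > 1"))
// SELECT a FROM t GROUP BY a HAVING count(1) > 1
```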
[GitHub] spark pull request #14435: [SPARK-16756][SQL][WIP] Add `sql` function to Log...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/14435#discussion_r73332988

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala ---

```
@@ -99,6 +133,12 @@ case class Generate(
     if (join) child.output ++ qualified else qualified
   }
+
+  override def sql: String = {
+    val columnAliases = generatorOutput.map(_.sql).mkString(", ")
+    s"${child.sql} LATERAL VIEW ${if (outer) "OUTER" else ""} " +
```

--- End diff --

Could you move `${if...` outside the interpolated string?
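A sketch of the move (strings stand in for the plan and generator; the names are hypothetical). Binding the conditional to a `val` keeps the interpolated string flat:

```scala
// The if-expression gets a name instead of living inside the interpolation.
def generateSql(childSql: String, outer: Boolean,
                generatorSql: String, columnAliases: String): String = {
  val outerClause = if (outer) "OUTER " else ""
  s"$childSql LATERAL VIEW $outerClause$generatorSql AS $columnAliases"
}

println(generateSql("SELECT * FROM t", outer = true, "explode(xs) tbl", "c1"))
// SELECT * FROM t LATERAL VIEW OUTER explode(xs) tbl AS c1
```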
[GitHub] spark pull request #14435: [SPARK-16756][SQL][WIP] Add `sql` function to Log...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/14435#discussion_r73332573

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala ---

```
@@ -53,6 +53,40 @@ case class Project(projectList: Seq[NamedExpression], child: LogicalPlan) extend
   override def validConstraints: Set[Expression] =
     child.constraints.union(getAliasedConstraints(projectList))
+
+  override def sql: String = {
+    if (projectList.exists(expr => expr.find(e => e.isInstanceOf[NonSQLExpression]).isDefined)) {
+      throw new UnsupportedOperationException("NonSQLExpression")
```

--- End diff --

Would `assert` be applicable here?
[GitHub] spark pull request #14435: [SPARK-16756][SQL][WIP] Add `sql` function to Log...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/14435#discussion_r73332522

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala ---

```
@@ -53,6 +53,40 @@ case class Project(projectList: Seq[NamedExpression], child: LogicalPlan) extend
   override def validConstraints: Set[Expression] =
     child.constraints.union(getAliasedConstraints(projectList))
+
+  override def sql: String = {
+    if (projectList.exists(expr => expr.find(e => e.isInstanceOf[NonSQLExpression]).isDefined)) {
```

--- End diff --

This `if` is overly complex and forces a reader to read the expression inside-out (not left to right). I can't propose anything better than to beg for a boolean val with the entire condition on a separate line. WDYT?
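The proposed shape can be sketched in plain Scala (a marker trait stands in for `NonSQLExpression`, `Any` for `Expression`, and the nested `find` collapses to a flat `exists`; all names here are illustrative):

```scala
// Naming the condition lets the if read left to right.
trait NonSQL // stand-in marker for NonSQLExpression

def projectSql(projectList: Seq[Any]): String = {
  val hasNonSqlExpression = projectList.exists(_.isInstanceOf[NonSQL])
  if (hasNonSqlExpression) {
    throw new UnsupportedOperationException("NonSQLExpression")
  }
  "SELECT ..." // placeholder for the real SQL generation
}

println(projectSql(Seq("a", "b")))
// SELECT ...
```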
[GitHub] spark pull request #14410: [SPARK-16803] [SQL] SaveAsTable does not work whe...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/14410#discussion_r73051607

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala ---

```
@@ -1457,6 +1457,59 @@ class SQLQuerySuite extends QueryTest with SQLTestUtils with TestHiveSingleton {
     }
   }

+  test("saveAsTable(CTAS) when the source DataFrame is built on a Hive table") {
+    val tableName = "tab1"
+    withTable(tableName) {
+      sql(s"CREATE TABLE $tableName stored as SEQUENCEFILE as select 1 as key, 'abc' as value")
+      val df = sql(s"select key, value as value from $tableName")
+
+      val message = intercept[AnalysisException] {
+        df.write.mode("append").saveAsTable(tableName)
+      }.getMessage
+      assert(message.contains(
+        "saveAsTable() for Hive tables is not supported yet. " +
+          "Please use insertInto() as an alternative"))
+    }
+  }
+
+  test("saveAsTable(CTAS) and insertInto when the source DataFrame is built on Data Source") {
+    val tableName = "tab1"
+    withTable(tableName) {
+      val schema = StructType(
+        StructField("key", IntegerType, nullable = false) ::
+          StructField("value", IntegerType, nullable = true) :: Nil)
+      val row = Row(3, 4)
+      val df = spark.createDataFrame(sparkContext.parallelize(row :: Nil), schema)
+
+      df.write.format("json").mode("overwrite").saveAsTable(tableName)
+      df.write.format("json").mode("append").saveAsTable(tableName)
+      checkAnswer(
+        sql(s"SELECT key, value FROM $tableName"),
+        Row(3, 4) :: Row(3, 4) :: Nil
+      )
+
+      (1 to 2).map { i => (i, i) }.toDF("key", "value").write.insertInto(tableName)
+      checkAnswer(
+        sql(s"SELECT key, value FROM $tableName"),
+        Row(1, 1) :: Row(2, 2) :: Row(3, 4) :: Row(3, 4) :: Nil
+      )
+    }
+  }
+
+  test("insertInto when the source DataFrame is built on a Hive table") {
+    val tableName = "tab1"
+    withTable(tableName) {
+      sql(s"CREATE TABLE $tableName stored as SEQUENCEFILE as select 1 as key, 'abc' as value")
+      val df = sql(s"select key, value as value from $tableName")
```

--- End diff --

Is there a reason why you're using the lowercase variant of the query in `sql` while `CREATE TABLE` above is uppercase? It seems inconsistent.
[GitHub] spark pull request #14410: [SPARK-16803] [SQL] SaveAsTable does not work whe...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/14410#discussion_r73035297

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala ---

```
@@ -1457,6 +1457,59 @@ class SQLQuerySuite extends QueryTest with SQLTestUtils with TestHiveSingleton {
     }
   }

+  test("saveAsTable(CTAS) when the source DataFrame is built on a Hive table") {
+    val tableName = "tab1"
+    withTable(tableName) {
+      sql(s"CREATE TABLE $tableName stored as SEQUENCEFILE as select 1 as key, 'abc' as value")
+      val df = sql(s"select key, value as value from $tableName")
+
+      val message = intercept[AnalysisException] {
+        df.write.mode("append").saveAsTable(tableName)
```

--- End diff --

and here too?
[GitHub] spark pull request #14410: [SPARK-16803] [SQL] SaveAsTable does not work whe...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/14410#discussion_r73035235

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/test/DataFrameReaderWriterSuite.scala ---

```
@@ -449,6 +449,22 @@ class DataFrameReaderWriterSuite extends QueryTest with SharedSQLContext with Be
     }
   }

+  test("saveAsTable(CTAS) when the source DataFrame is built on a SimpleCatalog table") {
+    import testImplicits._
+
+    val tableName = "tab1"
+    withTable(tableName) {
+      sql(s"CREATE TABLE $tableName(key int, value int) USING parquet")
+      (1 to 2).map { i => (i, i) }.toDF("key", "value").write.insertInto(tableName)
+      val df = sql(s"select key, value from $tableName")
+      df.write.mode("append").saveAsTable(tableName)
```

--- End diff --

What do you think about using `SaveMode.Append` instead?
[GitHub] spark pull request #14410: [SPARK-16803] [SQL] SaveAsTable does not work whe...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/14410#discussion_r73034863

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/createDataSourceTables.scala ---

```
@@ -236,6 +236,11 @@ case class CreateDataSourceTableAsSelectCommand(
       existingSchema = Some(l.schema)
     case s: SimpleCatalogRelation if DDLUtils.isDatasourceTable(s.metadata) =>
       existingSchema = Some(DDLUtils.getSchemaFromTableProperties(s.metadata))
+    // When the source table is a Hive table, the `saveAsTable` API of DataFrameWriter
+    // does not work. Instead, use the `insertInto` API.
+    case o if o.getClass.getName == "org.apache.spark.sql.hive.MetastoreRelation" =>
+      throw new AnalysisException("saveAsTable() for Hive tables is not supported yet. " +
```

--- End diff --

Remove "yet" or point to a JIRA issue where it's going to be introduced.
[GitHub] spark pull request #14410: [SPARK-16803] [SQL] SaveAsTable does not work whe...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/14410#discussion_r73034796

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/createDataSourceTables.scala ---

```
@@ -236,6 +236,11 @@ case class CreateDataSourceTableAsSelectCommand(
       existingSchema = Some(l.schema)
     case s: SimpleCatalogRelation if DDLUtils.isDatasourceTable(s.metadata) =>
       existingSchema = Some(DDLUtils.getSchemaFromTableProperties(s.metadata))
+    // When the source table is a Hive table, the `saveAsTable` API of DataFrameWriter
+    // does not work. Instead, use the `insertInto` API.
+    case o if o.getClass.getName == "org.apache.spark.sql.hive.MetastoreRelation" =>
```

--- End diff --

`case o: MetastoreRelation` doesn't work?
[GitHub] spark pull request #14414: [SPARK-16809] enable history server links in disp...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/14414#discussion_r73033311

--- Diff: docs/running-on-mesos.md ---

```
@@ -468,6 +468,17 @@ See the [configuration page](configuration.html) for information on Spark config
     If unset it will point to Spark's internal web UI.
   </td>
 </tr>
+<tr>
+  <td><code>spark.mesos.dispatcher.historyServer.url</code></td>
+  <td><code>(none)</code></td>
+  <td>
+    Set the URL of the <a href="http://spark.apache.org/docs/latest/monitoring.html#viewing-after-the-fact">history
+    server>. The dispatcher will then link each driver to its entry
```

--- End diff --

Looks like the ending tag is mistyped.
[GitHub] spark pull request #14414: [SPARK-16809] enable history server links in disp...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/14414#discussion_r73033008

--- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosCoarseGrainedSchedulerBackend.scala ---

```
@@ -152,8 +152,13 @@ private[spark] class MesosCoarseGrainedSchedulerBackend(
       sc.sparkUser,
       sc.appName,
       sc.conf,
-      sc.conf.getOption("spark.mesos.driver.webui.url").orElse(sc.ui.map(_.appUIAddress))
+      sc.conf.getOption("spark.mesos.driver.webui.url").orElse(sc.ui.map(_.appUIAddress)),
+      Option.empty,
+      Option.empty,
+      sc.conf.getOption("spark.mesos.driver.frameworkId")
```

--- End diff --

Mind introducing a constant for `spark.mesos.driver.frameworkId`?
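A minimal sketch of what such a constant could look like (the `MesosConfigKeys` object and a plain `Map` standing in for `SparkConf` are both illustrative, not Spark's actual API):

```scala
// One shared constant per key: call sites can't drift apart or typo the string.
object MesosConfigKeys {
  val DriverFrameworkId = "spark.mesos.driver.frameworkId"
  val DriverWebuiUrl = "spark.mesos.driver.webui.url"
}

// Stand-in for sc.conf.getOption(...)
def getOption(conf: Map[String, String], key: String): Option[String] = conf.get(key)

val conf = Map(MesosConfigKeys.DriverFrameworkId -> "fw-1")
println(getOption(conf, MesosConfigKeys.DriverFrameworkId))
// Some(fw-1)
```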
[GitHub] spark pull request #14414: [SPARK-16809] enable history server links in disp...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/14414#discussion_r73032952

--- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosCoarseGrainedSchedulerBackend.scala ---

```
@@ -152,8 +152,13 @@ private[spark] class MesosCoarseGrainedSchedulerBackend(
       sc.sparkUser,
       sc.appName,
       sc.conf,
-      sc.conf.getOption("spark.mesos.driver.webui.url").orElse(sc.ui.map(_.appUIAddress))
+      sc.conf.getOption("spark.mesos.driver.webui.url").orElse(sc.ui.map(_.appUIAddress)),
+      Option.empty,
```

--- End diff --

`None`?