[GitHub] spark pull request: Revert "[SPARK-10216][SQL] Avoid creating empt...

2016-05-18 Thread marmbrus
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/13181#issuecomment-220198280 hmmm, this might be failing tests? @HyukjinKwon can you investigate if it fails again? --- If your project is set up for it, you can reply to this email and have

[GitHub] spark pull request: Revert "[SPARK-10216][SQL] Avoid creating empt...

2016-05-18 Thread marmbrus
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/13181#issuecomment-220195260 test this please

[GitHub] spark pull request: [SPARK-10216][SQL] Avoid creating empty files ...

2016-05-18 Thread marmbrus
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/12855#issuecomment-220193521 No worries!

[GitHub] spark pull request: [SPARK-14463][SQL] Document the semantics for ...

2016-05-18 Thread marmbrus
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/13184#issuecomment-220192761 LGTM

[GitHub] spark pull request: [SPARK-10216][SQL] Avoid creating empty files ...

2016-05-18 Thread marmbrus
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/12855#issuecomment-220192667 Sure, I thought you could reopen PRs you created, but if not feel free to create a new one and link.

[GitHub] spark pull request: [SPARK-14463][SQL] Document the semantics for ...

2016-05-18 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/13184#discussion_r63801142 --- Diff: R/pkg/R/SQLContext.R --- @@ -298,6 +298,8 @@ parquetFile <- function(sqlContext, ...) { #' Create a SparkDataFrame from a text f

[GitHub] spark pull request: [SPARK-15323][SPARK-14463][SQL] Fix reading of...

2016-05-18 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/13104#discussion_r63798411 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala --- @@ -457,7 +457,8 @@ class DataFrameReader private[sql](sparkSession

[GitHub] spark pull request: Revert "[SPARK-10216][SQL] Avoid creating empt...

2016-05-18 Thread marmbrus
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/13181#issuecomment-220183153 test this please

[GitHub] spark pull request: Revert "[SPARK-10216][SQL] Avoid creating empt...

2016-05-18 Thread marmbrus
GitHub user marmbrus opened a pull request: https://github.com/apache/spark/pull/13181 Revert "[SPARK-10216][SQL] Avoid creating empty files during overwrit… This reverts commit 8d05a7a from #8411, which seems to have caused regressions when working with empty DataFrames.

[GitHub] spark pull request: [SPARK-10216][SQL] Avoid creating empty files ...

2016-05-18 Thread marmbrus
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/12855#issuecomment-220177425 I'm going to revert this until we figure out the issues. @HyukjinKwon, can you reopen?

[GitHub] spark pull request: [SPARK-15112][SQL] Allows query plan schema an...

2016-05-18 Thread marmbrus
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/12952#issuecomment-220116608 @liancheng, thanks for the thorough explanation. I think I understand the problem now and I agree with the proposed constraints. A few questions: >

[GitHub] spark pull request: [SPARK-15375][SQL][Streaming] Add ConsoleSink ...

2016-05-18 Thread marmbrus
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/13162#issuecomment-219943826 I agree this would be useful for debugging. Can we plug it in as a `StreamSinkProvider` instead of hard coding an `if` into the `DataSource` resolver?

[GitHub] spark pull request: [SPARK-15323] Fix reading of partitioned forma...

2016-05-17 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/13104#discussion_r63584539 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/text/TextSuite.scala --- @@ -65,6 +65,14 @@ class TextSuite extends QueryTest

[GitHub] spark pull request: [SPARK-15323] Fix reading of partitioned forma...

2016-05-17 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/13104#discussion_r63584448 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/text/TextSuite.scala --- @@ -65,6 +65,14 @@ class TextSuite extends QueryTest

spark git commit: [SPARK-10216][SQL] Avoid creating empty files during overwriting with group by query

2016-05-17 Thread marmbrus
Repository: spark Updated Branches: refs/heads/branch-2.0 adc1c2685 -> af37bdd3a [SPARK-10216][SQL] Avoid creating empty files during overwriting with group by query ## What changes were proposed in this pull request? Currently, `INSERT INTO` with `GROUP BY` query tries to make at least 200

spark git commit: [SPARK-10216][SQL] Avoid creating empty files during overwriting with group by query

2016-05-17 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 20a89478e -> 8d05a7a98 [SPARK-10216][SQL] Avoid creating empty files during overwriting with group by query ## What changes were proposed in this pull request? Currently, `INSERT INTO` with `GROUP BY` query tries to make at least 200

[GitHub] spark pull request: [SPARK-10216][SQL] Avoid creating empty files ...

2016-05-17 Thread marmbrus
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/12855#issuecomment-219797464 Thanks, merging to master and 2.0.

[GitHub] spark pull request: [SPARK-6320][SQL] Move planLater method into G...

2016-05-17 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/13147#discussion_r63567938 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/QueryPlanner.scala --- @@ -27,6 +27,16 @@ import

[GitHub] spark pull request: [SPARK-6320][SQL] Move planLater method into G...

2016-05-17 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/13147#discussion_r63567867 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/ExperimentalMethods.scala --- @@ -42,7 +43,7 @@ class ExperimentalMethods private[sql

[GitHub] spark pull request: [SPARK-15351][SQL] RowEncoder should support a...

2016-05-16 Thread marmbrus
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/13138#issuecomment-219593532 LGTM

[GitHub] spark pull request: [SPARK-15351][SQL] RowEncoder should support a...

2016-05-16 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/13138#discussion_r63399810 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/GenericArrayData.scala --- @@ -37,6 +37,11 @@ class GenericArrayData(val array

[GitHub] spark pull request: [SPARK-15351][SQL] RowEncoder should support a...

2016-05-16 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/13138#discussion_r63399671 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/encoders/RowEncoderSuite.scala --- @@ -185,6 +185,20 @@ class RowEncoderSuite extends

[GitHub] spark pull request: [SPARK-14942][SQL][Streaming] Reduce delay bet...

2016-05-13 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/12725#discussion_r63246142 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamSuite.scala --- @@ -137,20 +137,88 @@ class StreamSuite extends StreamTest

[GitHub] spark pull request: Fix reading of partitioned format=text dataset...

2016-05-13 Thread marmbrus
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/13104#issuecomment-219146749 ok to test

[GitHub] spark pull request: [SPARK-15264][SPARK-15274][SQL] CSV Reader Err...

2016-05-11 Thread marmbrus
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/13041#issuecomment-218651518 That's okay, it's good to be cautious around release time. I just wanted to be clear why I thought merging in this case was justified :)

[GitHub] spark pull request: [SPARK-15264][SPARK-15274][SQL] CSV Reader Err...

2016-05-11 Thread marmbrus
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/13041#issuecomment-218642602 @HyukjinKwon Spark 2.0 has not been released yet and thus CSV has never been in a released version of Spark. Why would you want to break compatibility in Spark 2.0

[GitHub] spark pull request: [SPARK-15183][Streaming] Adding outputMode to ...

2016-05-10 Thread marmbrus
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/12958#issuecomment-218225663 Trigger clock is an internal detail for testing that should not be exposed to users. Also, this isn't really what the output mode is for. Try reading the design

[GitHub] spark pull request: [SPARK-15112][SQL] Allows query plan schema an...

2016-05-09 Thread marmbrus
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/12952#issuecomment-217942423 > Seems that Spark 2.0 implicitly assumes that Dataset.resolvedTEncoder.schema is consistent with Dataset.logicalPlan.schema Can you explain more? Which p

[GitHub] spark pull request: [SPARK-15109][SQL] Accept Dataset[_] in joins

2016-05-04 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/12886#discussion_r62078478 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala --- @@ -931,8 +931,8 @@ object functions { * @group normal_funcs

[GitHub] spark pull request: [SPARK-15103][SQL] Refactored FileCatalog clas...

2016-05-04 Thread marmbrus
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/12879#issuecomment-216935209 LGTM

[GitHub] spark pull request: [SPARK-15103][SQL] Refactored FileCatalog clas...

2016-05-03 Thread marmbrus
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/12879#issuecomment-216684080 /cc @liancheng @cloud-fan

[GitHub] spark pull request: [SPARK-15103][SQL] Refactored FileCatalog clas...

2016-05-03 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/12879#discussion_r61967643 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/fileSourceInterfaces.scala --- @@ -365,11 +390,78 @@ class HDFSFileCatalog

[GitHub] spark pull request: [SPARK-11962] WIP: Added `attempt` and `getOpt...

2016-05-03 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/12708#discussion_r61924540 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/Row.scala --- @@ -335,6 +358,19 @@ trait Row extends Serializable { def getAs[T](fieldName

spark git commit: [SPARK-15077][SQL] Use a fair lock to avoid thread starvation in StreamExecution

2016-05-02 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 0fd95be3c -> 4e3685ae5 [SPARK-15077][SQL] Use a fair lock to avoid thread starvation in StreamExecution ## What changes were proposed in this pull request? Right now `StreamExecution.awaitBatchLock` uses an unfair lock.

spark git commit: [SPARK-15077][SQL] Use a fair lock to avoid thread starvation in StreamExecution

2016-05-02 Thread marmbrus
Repository: spark Updated Branches: refs/heads/branch-2.0 733cbaa3c -> dcce0aaaf [SPARK-15077][SQL] Use a fair lock to avoid thread starvation in StreamExecution ## What changes were proposed in this pull request? Right now `StreamExecution.awaitBatchLock` uses an unfair lock.

[GitHub] spark pull request: [SPARK-15077][SQL]Use a fair lock to avoid thr...

2016-05-02 Thread marmbrus
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/12852#issuecomment-216411544 Thanks, merging to master and 2.0.

spark git commit: [SPARK-15062][SQL] fix list type infer serializer issue

2016-05-02 Thread marmbrus
Repository: spark Updated Branches: refs/heads/branch-2.0 86167968f -> 733cbaa3c [SPARK-15062][SQL] fix list type infer serializer issue ## What changes were proposed in this pull request? Make serializer correctly inferred if the input type is `List[_]`, since `List[_]` is type of

spark git commit: [SPARK-15062][SQL] fix list type infer serializer issue

2016-05-02 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 1c19c2769 -> 0fd95be3c [SPARK-15062][SQL] fix list type infer serializer issue ## What changes were proposed in this pull request? Make serializer correctly inferred if the input type is `List[_]`, since `List[_]` is type of `Seq[_]`,

[GitHub] spark pull request: [SPARK-15062] [SQL] fix list type infer serial...

2016-05-02 Thread marmbrus
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/12849#issuecomment-216410605 Thanks, merging to master and 2.0

spark git commit: [SPARK-14747][SQL] Add assertStreaming/assertNoneStreaming checks in DataFrameWriter

2016-05-02 Thread marmbrus
Repository: spark Updated Branches: refs/heads/branch-2.0 fbc73f731 -> 65b94f460 [SPARK-14747][SQL] Add assertStreaming/assertNoneStreaming checks in DataFrameWriter ## Problem If an end user happens to write code mixed with continuous-query-oriented methods and

[GitHub] spark pull request: [SPARK-14747][SQL] Add assertStreaming/assertN...

2016-05-02 Thread marmbrus
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/12521#issuecomment-216397719 Thanks, merging to master and 2.0

spark git commit: [SPARK-14747][SQL] Add assertStreaming/assertNoneStreaming checks in DataFrameWriter

2016-05-02 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master f362363d1 -> 35d9c8aa6 [SPARK-14747][SQL] Add assertStreaming/assertNoneStreaming checks in DataFrameWriter ## Problem If an end user happens to write code mixed with continuous-query-oriented methods and non-continuous-query-oriented

[GitHub] spark pull request: [SPARK-11962] WIP: Added `attempt` and `getOpt...

2016-05-02 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/12708#discussion_r61822276 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/Row.scala --- @@ -335,6 +358,19 @@ trait Row extends Serializable { def getAs[T](fieldName

spark git commit: [SPARK-14830][SQL] Add RemoveRepetitionFromGroupExpressions optimizer.

2016-05-02 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master a35a67a83 -> 6e6320122 [SPARK-14830][SQL] Add RemoveRepetitionFromGroupExpressions optimizer. ## What changes were proposed in this pull request? This PR aims to optimize GroupExpressions by removing repeating expressions.

spark git commit: [SPARK-14830][SQL] Add RemoveRepetitionFromGroupExpressions optimizer.

2016-05-02 Thread marmbrus
Repository: spark Updated Branches: refs/heads/branch-2.0 1c2082b64 -> 972fd22e3 [SPARK-14830][SQL] Add RemoveRepetitionFromGroupExpressions optimizer. ## What changes were proposed in this pull request? This PR aims to optimize GroupExpressions by removing repeating expressions.

[GitHub] spark pull request: [SPARK-14830][SQL] Add RemoveRepetitionFromGro...

2016-05-02 Thread marmbrus
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/12590#issuecomment-216339896 Thanks, merging to master and 2.0

[GitHub] spark pull request: [SPARK-11962] WIP: Added `attempt` and `getOpt...

2016-05-02 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/12708#discussion_r61790515 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/Row.scala --- @@ -335,6 +358,19 @@ trait Row extends Serializable { def getAs[T](fieldName

spark git commit: [SPARK-14579][SQL] Fix the race condition in StreamExecution.processAllAvailable again

2016-05-02 Thread marmbrus
Repository: spark Updated Branches: refs/heads/branch-2.0 08ae32e61 -> 1c2082b64 [SPARK-14579][SQL] Fix the race condition in StreamExecution.processAllAvailable again ## What changes were proposed in this pull request? #12339 didn't fix the race condition. MemorySinkSuite is still flaky:

[GitHub] spark pull request: [SPARK-14579][SQL]Fix the race condition in St...

2016-05-02 Thread marmbrus
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/12582#issuecomment-216319642 Thanks, merging to master and 2.0

[GitHub] spark pull request: [SPARK-11962] WIP: Added `attempt` and `getOpt...

2016-05-02 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/12708#discussion_r61779384 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/Row.scala --- @@ -335,6 +358,19 @@ trait Row extends Serializable { def getAs[T](fieldName

[GitHub] spark pull request: [SPARK-15022][SPARK-15023] Add support for tes...

2016-05-02 Thread marmbrus
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/12797#issuecomment-216309861 Some minor comments about code understandability, but overall this looks good. Thanks for working on this!

[GitHub] spark pull request: [SPARK-15022][SPARK-15023] Add support for tes...

2016-05-02 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/12797#discussion_r61774969 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/ProcessingTimeExecutorSuite.scala --- @@ -21,19 +21,41 @@ import

[GitHub] spark pull request: [SPARK-15022][SPARK-15023] Add support for tes...

2016-05-02 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/12797#discussion_r61774572 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/ProcessingTimeExecutorSuite.scala --- @@ -21,19 +21,41 @@ import

[GitHub] spark pull request: [SPARK-15022][SPARK-15023] Add support for tes...

2016-05-02 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/12797#discussion_r61774365 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/StreamTest.scala --- @@ -280,19 +283,35 @@ trait StreamTest extends QueryTest with Timeouts

[GitHub] spark pull request: [SPARK-15022][SPARK-15023] Add support for tes...

2016-05-02 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/12797#discussion_r61773617 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/TriggerExecutor.scala --- @@ -65,8 +65,22 @@ case class ProcessingTimeExecutor

[GitHub] spark pull request: [SPARK-15022][SPARK-15023] Add support for tes...

2016-05-02 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/12797#discussion_r61771197 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/TriggerExecutor.scala --- @@ -65,8 +65,22 @@ case class ProcessingTimeExecutor

[GitHub] spark pull request: [SPARK-14830][SQL] Add RemoveRepetitionFromGro...

2016-05-02 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/12590#discussion_r61770196 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -1440,6 +1441,18 @@ object

spark git commit: [SPARK-14637][SQL] object expressions cleanup

2016-05-02 Thread marmbrus
Repository: spark Updated Branches: refs/heads/branch-2.0 ccb53a20e -> 1145ea01b [SPARK-14637][SQL] object expressions cleanup ## What changes were proposed in this pull request? Simplify and clean up some object expressions: 1. simplify the logic to handle `propagateNull` 2. add

spark git commit: [SPARK-14637][SQL] object expressions cleanup

2016-05-02 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 214d1be4f -> 0513c3ac9 [SPARK-14637][SQL] object expressions cleanup ## What changes were proposed in this pull request? Simplify and clean up some object expressions: 1. simplify the logic to handle `propagateNull` 2. add

[GitHub] spark pull request: [SPARK-14637][SQL] object expressions cleanup

2016-05-02 Thread marmbrus
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/12399#issuecomment-216300544 Thanks, merging to master and 2.0

[GitHub] spark pull request: [SPARK-14637][SQL] object expressions cleanup

2016-04-29 Thread marmbrus
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/12399#issuecomment-215892595 Sorry this went stale. Do we still want to try and get this in?

spark git commit: [SPARK-14981][SQL] Throws exception if DESC is specified for sorting columns

2016-04-29 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 8ebae466a -> a04b1de5f [SPARK-14981][SQL] Throws exception if DESC is specified for sorting columns ## What changes were proposed in this pull request? Currently Spark SQL doesn't support sorting columns in descending order. However, the

[GitHub] spark pull request: [SPARK-14981][SQL] Throws exception if DESC is...

2016-04-29 Thread marmbrus
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/12759#issuecomment-215891359 Thanks, merging to master

spark git commit: [SPARK-14970][SQL] Prevent DataSource from enumerates all files in a directory if there is user specified schema

2016-04-28 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master d5ab42ceb -> 0ee5419b6 [SPARK-14970][SQL] Prevent DataSource from enumerates all files in a directory if there is user specified schema ## What changes were proposed in this pull request? The FileCatalog object gets created even if the

[GitHub] spark pull request: [SPARK-14970][SQL] Prevent DataSource from enu...

2016-04-28 Thread marmbrus
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/12748#issuecomment-215530907 We might still disable schema inference, but until then this LGTM.

spark git commit: [SPARK-14874][SQL][STREAMING] Remove the obsolete Batch representation

2016-04-27 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 7dd01d9c0 -> a234cc614 [SPARK-14874][SQL][STREAMING] Remove the obsolete Batch representation ## What changes were proposed in this pull request? The `Batch` class, which had been used to indicate progress in a stream, was abandoned by

[GitHub] spark pull request: [SPARK-14874][SQL][Streaming] Remove the obsol...

2016-04-27 Thread marmbrus
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/12638#issuecomment-215159163 Thanks, merging to master.

[GitHub] spark pull request: [SPARK-14942][SQL][Streaming] First construct ...

2016-04-27 Thread marmbrus
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/12725#issuecomment-215139594 This makes sense. Thanks for writing a very clear description! Perhaps a better title would be "Reduce delay between batch construction and exec

[GitHub] spark pull request: [SPARK-14930][SPARK-13693] Fix race condition ...

2016-04-26 Thread marmbrus
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/12712#issuecomment-214931564 LGTM, thanks for fixing this!

[GitHub] spark pull request: [SPARK-14913][SQL] Simplify configuration API

2016-04-26 Thread marmbrus
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/12689#issuecomment-214907032 LGTM, much cleaner

[GitHub] spark pull request: [SPARK-14874][SQL][Streaming] Remove the obsol...

2016-04-25 Thread marmbrus
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/12638#issuecomment-214597468 To be clear, if there's a completely unused class, I think it's worth the time to delete it (dead code is confusing for people trying to learn the code base

[GitHub] spark pull request: [SPARK-14747][SQL] Add assertStreaming/assertN...

2016-04-25 Thread marmbrus
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/12521#issuecomment-214482001 /cc @tdas

[GitHub] spark pull request: [SPARK-14747][SQL] Add assertStreaming/assertN...

2016-04-25 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/12521#discussion_r60969585 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/DataFrameReaderWriterSuite.scala --- @@ -368,4 +368,79 @@ class

[GitHub] spark pull request: [SPARK-14874][SQL][Streaming] Remove the obsol...

2016-04-25 Thread marmbrus
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/12638#issuecomment-214460285 It's fine to remove the class, but let's avoid unneeded renaming.

[GitHub] spark pull request: [SPARK-14874][SQL][Streaming] Remove the obsol...

2016-04-25 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/12638#discussion_r60957003 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSink.scala --- @@ -47,7 +47,7 @@ class FileStreamSink( private

[GitHub] spark pull request: [SPARK-14837][SQL][STREAMING] Added support in...

2016-04-22 Thread marmbrus
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/12616#issuecomment-213567878 I.e. something like `/dir/*/*`. We either do it in DataSource or HDFSFileCatalog.

[GitHub] spark pull request: [SPARK-14838][SQL] Implement statistics in Ser...

2016-04-22 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/12599#discussion_r60789861 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/object.scala --- @@ -83,6 +83,28 @@ case class SerializeFromObject

[GitHub] spark pull request: [SPARK-14837][SQL][STREAMING] Added support in...

2016-04-22 Thread marmbrus
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/12616#issuecomment-213559435 Why not just support globbing like we do in batch?

[GitHub] spark pull request: [SPARK-14828][SQL] Start SparkSession in REPL ...

2016-04-21 Thread marmbrus
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/12589#issuecomment-213184305 Yeah, if there are problems we don't have to do that, but `val df = spark.read.json(...)` is pretty nice

[GitHub] spark pull request: [SPARK-14579][SQL]Fix the race condition in St...

2016-04-21 Thread marmbrus
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/12582#issuecomment-213160121 You can add the test and mark it `@Ignore`.

[GitHub] spark pull request: [SPARK-14796][SQL] Add spark.sql.optimizer.min...

2016-04-21 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/12562#discussion_r60629326 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -54,10 +54,16 @@ object SQLConf { val

[GitHub] spark pull request: [SPARK-14796][SQL] Add spark.sql.optimizer.min...

2016-04-21 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/12562#discussion_r60629117 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/OptimizeInSuite.scala --- @@ -36,7 +39,7 @@ class OptimizeInSuite extends

[GitHub] spark pull request: [SPARK-14796][SQL] Add spark.sql.optimizer.min...

2016-04-21 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/12562#discussion_r60628971 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/OptimizeInSuite.scala --- @@ -17,11 +17,14 @@ package

spark git commit: [SPARK-14678][SQL] Add a file sink log to support versioning and compaction

2016-04-20 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 296c384af -> 7bc948557 [SPARK-14678][SQL] Add a file sink log to support versioning and compaction ## What changes were proposed in this pull request? This PR adds a special log for FileStreamSink for two purposes: - Versioning. A future

[GitHub] spark pull request: [SPARK-14678][SQL]Add a file sink log to suppo...

2016-04-20 Thread marmbrus
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/12435#issuecomment-212592916 Thanks, merging to master!

[GitHub] spark pull request: [SPARK-14487][SQL] User Defined Type registrat...

2016-04-20 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/12259#discussion_r60477354 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala --- @@ -112,6 +113,11 @@ class SQLContext private[sql]( */ def

spark git commit: [SPARK-14741][SQL] Fixed error in reading json file stream inside a partitioned directory

2016-04-20 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master acc7e592c -> cb8ea9e1f [SPARK-14741][SQL] Fixed error in reading json file stream inside a partitioned directory ## What changes were proposed in this pull request? Consider the following directory structure dir/col=X/some-files If we

[GitHub] spark pull request: [SPARK-14741][SQL] Fixed error in reading json...

2016-04-20 Thread marmbrus
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/12517#issuecomment-212567863 Thanks, merging to master.

[GitHub] spark pull request: [SPARK-14749][SQL, Tests] PlannerSuite failed ...

2016-04-20 Thread marmbrus
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/12532#issuecomment-212553919 OK to test

spark git commit: [SPARK-14555] First cut of Python API for Structured Streaming

2016-04-20 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 834277884 -> 80bf48f43 [SPARK-14555] First cut of Python API for Structured Streaming ## What changes were proposed in this pull request? This patch provides a first cut of python APIs for structured streaming. This PR provides the new

[GitHub] spark pull request: [SPARK-14555] First cut of Python API for Stru...

2016-04-20 Thread marmbrus
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/12320#issuecomment-212526768 Thanks, merging to master.

[GitHub] spark pull request: [SPARK-13929] Use Scala reflection for UDTs

2016-04-19 Thread marmbrus
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/12149#issuecomment-212184479 Thanks, merging to master.

[GitHub] spark pull request: [SPARK-14716][SQL] Added support for partition...

2016-04-19 Thread marmbrus
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/12409#issuecomment-212154746 LGTM

[GitHub] spark pull request: [SPARK-14716][SQL] Added support for partition...

2016-04-19 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/12409#discussion_r60322519 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala --- @@ -477,6 +491,20 @@ trait FileFormat

[GitHub] spark pull request: [SPARK-13929] Use Scala reflection for UDTs

2016-04-19 Thread marmbrus
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/12149#issuecomment-212150759 ok to test

[GitHub] spark pull request: [SPARK-14678][SQL]Add a file sink log to suppo...

2016-04-19 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/12435#discussion_r60320810 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamFileCatalog.scala --- @@ -54,6 +54,6 @@ class StreamFileCatalog

[GitHub] spark pull request: [SPARK-14720][SPARK-13643][WIP] Remove HiveCon...

2016-04-19 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/12485#discussion_r60315064 --- Diff: core/src/main/scala/org/apache/spark/internal/config/package.scala --- @@ -89,4 +89,11 @@ package object config { .stringConf

[GitHub] spark pull request: [SPARK-14720][SPARK-13643][WIP] Remove HiveCon...

2016-04-19 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/12485#discussion_r60314954 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala --- @@ -0,0 +1,100 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: [SPARK-14678][SQL]Add a file sink log to suppo...

2016-04-19 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/12435#discussion_r60304589 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSinkLog.scala --- @@ -0,0 +1,262 @@ +/* + * Licensed

[GitHub] spark pull request: [SPARK-14678][SQL]Add a file sink log to suppo...

2016-04-19 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/12435#discussion_r60302757 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamFileCatalog.scala --- @@ -54,6 +54,6 @@ class StreamFileCatalog
