[GitHub] spark pull request #16758: [SPARK-19413][SS] MapGroupsWithState for arbitrar...

2017-01-31 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/16758#discussion_r98777503 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/State.scala --- @@ -0,0 +1,101 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] spark pull request #16758: [SPARK-19413][SS] MapGroupsWithState for arbitrar...

2017-01-31 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/16758#discussion_r98780383 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/objects.scala --- @@ -144,6 +145,12 @@ object ObjectOperator { (i: InternalRow

[GitHub] spark pull request #16758: [SPARK-19413][SS] MapGroupsWithState for arbitrar...

2017-01-31 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/16758#discussion_r98779599 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StatefulAggregate.scala --- @@ -235,3 +240,86 @@ case class StateStoreSaveExec

[GitHub] spark pull request #16758: [SPARK-19413][SS] MapGroupsWithState for arbitrar...

2017-01-31 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/16758#discussion_r98779015 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/IncrementalExecution.scala --- @@ -90,6 +93,14 @@ class IncrementalExecution

[GitHub] spark pull request #16758: [SPARK-19413][SS] MapGroupsWithState for arbitrar...

2017-01-31 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/16758#discussion_r98775825 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/KeyValueGroupedDataset.scala --- @@ -219,6 +219,160 @@ class KeyValueGroupedDataset[K, V] private

[GitHub] spark pull request #16758: [SPARK-19413][SS] MapGroupsWithState for arbitrar...

2017-01-31 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/16758#discussion_r98780164 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StatefulAggregate.scala --- @@ -184,7 +189,7 @@ case class StateStoreSaveExec

[GitHub] spark pull request #16758: [SPARK-19413][SS] MapGroupsWithState for arbitrar...

2017-01-31 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/16758#discussion_r98778823 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/StateImpl.scala --- @@ -0,0 +1,70 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #16758: [SPARK-19413][SS] MapGroupsWithState for arbitrar...

2017-01-31 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/16758#discussion_r98779264 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StatefulAggregate.scala --- @@ -54,6 +55,18 @@ trait StatefulOperator extends

[GitHub] spark pull request #16758: [SPARK-19413][SS] MapGroupsWithState for arbitrar...

2017-01-31 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/16758#discussion_r98777585 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/State.scala --- @@ -0,0 +1,101 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] spark pull request #16758: [SPARK-19413][SS] MapGroupsWithState for arbitrar...

2017-01-31 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/16758#discussion_r98779142 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StatefulAggregate.scala --- @@ -54,6 +55,18 @@ trait StatefulOperator extends

[GitHub] spark pull request #16758: [SPARK-19413][SS] MapGroupsWithState for arbitrar...

2017-01-31 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/16758#discussion_r98777860 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/State.scala --- @@ -0,0 +1,101 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] spark pull request #16758: [SPARK-19413][SS] MapGroupsWithState for arbitrar...

2017-01-31 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/16758#discussion_r98778773 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/StateImpl.scala --- @@ -0,0 +1,70 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #16758: [SPARK-19413][SS] MapGroupsWithState for arbitrar...

2017-01-31 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/16758#discussion_r98776628 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/KeyValueGroupedDataset.scala --- @@ -219,6 +219,160 @@ class KeyValueGroupedDataset[K, V] private

[GitHub] spark pull request #16758: [SPARK-19413][SS] MapGroupsWithState for arbitrar...

2017-01-31 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/16758#discussion_r98777095 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/KeyValueGroupedDataset.scala --- @@ -219,6 +219,160 @@ class KeyValueGroupedDataset[K, V] private

[GitHub] spark pull request #16758: [SPARK-19413][SS] MapGroupsWithState for arbitrar...

2017-01-31 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/16758#discussion_r98776799 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/KeyValueGroupedDataset.scala --- @@ -219,6 +219,160 @@ class KeyValueGroupedDataset[K, V] private

[GitHub] spark pull request #16758: [SPARK-19413][SS] MapGroupsWithState for arbitrar...

2017-01-31 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/16758#discussion_r98776538 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/KeyValueGroupedDataset.scala --- @@ -219,6 +219,160 @@ class KeyValueGroupedDataset[K, V] private

[GitHub] spark pull request #16758: [SPARK-19413][SS] MapGroupsWithState for arbitrar...

2017-01-31 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/16758#discussion_r98778649 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/StateImpl.scala --- @@ -0,0 +1,70 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #16758: [SPARK-19413][SS] MapGroupsWithState for arbitrar...

2017-01-31 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/16758#discussion_r98777806 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/State.scala --- @@ -0,0 +1,101 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] spark pull request #16758: [SPARK-19413][SS] MapGroupsWithState for arbitrar...

2017-01-31 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/16758#discussion_r98777964 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/State.scala --- @@ -0,0 +1,101 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] spark issue #16745: [SPARK-19406] [SQL] Fix function to_json to respect user...

2017-01-30 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/16745 LGTM --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so

[GitHub] spark issue #16329: [SPARK-16046][DOCS] Aggregations in the Spark SQL progra...

2017-01-24 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/16329 Sorry for the delay. This LGTM, but I'm currently away from my Apache SSH keys. Other committers should feel free to merge if you get there before I do.

[GitHub] spark issue #16664: [SPARK-18120][SQL] Call QueryExecutionListener callback...

2017-01-23 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/16664 /cc @liancheng

[GitHub] spark issue #16564: [SPARK-19065][SS]Rewrite Alias in StreamExecution if nec...

2017-01-12 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/16564 Hmm, I'm not sure that I agree with the solution from #15427. I do not think that it should be valid to have two different expressions that have the same expression id. There are many cases where

[GitHub] spark pull request #16553: [SPARK-9435][SQL] Reuse function in Java UDF to c...

2017-01-11 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/16553#discussion_r95655634 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala --- @@ -488,219 +488,241 @@ class UDFRegistration private[sql

[GitHub] spark pull request #16240: [SPARK-16792][SQL] Dataset containing a Case Clas...

2017-01-05 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/16240#discussion_r94888245 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLImplicits.scala --- @@ -99,33 +96,96 @@ abstract class SQLImplicits { // Seqs

[GitHub] spark issue #16240: [SPARK-16792][SQL] Dataset containing a Case Class with ...

2017-01-05 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/16240 ok to test

[GitHub] spark issue #16240: [SPARK-16792][SQL] Dataset containing a Case Class with ...

2017-01-05 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/16240 For future reference: https://github.com/apache/spark/blob/master/dev/mima (script to run mima)

[GitHub] spark issue #16371: [SPARK-18932][SQL] Support partial aggregation for colle...

2016-12-28 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/16371 +1 I think we can move forward.

[GitHub] spark issue #16322: [SPARK-18908][SS] Creating StreamingQueryException shoul...

2016-12-21 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/16322 LGTM

[GitHub] spark pull request #16360: [SPARK-18234][SS] Made update mode public

2016-12-21 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/16360#discussion_r93508333 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamWriter.scala --- @@ -219,7 +221,13 @@ final class DataStreamWriter[T] private

[GitHub] spark pull request #16360: [SPARK-18234][SS] Made update mode public

2016-12-21 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/16360#discussion_r93507397 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/InternalOutputModes.scala --- @@ -15,7 +15,7 @@ * limitations under the License

[GitHub] spark pull request #16360: [SPARK-18234][SS] Made update mode public

2016-12-21 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/16360#discussion_r93509028 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamWriter.scala --- @@ -219,7 +221,13 @@ final class DataStreamWriter[T] private

[GitHub] spark pull request #16304: [SPARK-18894][SS] Fix event time watermark delay ...

2016-12-20 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/16304#discussion_r93355210 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala --- @@ -387,7 +387,7 @@ class StreamExecution

[GitHub] spark pull request #16322: [SPARK-18908][SS] Creating StreamingQueryExceptio...

2016-12-19 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/16322#discussion_r93121938 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala --- @@ -206,6 +201,36 @@ class StreamExecution

[GitHub] spark issue #16329: [SPARK-16046][DOCS] Aggregations in the Spark SQL progra...

2016-12-19 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/16329 This is great! Thanks for taking the time to write up such complete examples. I think this was a big gap in the existing docs. One other ask. The screen-shot is great, but I'd like to see

[GitHub] spark pull request #16329: [SPARK-16046][DOCS] Aggregations in the Spark SQL...

2016-12-19 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/16329#discussion_r93119785 --- Diff: examples/src/main/scala/org/apache/spark/examples/sql/UserDefinedTypedAggregation.scala --- @@ -0,0 +1,82 @@ +/* + * Licensed

[GitHub] spark pull request #16329: [SPARK-16046][DOCS] Aggregations in the Spark SQL...

2016-12-19 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/16329#discussion_r93121147 --- Diff: examples/src/main/scala/org/apache/spark/examples/sql/UserDefinedTypedAggregation.scala --- @@ -0,0 +1,82 @@ +/* + * Licensed

[GitHub] spark pull request #16329: [SPARK-16046][DOCS] Aggregations in the Spark SQL...

2016-12-19 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/16329#discussion_r93118975 --- Diff: examples/src/main/java/org/apache/spark/examples/sql/JavaUserDefinedTypedAggregation.java --- @@ -0,0 +1,154 @@ +/* + * Licensed

[GitHub] spark pull request #16329: [SPARK-16046][DOCS] Aggregations in the Spark SQL...

2016-12-19 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/16329#discussion_r93118905 --- Diff: examples/src/main/java/org/apache/spark/examples/sql/JavaUserDefinedTypedAggregation.java --- @@ -0,0 +1,154 @@ +/* + * Licensed

[GitHub] spark issue #16240: [SPARK-16792][SQL] Dataset containing a Case Class with ...

2016-12-19 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/16240 /cc @cloud-fan

[GitHub] spark pull request #16322: [SPARK-18908][SS] Creating StreamingQueryExceptio...

2016-12-19 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/16322#discussion_r93113059 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala --- @@ -206,6 +201,36 @@ class StreamExecution

[GitHub] spark issue #16304: [SPARK-18894][SS] Fix event time watermark delay thresho...

2016-12-16 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/16304 LGTM

[GitHub] spark pull request #16304: [SPARK-18894][SS] Fix event time watermark delay ...

2016-12-16 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/16304#discussion_r92904102 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/EventTimeWatermarkSuite.scala --- @@ -124,6 +137,29 @@ class WatermarkSuite extends

[GitHub] spark pull request #16304: [SPARK-18894][SS] Fix event time watermark delay ...

2016-12-16 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/16304#discussion_r92902755 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/EventTimeWatermarkSuite.scala --- @@ -124,6 +137,29 @@ class WatermarkSuite extends

[GitHub] spark pull request #16304: [SPARK-18894][SS] Disallow event time watermark de...

2016-12-15 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/16304#discussion_r92753969 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -572,6 +572,10 @@ class Dataset[T] private[sql]( val parsedDelay

[GitHub] spark pull request #16304: [SPARK-18894][SS] Disallow event time watermark de...

2016-12-15 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/16304#discussion_r92753590 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -572,6 +572,10 @@ class Dataset[T] private[sql]( val parsedDelay

[GitHub] spark issue #16289: [SPARK-18870] Disallowed Distinct Aggregations on Stream...

2016-12-15 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/16289 LGTM

[GitHub] spark issue #16258: [SPARK-18834][SS] Expose event time stats through Stream...

2016-12-13 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/16258 LGTM

[GitHub] spark pull request #16258: [SPARK-18834][SS] Expose event time and processin...

2016-12-12 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/16258#discussion_r92077840 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/progress.scala --- @@ -33,27 +34,6 @@ import

[GitHub] spark pull request #16258: [SPARK-18834][SS] Expose event time and processin...

2016-12-12 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/16258#discussion_r92073271 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingQueryStatusAndProgressSuite.scala --- @@ -38,13 +38,18 @@ class

[GitHub] spark pull request #16258: [SPARK-18834][SS] Expose event time and processin...

2016-12-12 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/16258#discussion_r92071196 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala --- @@ -360,6 +360,24 @@ class StreamExecution

[GitHub] spark pull request #16238: [SPARK-18811] StreamSource resolution should happ...

2016-12-09 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/16238#discussion_r91803533 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/util/DefaultSource.scala --- @@ -0,0 +1,66 @@ +/* + * Licensed to the Apache

[GitHub] spark issue #16238: [SPARK-18811] StreamSource resolution should happen in s...

2016-12-09 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/16238 This LGTM, I was just talking with @tdas about how I think that all of this initialization stuff should be lazy and happen on the stream execution thread. I think this can simplify what

[GitHub] spark pull request #16238: [SPARK-18811] StreamSource resolution should happ...

2016-12-09 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/16238#discussion_r91803089 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/util/DefaultSource.scala --- @@ -0,0 +1,66 @@ +/* + * Licensed to the Apache

[GitHub] spark issue #16182: [SPARK-18754][SS] Rename recentProgresses to recentProgr...

2016-12-06 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/16182 /cc @tdas

[GitHub] spark pull request #16182: [SPARK-18754][SS] Rename recentProgresses to rece...

2016-12-06 Thread marmbrus
GitHub user marmbrus opened a pull request: https://github.com/apache/spark/pull/16182 [SPARK-18754][SS] Rename recentProgresses to recentProgress Based on an informal survey, users find this option easier to understand / remember. You can merge this pull request into a Git

[GitHub] spark pull request #16178: [SPARK-18751][Core]Fix deadlock when SparkContext...

2016-12-06 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/16178#discussion_r91190778 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -1760,25 +1760,24 @@ class SparkContext(config: SparkConf) extends Logging

[GitHub] spark issue #16138: [WIP][SPARK-16609] Add to_date/to_timestamp with format ...

2016-12-05 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/16138 This will be very convenient! Looking forward to the whole patch. For SQL I think you should look at [`RuntimeReplaceable`](https://github.com/apache/spark/blob

[GitHub] spark pull request #16113: [SPARK-18657][SPARK-18668] Make StreamingQuery.id...

2016-12-02 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/16113#discussion_r90741307 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala --- @@ -605,34 +629,64 @@ class StreamExecution

[GitHub] spark pull request #16113: [SPARK-18657][SPARK-18668] Make StreamingQuery.id...

2016-12-02 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/16113#discussion_r90741239 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/StreamingQuery.scala --- @@ -32,21 +32,33 @@ import org.apache.spark.sql.SparkSession

[GitHub] spark pull request #16113: [SPARK-18657][SPARK-18668] Make StreamingQuery.id...

2016-12-02 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/16113#discussion_r90741067 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/OffsetSeq.scala --- @@ -54,6 +61,26 @@ object OffsetSeq { * `nulls

[GitHub] spark issue #15918: [SPARK-18122][SQL][WIP]Fallback to Kryo for unsupported ...

2016-12-01 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/15918 I don't think you can limit the implicit. What type would pick up case classes, but not case classes that contain invalid things? I think you would need a macro for this kind of introspection

[GitHub] spark issue #16094: [SPARK-18541][Python]Add metadata parameter to pyspark.s...

2016-12-01 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/16094 No worries, thanks for working on this! It's great to ensure our Python APIs aren't lagging behind the Scala ones.

[GitHub] spark pull request #13147: [SPARK-6320][SQL] Move planLater method into Gene...

2016-12-01 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/13147#discussion_r90560927 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/QueryPlanner.scala --- @@ -27,6 +27,14 @@ import

[GitHub] spark issue #16094: [SPARK-18541][Python]Add metadata parameter to pyspark.s...

2016-12-01 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/16094 ok to test

[GitHub] spark issue #15918: [SPARK-18122][SQL][WIP]Fallback to Kryo for unsupported ...

2016-11-30 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/15918 We should probably add a flag (maybe even off by default). The error message can tell you to turn on the flag if you are okay with the fallback.

[GitHub] spark issue #15918: [SPARK-18122][SQL][WIP]Fallback to Kryo for unsupported ...

2016-11-30 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/15918 I agree that the only change in behavior is that things that used to throw an error will now not throw an error. If done right (I haven't looked deeply at the PR itself yet), no case

[GitHub] spark pull request #13147: [SPARK-6320][SQL] Move planLater method into Gene...

2016-11-30 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/13147#discussion_r90317826 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/QueryPlanner.scala --- @@ -27,6 +27,14 @@ import

[1/2] spark git commit: [SPARK-18516][SQL] Split state and progress in streaming

2016-11-29 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 9a02f6821 -> c3d08e2f2 http://git-wip-us.apache.org/repos/asf/spark/blob/c3d08e2f/sql/core/src/main/scala/org/apache/spark/sql/streaming/progress.scala -- diff --git

[1/2] spark git commit: [SPARK-18516][SQL] Split state and progress in streaming

2016-11-29 Thread marmbrus
Repository: spark Updated Branches: refs/heads/branch-2.1 045ae299c -> 28b57c8a1 http://git-wip-us.apache.org/repos/asf/spark/blob/28b57c8a/sql/core/src/main/scala/org/apache/spark/sql/streaming/progress.scala -- diff --git

[2/2] spark git commit: [SPARK-18516][SQL] Split state and progress in streaming

2016-11-29 Thread marmbrus
"3" : 0, "0" : 1 } }, "numRecords" : 3, "inputRowsPerSecond" : 230.76923076923077, "processedRowsPerSecond" : 10.869565217391303 } ] } ``` Additionally, in order to make it possible to correlate progress update

[2/2] spark git commit: [SPARK-18516][SQL] Split state and progress in streaming

2016-11-29 Thread marmbrus
"3" : 0, "0" : 1 } }, "numRecords" : 3, "inputRowsPerSecond" : 230.76923076923077, "processedRowsPerSecond" : 10.869565217391303 } ] } ``` Additionally, in order to make it possible to correlate progress updates across

[GitHub] spark issue #15954: [SPARK-18516][SQL] Split state and progress in streaming

2016-11-29 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/15954 LGTM, merging to master and 2.1

[GitHub] spark pull request #15954: [SPARK-18516][SQL] Split state and progress in st...

2016-11-29 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15954#discussion_r90133572 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/StreamingQueryManager.scala --- @@ -59,13 +62,20 @@ class StreamingQueryManager private

[GitHub] spark pull request #15954: [SPARK-18516][SQL] Split state and progress in st...

2016-11-29 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15954#discussion_r90133503 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/StreamingQuery.scala --- @@ -64,23 +68,26 @@ trait StreamingQuery

spark git commit: [SPARK-18498][SQL] Revise HDFSMetadataLog API for better testing

2016-11-29 Thread marmbrus
cks without worrying about batch file name formats. marmbrus zsxwing Existing tests already ensure this API faithfully supports core functionality, i.e., creation of batch files. Author: Tyson Condie <tcon...@gmail.com> Closes #15924 from tcondie/SPARK-18498. Signed-off-by: Michael Armbr

spark git commit: [SPARK-18498][SQL] Revise HDFSMetadataLog API for better testing

2016-11-29 Thread marmbrus
cks without worrying about batch file name formats. marmbrus zsxwing Existing tests already ensure this API faithfully supports core functionality, i.e., creation of batch files. Author: Tyson Condie <tcon...@gmail.com> Closes #15924 from tcondie/SPARK-18498. Signed-off-by: Michael Armbr

[GitHub] spark pull request #15924: [SPARK-18498] [SQL] Revise HDFSMetadataLog API fo...

2016-11-29 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15924#discussion_r90090753 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLog.scala --- @@ -129,48 +129,18 @@ class HDFSMetadataLog[T

[GitHub] spark pull request #15954: [WIP][SPARK-18516][SQL] Split state and progress ...

2016-11-29 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15954#discussion_r90083413 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/StreamingQuery.scala --- @@ -38,11 +40,11 @@ trait StreamingQuery { def name

[GitHub] spark pull request #15954: [WIP][SPARK-18516][SQL] Split state and progress ...

2016-11-29 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15954#discussion_r90082045 --- Diff: python/pyspark/sql/streaming.py --- @@ -87,6 +88,24 @@ def awaitTermination(self, timeout=None): else: return self

[GitHub] spark pull request #15954: [WIP][SPARK-18516][SQL] Split state and progress ...

2016-11-29 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15954#discussion_r90084842 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/StreamingQueryManager.scala --- @@ -279,3 +287,8 @@ class StreamingQueryManager private

[GitHub] spark pull request #15954: [WIP][SPARK-18516][SQL] Split state and progress ...

2016-11-29 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15954#discussion_r90085518 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/StreamingQueryProgress.scala --- @@ -0,0 +1,113 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #15954: [WIP][SPARK-18516][SQL] Split state and progress ...

2016-11-29 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15954#discussion_r90084420 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/StreamingQueryListener.scala --- @@ -81,30 +83,30 @@ object StreamingQueryListener

[GitHub] spark pull request #15954: [WIP][SPARK-18516][SQL] Split state and progress ...

2016-11-29 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15954#discussion_r90085377 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/StreamingQueryProgress.scala --- @@ -0,0 +1,113 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #15954: [WIP][SPARK-18516][SQL] Split state and progress ...

2016-11-29 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15954#discussion_r90084970 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/StreamingQueryProgress.scala --- @@ -0,0 +1,113 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #15954: [WIP][SPARK-18516][SQL] Split state and progress ...

2016-11-29 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15954#discussion_r90083627 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/StreamingQuery.scala --- @@ -51,7 +53,7 @@ trait StreamingQuery { def sparkSession

[GitHub] spark pull request #15954: [WIP][SPARK-18516][SQL] Split state and progress ...

2016-11-29 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15954#discussion_r90086100 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamTest.scala --- @@ -669,55 +658,48 @@ trait StreamTest extends QueryTest

[GitHub] spark pull request #15954: [WIP][SPARK-18516][SQL] Split state and progress ...

2016-11-29 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15954#discussion_r90084872 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/StreamingQueryManager.scala --- @@ -279,3 +287,8 @@ class StreamingQueryManager private

[GitHub] spark pull request #15954: [WIP][SPARK-18516][SQL] Split state and progress ...

2016-11-21 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15954#discussion_r89035938 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/StreamingQueryProgress.scala --- @@ -0,0 +1,107 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #15954: [WIP][SPARK-18516][SQL] Split state and progress ...

2016-11-21 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15954#discussion_r89032237 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/StreamingQuery.scala --- @@ -64,23 +66,26 @@ trait StreamingQuery

[GitHub] spark pull request #15954: [WIP][SPARK-18516][SQL] Split state and progress ...

2016-11-21 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15954#discussion_r89032104 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/SourceProgress.scala --- @@ -0,0 +1,73 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #15954: [WIP][SPARK-18516][SQL] Split state and progress ...

2016-11-21 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15954#discussion_r89032039 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/StreamingQueryProgress.scala --- @@ -0,0 +1,107 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #15954: [WIP][SPARK-18516][SQL] Split state and progress ...

2016-11-21 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15954#discussion_r89031940 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/SourceProgress.scala --- @@ -0,0 +1,73 @@ +/* + * Licensed to the Apache Software

[GitHub] spark issue #15962: [SPARK-18526][SQL][KAFKA] Allow users to configure max.p...

2016-11-21 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/15962 Do you have any performance comparisons to show that we need to do this? On the driver, I think we want it to be as small as possible, because we don't want to pull any data down (it would

[GitHub] spark issue #15954: [WIP][SPARK-18516][SQL] Split state and progress in stre...

2016-11-20 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/15954 /cc @tdas --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so

[GitHub] spark pull request #15954: [WIP][SPARK-18516][SQL] Split state and progress ...

2016-11-20 Thread marmbrus
GitHub user marmbrus opened a pull request: https://github.com/apache/spark/pull/15954 [WIP][SPARK-18516][SQL] Split state and progress in streaming This PR separates the status of a `StreamingQuery` into two separate APIs: - `status` - describes the status of a `StreamingQuery

[GitHub] spark issue #15933: [SPARK-18505][SQL] Simplify AnalyzeColumnCommand

2016-11-18 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/15933 LGTM

[GitHub] spark issue #15934: [SPARK-18497][SS]Make ForeachSink support watermark

2016-11-18 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/15934 LGTM

[GitHub] spark issue #15921: [SPARK-18493] Add missing python APIs: withWatermark and...

2016-11-17 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/15921 No, I don't think we need to throw any exceptions. Watermarks are defined at batch boundaries, so it would just have no effect for a batch job. We should make sure that the batch planner

[GitHub] spark issue #15908: [SPARK-18459][SPARK-18460][StructuredStreaming] Rename t...

2016-11-16 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/15908 Oh sorry... I'm realizing I'm commenting on a PR for branch-2.0. We don't have to address these comments here, but we should make sure we are happy with all the naming before 2.1 is released
