[GitHub] spark pull request #21662: [SPARK-24662][SQL][SS] Support limit in structure...

2018-06-29 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/21662#discussion_r199286570 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamingLimitExec.scala --- @@ -0,0 +1,96 @@ +/* + * Licensed

[GitHub] spark pull request #21662: [SPARK-24662][SQL][SS] Support limit in structure...

2018-06-29 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/21662#discussion_r199286349 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamingLimitExec.scala --- @@ -0,0 +1,96 @@ +/* + * Licensed

[GitHub] spark pull request #21662: [SPARK-24662][SQL][SS] Support limit in structure...

2018-06-29 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/21662#discussion_r199279042 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamingLimitExec.scala --- @@ -0,0 +1,96 @@ +/* + * Licensed

[GitHub] spark pull request #21662: [SPARK-24662][SQL][SS] Support limit in structure...

2018-06-29 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/21662#discussion_r199280992 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamSuite.scala --- @@ -615,7 +615,7 @@ class StreamSuite extends StreamTest

[GitHub] spark pull request #21662: [SPARK-24662][SQL][SS] Support limit in structure...

2018-06-29 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/21662#discussion_r199278369 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala --- @@ -72,6 +72,8 @@ abstract class SparkStrategies extends

[GitHub] spark pull request #21662: [SPARK-24662][SQL][SS] Support limit in structure...

2018-06-29 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/21662#discussion_r199278862 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamingLimitExec.scala --- @@ -0,0 +1,96 @@ +/* + * Licensed

[GitHub] spark pull request #21662: [SPARK-24662][SQL][SS] Support limit in structure...

2018-06-29 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/21662#discussion_r199279216 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamingLimitExec.scala --- @@ -0,0 +1,96 @@ +/* + * Licensed

[GitHub] spark pull request #21662: [SPARK-24662][SQL][SS] Support limit in structure...

2018-06-29 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/21662#discussion_r199278041 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationChecker.scala --- @@ -315,8 +315,10 @@ object

[GitHub] spark pull request #20647: [SPARK-23303][SQL] improve the explain result for...

2018-02-23 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/20647#discussion_r170185948 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MicroBatchExecution.scala --- @@ -77,31 +79,32 @@ class MicroBatchExecution

[GitHub] spark pull request #20647: [SPARK-23303][SQL] improve the explain result for...

2018-02-22 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/20647#discussion_r170115965 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MicroBatchExecution.scala --- @@ -415,12 +418,14 @@ class MicroBatchExecution

[GitHub] spark issue #20598: [SPARK-23406] [SS] Enable stream-stream self-joins

2018-02-21 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/20598 Yeah, this seems risky at RC5. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e

[GitHub] spark pull request #20647: [SPARK-23303][SQL] improve the explain result for...

2018-02-21 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/20647#discussion_r169799465 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Relation.scala --- @@ -107,17 +106,24 @@ case class

[GitHub] spark issue #20511: [SPARK-23340][SQL] Upgrade Apache ORC to 1.4.3

2018-02-14 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/20511 Unfortunately, dependency changes are not typically allowed in patch releases. --- - To unsubscribe, e-mail: reviews-unsubscr

[GitHub] spark pull request #20598: [SPARK-23406] [SS] Enable stream-stream self-join...

2018-02-13 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/20598#discussion_r168011671 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MicroBatchExecution.scala --- @@ -431,7 +431,11 @@ class MicroBatchExecution

[GitHub] spark pull request #20598: [SPARK-23406] [SS] Enable stream-stream self-join...

2018-02-13 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/20598#discussion_r168011940 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MicroBatchExecution.scala --- @@ -440,8 +444,6 @@ class MicroBatchExecution

[GitHub] spark issue #20511: [SPARK-23340][SQL] Upgrade Apache ORC to 1.4.3

2018-02-13 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/20511 Sorry if I'm missing some context here, but our typical process this late in the release (we are over a month since the branch was cut) would be to disable any new features that still have

[GitHub] spark issue #20387: [SPARK-23203][SQL]: DataSourceV2: Use immutable logical ...

2018-02-02 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/20387 Regarding, `computeStats`, the logical plan seems like it might not be the right place. As we move towards more CBO it seems like we are going to need to pick physical operators before we can

[GitHub] spark pull request #20387: [SPARK-23203][SQL]: DataSourceV2: Use immutable l...

2018-02-02 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/20387#discussion_r165751660 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Relation.scala --- @@ -17,15 +17,149 @@ package

[GitHub] spark issue #20085: [SPARK-22739][Catalyst][WIP] Additional Expression Suppo...

2018-01-14 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/20085 This blocks better support for encoders on spark-avro, and seems safe, so I'd really like to include it in possible

[GitHub] spark issue #20219: [SPARK-23025][SQL] Support Null type in scala reflection

2018-01-11 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/20219 This seems reasonable to me. You can already do `sql("SELECT null")` --- - To unsubscribe, e-mail: review

[GitHub] spark issue #20010: [SPARK-22826][SQL] findWiderTypeForTwo Fails over Struct...

2018-01-03 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/20010 /cc @cloud-fan @sameeragarwal --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e

[GitHub] spark issue #20085: [SPARK-22739][Catalyst][WIP] Additional Expression Suppo...

2018-01-03 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/20085 /cc @cloud-fan @sameeragarwal --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2018-01-03 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/16578 I agree that this PR needs to be allocated more review bandwidth, and it is unfortunate that it has been blocked on that. However, I am -1 on merging a change this large after branch cut

[GitHub] spark pull request #20048: [SPARK-22862] Docs on lazy elimination of columns...

2017-12-21 Thread marmbrus
GitHub user marmbrus opened a pull request: https://github.com/apache/spark/pull/20048 [SPARK-22862] Docs on lazy elimination of columns missing from an encoder This behavior has confused some users, so lets clarify it. You can merge this pull request into a Git repository

[GitHub] spark issue #19925: [SPARK-22732] Add Structured Streaming APIs to DataSourc...

2017-12-14 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/19925 I would probably do ... streaming.reader/writer if we are going to namespace it. --- - To unsubscribe, e-mail: reviews

[GitHub] spark issue #19771: [SPARK-22544][SS]FileStreamSource should use its own had...

2017-11-17 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/19771 LGTM --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark issue #19461: [SPARK-22230] Swap per-row order in state store restore.

2017-10-09 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/19461 LGTM --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark issue #19314: [SPARK-22094][SS]processAllAvailable should check the qu...

2017-09-21 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/19314 LGTM! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark issue #19240: [SPARK-22018][SQL]Preserve top-level alias metadata when...

2017-09-14 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/19240 LGTM --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark pull request #19240: [SPARK-22018][SQL]Preserve top-level alias metada...

2017-09-14 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/19240#discussion_r139042816 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -2256,7 +2256,10 @@ object CleanupAliases extends Rule

[GitHub] spark pull request #19080: [SPARK-21865][SQL] simplify the distribution sema...

2017-08-30 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/19080#discussion_r136213945 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/physical/partitioning.scala --- @@ -30,18 +30,43 @@ import

[GitHub] spark pull request #19080: [SPARK-21865][SQL] simplify the distribution sema...

2017-08-30 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/19080#discussion_r136213879 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/physical/partitioning.scala --- @@ -30,18 +30,43 @@ import

[GitHub] spark pull request #19080: [SPARK-21865][SQL] simplify the distribution sema...

2017-08-30 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/19080#discussion_r136209980 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/physical/partitioning.scala --- @@ -30,18 +30,43 @@ import

[GitHub] spark pull request #18923: [SPARK-21710][StSt] Fix OOM on ConsoleSink with l...

2017-08-11 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/18923#discussion_r132801760 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/console.scala --- @@ -49,7 +49,7 @@ class ConsoleSink(options: Map[String

[GitHub] spark pull request #18923: [SPARK-21710][StSt] Fix OOM on ConsoleSink with l...

2017-08-11 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/18923#discussion_r132792708 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/console.scala --- @@ -49,7 +49,7 @@ class ConsoleSink(options: Map[String

[GitHub] spark pull request #18925: [SPARK-21713][SC] Replace streaming bit with Outp...

2017-08-11 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/18925#discussion_r132791950 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala --- @@ -779,10 +780,16 @@ case object

[GitHub] spark pull request #18925: [SPARK-21713][SC] Replace streaming bit with Outp...

2017-08-11 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/18925#discussion_r132791711 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -1185,7 +1186,7 @@ object

[GitHub] spark issue #18925: [SPARK-21713][SC] Replace streaming bit with OutputMode

2017-08-11 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/18925 ok to test --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so

[GitHub] spark issue #18925: [SPARK-21713][SC] Replace streaming bit with OutputMode

2017-08-11 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/18925 add to whitelist --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so

[GitHub] spark pull request #18828: [SPARK-21619][SQL] Fail the execution of canonica...

2017-08-03 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/18828#discussion_r131250896 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala --- @@ -181,17 +181,38 @@ abstract class QueryPlan[PlanType

[GitHub] spark issue #18822: [SPARK-21546][SS] dropDuplicates should ignore watermark...

2017-08-02 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/18822 LGTM --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so

[GitHub] spark issue #18803: [SPARK-21597][SS]Fix a potential overflow issue in Event...

2017-08-01 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/18803 LGTM --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so

[GitHub] spark issue #18780: [INTRA] Close stale PRs

2017-07-31 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/18780 I'm in favor of fairly aggressive closing of inactive pull requests. The cost of reopening them is small, and the benefits of having a clear view of what is in progress are large. I think as long

[GitHub] spark issue #18658: [SPARK-20871][SQL] only log Janino code at debug level

2017-07-17 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/18658 That seems reasonable. I'm kind of pro-truncation for very very large code. Even though its not great to have something truncated, outputting GBs of logs is also pretty bad for downstream

[GitHub] spark issue #18658: [SPARK-20871][SQL] only log Janino code at debug level

2017-07-17 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/18658 I don't have super strong opinions here, but in my experience its not always easy to get users to rerun a failed query with a different logging level. Have we considered truncating or special

[GitHub] spark issue #18638: [SPARK-21421][SS]Add the query id as a local property to...

2017-07-14 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/18638 LGTM --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so

[GitHub] spark pull request #18426: [SPARK-21216][SS] Hive strategies missed in Struc...

2017-06-26 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/18426#discussion_r124127512 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/IncrementalExecution.scala --- @@ -47,11 +47,19 @@ class IncrementalExecution

[GitHub] spark pull request #18364: [SPARK-21153] Use project instead of expand in tu...

2017-06-22 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/18364#discussion_r123638453 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/TimeWindow.scala --- @@ -152,17 +152,21 @@ object TimeWindow

[GitHub] spark pull request #18364: [SPARK-21153] Use project instead of expand in tu...

2017-06-22 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/18364#discussion_r123588018 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -2562,37 +2563,63 @@ object TimeWindowing extends Rule

[GitHub] spark pull request #18364: [SPARK-21153] Use project instead of expand in tu...

2017-06-22 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/18364#discussion_r123589167 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -2562,37 +2563,63 @@ object TimeWindowing extends Rule

[GitHub] spark pull request #18364: [SPARK-21153] Use project instead of expand in tu...

2017-06-22 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/18364#discussion_r123588431 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -2562,37 +2563,63 @@ object TimeWindowing extends Rule

[GitHub] spark pull request #18364: [SPARK-21153] Use project instead of expand in tu...

2017-06-22 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/18364#discussion_r123588323 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -2562,37 +2563,63 @@ object TimeWindowing extends Rule

[GitHub] spark pull request #18364: [SPARK-21153] Use project instead of expand in tu...

2017-06-22 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/18364#discussion_r123594063 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/TimeWindow.scala --- @@ -152,17 +152,21 @@ object TimeWindow

[GitHub] spark pull request #18364: [SPARK-21153] Use project instead of expand in tu...

2017-06-22 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/18364#discussion_r123587631 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -2562,37 +2563,63 @@ object TimeWindowing extends Rule

[GitHub] spark pull request #18364: [SPARK-21153] Use project instead of expand in tu...

2017-06-22 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/18364#discussion_r123587244 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -2562,37 +2563,63 @@ object TimeWindowing extends Rule

[GitHub] spark issue #18381: [SPARK-21167][SS]Decode the path generated by File sink ...

2017-06-21 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/18381 LGTM --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so

[GitHub] spark issue #18306: [SPARK-21029][SS] All StreamingQuery should be stopped w...

2017-06-14 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/18306 Seems okay to me. /cc @zsxwing --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled

[GitHub] spark issue #15306: [SPARK-17740] Spark tests should mock / interpose HDFS t...

2017-06-12 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/15306 This is good thing for us to test. However, If you throw an exception in your test that opens a file, it seems that it swallows the exception just telling you that you are leaking files

[GitHub] spark issue #18207: [MINOR][DOC] Update deprecation notes on Python/Hadoop/S...

2017-06-05 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/18207 /cc @joshrosen --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so

[GitHub] spark issue #17371: [SPARK-19903][PYSPARK][SS] window operator miss the `wat...

2017-06-02 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/17371 Can we add an analysis rule that just pulls up missing metadata from attributes in the child? It could run once after other rules. --- If your project is set up for it, you can reply

[GitHub] spark issue #18104: [SPARK-20877][SPARKR][WIP] add timestamps to test runs

2017-05-30 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/18104 Ping? I'd like to cut the next RC. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark issue #17770: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...

2017-05-30 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/17770 Whoa, I do not think we should back porting a large change to the inner workings of the analyzer. --- If your project is set up for it, you can reply to this email and have your reply appear

[GitHub] spark issue #18119: [BACKPORT-2.2][SPARK-19372][SQL] Fix throwing a Java exc...

2017-05-26 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/18119 Since this is purely a fallback for a case that would error, it seems possibly okay to include in 2.2. @zsxwing could you take a look and make sure I'm right? --- If your project is set up

[GitHub] spark issue #18065: [SPARK-20844] Remove experimental from Structured Stream...

2017-05-24 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/18065 Good catch on python! Fixed but please help me make sure I didn't miss anything. Leaving all the GroupState stuff experimental was on purpose, since this will be the first release

[GitHub] spark pull request #18065: [SPARK-20844] Remove experimental from Structured...

2017-05-24 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/18065#discussion_r118390750 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala --- @@ -2800,8 +2800,6 @@ object functions { * @group datetime_funcs

[GitHub] spark pull request #18065: [SPARK-20844] Remove experimental from Structured...

2017-05-24 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/18065#discussion_r118390436 --- Diff: docs/structured-streaming-programming-guide.md --- @@ -10,7 +10,7 @@ title: Structured Streaming Programming Guide # Overview

[GitHub] spark pull request #17308: [SPARK-19968][SPARK-20737][SS] Use a cached insta...

2017-05-24 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/17308#discussion_r118389082 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/CachedKafkaProducer.scala --- @@ -0,0 +1,174 @@ +/* + * Licensed

[GitHub] spark pull request #17308: [SPARK-19968][SPARK-20737][SS] Use a cached insta...

2017-05-24 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/17308#discussion_r118389504 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/CachedKafkaProducer.scala --- @@ -0,0 +1,174 @@ +/* + * Licensed

[GitHub] spark pull request #17308: [SPARK-19968][SPARK-20737][SS] Use a cached insta...

2017-05-24 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/17308#discussion_r118387956 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/CachedKafkaProducer.scala --- @@ -0,0 +1,174 @@ +/* + * Licensed

[GitHub] spark issue #18065: [SPARK-20844] Remove experimental from Structured Stream...

2017-05-22 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/18065 test this please --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so

[GitHub] spark pull request #18065: [SPARK-20844] Remove experimental from Structured...

2017-05-22 Thread marmbrus
GitHub user marmbrus opened a pull request: https://github.com/apache/spark/pull/18065 [SPARK-20844] Remove experimental from Structured Streaming APIs Now that Structured Streaming has been out for several Spark release and has large production use cases, the `Experimental` label

[GitHub] spark issue #17957: [SPARK-20717][SS] Minor tweaks to the MapGroupsWithState...

2017-05-12 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/17957 LGTM --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so

[GitHub] spark pull request #17958: [SPARK-20716][SS] StateStore.abort() should not t...

2017-05-12 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/17958#discussion_r116288688 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/HDFSBackedStateStoreProvider.scala --- @@ -202,13 +203,22 @@ private

[GitHub] spark pull request #17958: [SPARK-20716][SS] StateStore.abort() should not t...

2017-05-12 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/17958#discussion_r116283013 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/HDFSBackedStateStoreProvider.scala --- @@ -202,13 +203,22 @@ private

[GitHub] spark issue #17770: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...

2017-05-08 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/17770 It might be a good idea to look at [the time spent](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/rules/RuleExecutor.scala#L37) before

[GitHub] spark issue #17770: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...

2017-05-04 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/17770 Okay, I can be convinced. Lets do eliminate the other path then though and make sure there are no performance regressions. --- If your project is set up for it, you can reply to this email

[GitHub] spark pull request #17308: [SPARK-19968][SS] Use a cached instance of `Kafka...

2017-05-04 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/17308#discussion_r114841425 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/CachedKafkaProducer.scala --- @@ -0,0 +1,70 @@ +/* + * Licensed

[GitHub] spark pull request #17308: [SPARK-19968][SS] Use a cached instance of `Kafka...

2017-05-04 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/17308#discussion_r11484 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/CachedKafkaProducer.scala --- @@ -0,0 +1,70 @@ +/* + * Licensed

[GitHub] spark pull request #17308: [SPARK-19968][SS] Use a cached instance of `Kafka...

2017-05-04 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/17308#discussion_r114841245 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSink.scala --- @@ -30,14 +30,19 @@ private[kafka010] class KafkaSink

[GitHub] spark issue #17770: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...

2017-05-03 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/17770 Some thoughts: - We shouldn't have multiple optimizations for avoiding repeated analysis. So if we decide to go this way then we should get rid of `resolveOperators`. - I agree

[GitHub] spark pull request #17838: [SPARK-20567] Lazily bind in GenerateExec

2017-05-02 Thread marmbrus
GitHub user marmbrus opened a pull request: https://github.com/apache/spark/pull/17838 [SPARK-20567] Lazily bind in GenerateExec It is not valid to eagerly bind with the child's output as this causes failures when we attempt to canonicalize the plan (replacing the attribute

[GitHub] spark pull request #17765: [SPARK-20464][SS] Add a job group and description...

2017-04-27 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/17765#discussion_r113827924 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala --- @@ -825,6 +832,11 @@ class StreamExecution

[GitHub] spark pull request #17765: [SPARK-20464][SS] Add a job group and description...

2017-04-26 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/17765#discussion_r113593037 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala --- @@ -252,6 +252,7 @@ class StreamExecution

[GitHub] spark pull request #17765: [SPARK-20464][SS] Add a job group and description...

2017-04-26 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/17765#discussion_r113560170 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala --- @@ -825,6 +833,11 @@ class StreamExecution

[GitHub] spark pull request #17765: [SPARK-20464][SS] Add a job group and description...

2017-04-26 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/17765#discussion_r113560096 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala --- @@ -252,6 +252,7 @@ class StreamExecution

[GitHub] spark issue #17594: [SPARK-20282][SS][Tests]Write the commit log first to fi...

2017-04-10 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/17594 LGTM, for fixing the issue with the test. We should separately decide if this is really the behavior we want for the commit log. --- If your project is set up for it, you can reply to this email

[GitHub] spark pull request #17594: [SPARK-20282][SS][Tests]Write the commit log firs...

2017-04-10 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/17594#discussion_r110735241 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala --- @@ -304,8 +304,8 @@ class StreamExecution

[GitHub] spark pull request #17488: [SPARK-20165][SS] Resolve state encoder's deseria...

2017-03-30 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/17488#discussion_r109070462 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamTest.scala --- @@ -490,6 +490,18 @@ trait StreamTest extends QueryTest

[GitHub] spark pull request #17488: [SPARK-20165][SS] Resolve state encoder's deseria...

2017-03-30 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/17488#discussion_r109069839 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FlatMapGroupsWithStateExec.scala --- @@ -68,6 +68,17 @@ case class

[GitHub] spark pull request #17398: [SPARK-19716][SQL] support by-name resolution for...

2017-03-24 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/17398#discussion_r107985278 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/encoders/EncoderResolutionSuite.scala --- @@ -62,6 +66,54 @@ class

[GitHub] spark issue #17252: [SPARK-19913][SS] Log warning rather than throw Analysis...

2017-03-22 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/17252 Thanks for working on this, but I think this is inconsistent with other APIs in Spark. Also for things like the foreach sink, you might actually be expecting the option to affect the partitioning

[GitHub] spark issue #17361: [SPARK-20030][SS] Event-time-based timeout for MapGroups...

2017-03-21 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/17361 LGTM --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so

[GitHub] spark pull request #17361: [SPARK-20030][SS] Event-time-based timeout for Ma...

2017-03-21 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/17361#discussion_r107307806 --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/streaming/KeyedStateTimeout.java --- @@ -34,9 +32,20 @@ @InterfaceStability.Evolving

[GitHub] spark pull request #17361: [SPARK-20030][SS] Event-time-based timeout for Ma...

2017-03-21 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/17361#discussion_r107307722 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationChecker.scala --- @@ -147,49 +147,68 @@ object

[GitHub] spark pull request #17361: [SPARK-20030][SS] Event-time-based timeout for Ma...

2017-03-21 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/17361#discussion_r107307617 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/FlatMapGroupsWithStateSuite.scala --- @@ -519,6 +588,52 @@ class

[GitHub] spark pull request #17361: [SPARK-20030][SS] Event-time-based timeout for Ma...

2017-03-21 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/17361#discussion_r107307367 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/KeyedStateImpl.scala --- @@ -17,37 +17,45 @@ package

[GitHub] spark pull request #17361: [SPARK-20030][SS] Event-time-based timeout for Ma...

2017-03-21 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/17361#discussion_r107304893 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/FlatMapGroupsWithStateSuite.scala --- @@ -519,6 +588,52 @@ class

[GitHub] spark pull request #17361: [SPARK-20030][SS] Event-time-based timeout for Ma...

2017-03-21 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/17361#discussion_r107304618 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationChecker.scala --- @@ -147,49 +147,68 @@ object

[GitHub] spark pull request #17361: [SPARK-20030][SS] Event-time-based timeout for Ma...

2017-03-21 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/17361#discussion_r107304531 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationChecker.scala --- @@ -147,49 +147,68 @@ object

[GitHub] spark pull request #17361: [SPARK-20030][SS] Event-time-based timeout for Ma...

2017-03-21 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/17361#discussion_r107304196 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationChecker.scala --- @@ -147,49 +147,68 @@ object

  1   2   3   4   5   6   7   8   9   10   >