[05/51] [partial] spark-website git commit: Add Spark 2.1.1 docs

2017-05-02 Thread marmbrus
http://git-wip-us.apache.org/repos/asf/spark-website/blob/d4f0c34a/site/docs/2.1.1/api/java/org/apache/spark/JobExecutionStatus.html -- diff --git a/site/docs/2.1.1/api/java/org/apache/spark/JobExecutionStatus.html

[08/51] [partial] spark-website git commit: Add Spark 2.1.1 docs

2017-05-02 Thread marmbrus
http://git-wip-us.apache.org/repos/asf/spark-website/blob/d4f0c34a/site/docs/2.1.1/api/java/org/apache/spark/ComplexFutureAction.html -- diff --git a/site/docs/2.1.1/api/java/org/apache/spark/ComplexFutureAction.html

[30/51] [partial] spark-website git commit: Add Spark 2.1.1 docs

2017-05-02 Thread marmbrus
http://git-wip-us.apache.org/repos/asf/spark-website/blob/d4f0c34a/site/docs/2.1.1/api/R/randomSplit.html -- diff --git a/site/docs/2.1.1/api/R/randomSplit.html b/site/docs/2.1.1/api/R/randomSplit.html new file mode 100644 index

[3/4] spark-website git commit: Add Spark 2.1.1 release.

2017-05-02 Thread marmbrus
http://git-wip-us.apache.org/repos/asf/spark-website/blob/e4019e64/site/news/spark-1-4-1-released.html -- diff --git a/site/news/spark-1-4-1-released.html b/site/news/spark-1-4-1-released.html index d4327a4..faf7639 100644 ---

[2/4] spark-website git commit: Add Spark 2.1.1 release.

2017-05-02 Thread marmbrus
http://git-wip-us.apache.org/repos/asf/spark-website/blob/e4019e64/site/release-process.html -- diff --git a/site/release-process.html b/site/release-process.html index 4dded93..7782ab0 100644 --- a/site/release-process.html +++

[4/4] spark-website git commit: Add Spark 2.1.1 release.

2017-05-02 Thread marmbrus
Add Spark 2.1.1 release. Project: http://git-wip-us.apache.org/repos/asf/spark-website/repo Commit: http://git-wip-us.apache.org/repos/asf/spark-website/commit/e4019e64 Tree: http://git-wip-us.apache.org/repos/asf/spark-website/tree/e4019e64 Diff:

[1/4] spark-website git commit: Add Spark 2.1.1 release.

2017-05-02 Thread marmbrus
Repository: spark-website Updated Branches: refs/heads/asf-site 09046892b -> e4019e64c http://git-wip-us.apache.org/repos/asf/spark-website/blob/e4019e64/site/sitemap.xml -- diff --git a/site/sitemap.xml b/site/sitemap.xml

[spark] Git Push Summary

2017-05-01 Thread marmbrus
Repository: spark Updated Tags: refs/tags/v2.1.1-rc3 [deleted] 2ed19cff2 - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org

[spark] Git Push Summary

2017-05-01 Thread marmbrus
Repository: spark Updated Tags: refs/tags/v2.1.1-rc2 [deleted] 02b165dcc

[spark] Git Push Summary

2017-05-01 Thread marmbrus
Repository: spark Updated Tags: refs/tags/v2.1.1-rc4 [deleted] 267aca5bd

[spark] Git Push Summary

2017-05-01 Thread marmbrus
Repository: spark Updated Tags: refs/tags/v2.1.1-rc1 [deleted] 30abb95c9

[spark] Git Push Summary

2017-05-01 Thread marmbrus
Repository: spark Updated Tags: refs/tags/v2.1.1 [created] 267aca5bd

svn commit: r19436 - /dev/spark/spark-2.1.1-rc4/

2017-05-01 Thread marmbrus
Author: marmbrus Date: Tue May 2 01:05:29 2017 New Revision: 19436 Log: Add spark-2.1.1-rc4 Added: dev/spark/spark-2.1.1-rc4/ dev/spark/spark-2.1.1-rc4/SparkR_2.1.1.tar.gz (with props) dev/spark/spark-2.1.1-rc4/SparkR_2.1.1.tar.gz.asc dev/spark/spark-2.1.1-rc4/SparkR_2.1.1

svn commit: r19437 - /dev/spark/spark-2.1.1-rc4/ /release/spark/spark-2.1.1/

2017-05-01 Thread marmbrus
Author: marmbrus Date: Tue May 2 01:06:55 2017 New Revision: 19437 Log: Release Spark 2.1.1 Added: release/spark/spark-2.1.1/ - copied from r19436, dev/spark/spark-2.1.1-rc4/ Removed: dev/spark/spark-2.1.1-rc4

[GitHub] spark pull request #17765: [SPARK-20464][SS] Add a job group and description...

2017-04-27 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/17765#discussion_r113827924 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala --- @@ -825,6 +832,11 @@ class StreamExecution

[GitHub] spark pull request #17765: [SPARK-20464][SS] Add a job group and description...

2017-04-26 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/17765#discussion_r113593037 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala --- @@ -252,6 +252,7 @@ class StreamExecution

[GitHub] spark pull request #17765: [SPARK-20464][SS] Add a job group and description...

2017-04-26 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/17765#discussion_r113560170 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala --- @@ -825,6 +833,11 @@ class StreamExecution

[GitHub] spark pull request #17765: [SPARK-20464][SS] Add a job group and description...

2017-04-26 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/17765#discussion_r113560096 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala --- @@ -252,6 +252,7 @@ class StreamExecution

[GitHub] spark issue #17594: [SPARK-20282][SS][Tests]Write the commit log first to fi...

2017-04-10 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/17594 LGTM, for fixing the issue with the test. We should separately decide if this is really the behavior we want for the commit log. --- If your project is set up for it, you can reply to this email

[GitHub] spark pull request #17594: [SPARK-20282][SS][Tests]Write the commit log firs...

2017-04-10 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/17594#discussion_r110735241 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala --- @@ -304,8 +304,8 @@ class StreamExecution

[GitHub] spark pull request #17488: [SPARK-20165][SS] Resolve state encoder's deseria...

2017-03-30 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/17488#discussion_r109070462 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamTest.scala --- @@ -490,6 +490,18 @@ trait StreamTest extends QueryTest

[GitHub] spark pull request #17488: [SPARK-20165][SS] Resolve state encoder's deseria...

2017-03-30 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/17488#discussion_r109069839 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FlatMapGroupsWithStateExec.scala --- @@ -68,6 +68,17 @@ case class

[GitHub] spark pull request #17398: [SPARK-19716][SQL] support by-name resolution for...

2017-03-24 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/17398#discussion_r107985278 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/encoders/EncoderResolutionSuite.scala --- @@ -62,6 +66,54 @@ class

[GitHub] spark issue #17252: [SPARK-19913][SS] Log warning rather than throw Analysis...

2017-03-22 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/17252 Thanks for working on this, but I think this is inconsistent with other APIs in Spark. Also for things like the foreach sink, you might actually be expecting the option to affect the partitioning

[GitHub] spark issue #17361: [SPARK-20030][SS] Event-time-based timeout for MapGroups...

2017-03-21 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/17361 LGTM

[GitHub] spark pull request #17361: [SPARK-20030][SS] Event-time-based timeout for Ma...

2017-03-21 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/17361#discussion_r107307806 --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/streaming/KeyedStateTimeout.java --- @@ -34,9 +32,20 @@ @InterfaceStability.Evolving

[GitHub] spark pull request #17361: [SPARK-20030][SS] Event-time-based timeout for Ma...

2017-03-21 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/17361#discussion_r107307722 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationChecker.scala --- @@ -147,49 +147,68 @@ object

[GitHub] spark pull request #17361: [SPARK-20030][SS] Event-time-based timeout for Ma...

2017-03-21 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/17361#discussion_r107307617 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/FlatMapGroupsWithStateSuite.scala --- @@ -519,6 +588,52 @@ class

[GitHub] spark pull request #17361: [SPARK-20030][SS] Event-time-based timeout for Ma...

2017-03-21 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/17361#discussion_r107307367 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/KeyedStateImpl.scala --- @@ -17,37 +17,45 @@ package

[GitHub] spark pull request #17361: [SPARK-20030][SS] Event-time-based timeout for Ma...

2017-03-21 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/17361#discussion_r107304893 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/FlatMapGroupsWithStateSuite.scala --- @@ -519,6 +588,52 @@ class

[GitHub] spark pull request #17361: [SPARK-20030][SS] Event-time-based timeout for Ma...

2017-03-21 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/17361#discussion_r107304618 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationChecker.scala --- @@ -147,49 +147,68 @@ object

[GitHub] spark pull request #17361: [SPARK-20030][SS] Event-time-based timeout for Ma...

2017-03-21 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/17361#discussion_r107304531 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationChecker.scala --- @@ -147,49 +147,68 @@ object

[GitHub] spark pull request #17361: [SPARK-20030][SS] Event-time-based timeout for Ma...

2017-03-21 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/17361#discussion_r107304196 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationChecker.scala --- @@ -147,49 +147,68 @@ object

[GitHub] spark pull request #17361: [SPARK-20030][SS] Event-time-based timeout for Ma...

2017-03-21 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/17361#discussion_r107304133 --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/streaming/KeyedStateTimeout.java --- @@ -34,9 +32,20 @@ @InterfaceStability.Evolving

[GitHub] spark issue #17371: [SPARK-19903][PYSPARK][SS] window operator miss the `wat...

2017-03-21 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/17371 I don't think that will solve the problem though. You will just get a different error message.

[GitHub] spark issue #17371: [SPARK-19903][PYSPARK][SS] window operator miss the `wat...

2017-03-21 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/17371 I really think the core problem here is that we allow you to use resolved attributes at all in the user API. Unfortunately we are somewhat stuck with that bad decision. Personally, I never use

[GitHub] spark issue #17268: [SPARK-19932][SS] Disallow a case that might cause OOM f...

2017-03-16 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/17268 Sorry I'm still not sure if this is a good idea. Why disallow the following, ```scala spark .readStream .withWatermark("eventTime", &

[GitHub] spark issue #17268: [SPARK-19932][SS] Also save event time into StateStore f...

2017-03-15 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/17268 Sorry, I wasn't suggesting we mandate this. There may be use cases where users are okay deduping a short lived stream w/o a watermark. I'm only saying the timestamp is mandatory

[GitHub] spark pull request #17179: [SPARK-19067][SS] Processing-time-based timeout i...

2017-03-14 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/17179#discussion_r105991219 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/KeyedState.scala --- @@ -61,25 +65,50 @@ import

[GitHub] spark pull request #17179: [SPARK-19067][SS] Processing-time-based timeout i...

2017-03-14 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/17179#discussion_r105990971 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/KeyedState.scala --- @@ -61,25 +65,50 @@ import

[GitHub] spark pull request #17179: [SPARK-19067][SS] Processing-time-based timeout i...

2017-03-14 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/17179#discussion_r105990594 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/KeyValueGroupedDataset.scala --- @@ -249,6 +250,43 @@ class KeyValueGroupedDataset[K, V] private

[GitHub] spark pull request #17179: [SPARK-19067][SS] Processing-time-based timeout i...

2017-03-13 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/17179#discussion_r105822080 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/KeyedState.scala --- @@ -61,25 +65,50 @@ import

[GitHub] spark pull request #17179: [SPARK-19067][SS] Processing-time-based timeout i...

2017-03-13 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/17179#discussion_r105821698 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/KeyValueGroupedDataset.scala --- @@ -298,12 +368,14 @@ class KeyValueGroupedDataset[K, V] private

[GitHub] spark pull request #17179: [SPARK-19067][SS] Processing-time-based timeout i...

2017-03-13 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/17179#discussion_r105823059 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FlatMapGroupsWithStateExec.scala --- @@ -0,0 +1,270

[GitHub] spark pull request #17179: [SPARK-19067][SS] Processing-time-based timeout i...

2017-03-13 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/17179#discussion_r105822317 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/KeyedState.scala --- @@ -61,25 +65,50 @@ import

[GitHub] spark pull request #17179: [SPARK-19067][SS] Processing-time-based timeout i...

2017-03-13 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/17179#discussion_r105822109 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/KeyedState.scala --- @@ -61,25 +65,50 @@ import

[GitHub] spark pull request #17179: [SPARK-19067][SS] Processing-time-based timeout i...

2017-03-13 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/17179#discussion_r105821496 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/KeyValueGroupedDataset.scala --- @@ -249,6 +250,43 @@ class KeyValueGroupedDataset[K, V] private

[GitHub] spark issue #17268: [SPARK-19932][SS] Also save event time into StateStore f...

2017-03-13 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/17268 Say the eventtime column chosen is the time of delivery into something like Kafka. Due to retries we end up with two events with different timestamps. Consider the following stream

[GitHub] spark issue #17268: [SPARK-19932][SS] Also save event time into StateStore f...

2017-03-13 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/17268 I'm mixed if we want this to happen implicitly. Here's how I think about the tradeoffs for this change: On the pro side, with this change we avoid the case where the user forgets to include

[GitHub] spark issue #17228: [SPARK-19886] Fix reportDataLoss if statement in SS Kafk...

2017-03-09 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/17228 LGTM too

[GitHub] spark issue #17087: [SPARK-19372][SQL] Fix throwing a Java exception at df.f...

2017-03-09 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/17087 There appears to have been some code drift (as `GeneratePredicate` and `InterpretedPredicate` both used to return a class that inherited from a common interface), but I don't think it's hard

[GitHub] spark pull request #17219: [SPARK-19876][SS][WIP] OneTime Trigger Executor

2017-03-08 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/17219#discussion_r105061613 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/OffsetCommitLog.scala --- @@ -0,0 +1,61 @@ +/* + * Licensed

[GitHub] spark pull request #17219: [SPARK-19876][SS][WIP] OneTime Trigger Executor

2017-03-08 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/17219#discussion_r105062302 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/Trigger.scala --- @@ -38,6 +38,26 @@ sealed trait Trigger

[GitHub] spark pull request #17219: [SPARK-19876][SS][WIP] OneTime Trigger Executor

2017-03-08 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/17219#discussion_r105062818 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala --- @@ -377,17 +385,25 @@ class StreamExecution

[GitHub] spark pull request #17219: [SPARK-19876][SS][WIP] OneTime Trigger Executor

2017-03-08 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/17219#discussion_r105062498 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala --- @@ -284,6 +291,7 @@ class StreamExecution

[GitHub] spark pull request #17219: [SPARK-19876][SS][WIP] OneTime Trigger Executor

2017-03-08 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/17219#discussion_r105061689 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/OffsetCommitLog.scala --- @@ -0,0 +1,61 @@ +/* + * Licensed

[GitHub] spark pull request #17219: [SPARK-19876][SS][WIP] OneTime Trigger Executor

2017-03-08 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/17219#discussion_r105062343 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/OffsetCommitLog.scala --- @@ -0,0 +1,61 @@ +/* + * Licensed

[GitHub] spark issue #17153: [SPARK-19813] maxFilesPerTrigger combo latestFirst may m...

2017-03-08 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/17153 LGTM

[GitHub] spark issue #17087: [SPARK-19372][SQL] Fix throwing a Java exception at df.f...

2017-03-08 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/17087 I don't think we need a complex refactoring. Why can't `newPredicate` catch the exception, log a warning and return an interpreted `Predicate`?

[GitHub] spark issue #17201: [SPARK-18055][SQL] Use correct mirror in ExpresionEncode...

2017-03-07 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/17201 /cc @cloud-fan

[GitHub] spark pull request #17201: [SPARK-18055][SQL] Use correct mirror in Expresio...

2017-03-07 Thread marmbrus
GitHub user marmbrus opened a pull request: https://github.com/apache/spark/pull/17201 [SPARK-18055][SQL] Use correct mirror in ExpresionEncoder Previously, we were using the mirror of passed in `TypeTag` when reflecting to build an encoder. This fails when the outer class

[GitHub] spark issue #17183: [SPARK-19841][SS]watermarkPredicate should filter based ...

2017-03-07 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/17183 LGTM

[GitHub] spark issue #17199: [SPARK-19859][SS]The new watermark should override the o...

2017-03-07 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/17199 LGTM

[GitHub] spark pull request #17183: [SPARK-19841][SS]watermarkPredicate should filter...

2017-03-07 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/17183#discussion_r104806617 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/statefulOperators.scala --- @@ -361,7 +361,7 @@ case class

[GitHub] spark pull request #17087: [SPARK-19372][SQL] Fix throwing a Java exception ...

2017-03-07 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/17087#discussion_r104790500 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala --- @@ -213,12 +217,30 @@ case class FilterExec(condition

[GitHub] spark issue #17087: [SPARK-19372][SQL] Fix throwing a Java exception at df.f...

2017-03-07 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/17087 I agree with the general approach of having a fallback from code generation to interpreted evaluation, but I also agree that this feels too narrowly targeted. In particular, why do this in one

[GitHub] spark issue #16981: [SPARK-19637][SQL] Add to_json in FunctionRegistry

2017-03-06 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/16981 yeah, LGTM

[GitHub] spark issue #17044: [SPARK-19718][SS]Handle more interrupt cases properly fo...

2017-03-03 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/17044 LGTM

[GitHub] spark pull request #17044: [SPARK-19718][SS]Handle more interrupt cases prop...

2017-03-03 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/17044#discussion_r104258607 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala --- @@ -709,12 +717,13 @@ class StreamExecution

[GitHub] spark pull request #16929: [SPARK-19595][SQL] Support json array in from_jso...

2017-03-03 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/16929#discussion_r104253528 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala --- @@ -480,23 +480,45 @@ case class JsonTuple

[GitHub] spark pull request #16929: [SPARK-19595][SQL] Support json array in from_jso...

2017-03-03 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/16929#discussion_r104253484 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala --- @@ -480,23 +480,45 @@ case class JsonTuple

[GitHub] spark issue #17120: [SPARK-19715][Structured Streaming] Option to Strip Path...

2017-03-03 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/17120 Note streams can be very long running, so this isn't about some short window. It could even be that I'm moving to a different bucket (but don't want to lose my exactly once guarantees of a very

[GitHub] spark issue #17120: [SPARK-19715][Structured Streaming] Option to Strip Path...

2017-03-03 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/17120 The use case here is when you have truly unique filenames (i.e. they contain a guid). This is actually pretty common in my experience. We definitely shouldn't turn this on by default

[GitHub] spark issue #17070: [SPARK-19721][SS] Good error message for version mismatc...

2017-02-27 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/17070 /cc @zsxwing

[GitHub] spark pull request #16929: [SPARK-19595][SQL] Support json array in from_jso...

2017-02-27 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/16929#discussion_r103337028 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala --- @@ -2969,11 +2969,27 @@ object functions

[GitHub] spark pull request #16929: [SPARK-19595][SQL] Support json array in from_jso...

2017-02-27 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/16929#discussion_r103302035 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala --- @@ -480,36 +480,79 @@ case class JsonTuple

[GitHub] spark pull request #16929: [SPARK-19595][SQL] Support json array in from_jso...

2017-02-27 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/16929#discussion_r103300622 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala --- @@ -2969,11 +2969,27 @@ object functions

[GitHub] spark issue #16929: [SPARK-19595][SQL] Support json array in from_json

2017-02-24 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/16929 Hmm, I'm not sure we want to change this to a generator. I think that has performance consequences as well as possibly being surprising. I would probably make it possible to handle arrays (when

[GitHub] spark issue #16929: [SPARK-19595][SQL] Support json array in from_json

2017-02-24 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/16929 /cc @brkyvz

[GitHub] spark issue #16987: [SPARK-19633][SS] FileSource read from FileSink

2017-02-23 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/16987 I spoke too soon, sorry! Thinking about it more the deterministic filename solution is not great as the number of partitions could change for several reasons. Given that would you mind

[GitHub] spark issue #16987: [SPARK-19633][SS] FileSource read from FileSink

2017-02-21 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/16987 Thanks for working on this, however I'm not sure if we want to go with this approach. In Spark 2.2, I think we should consider deprecating the manifest files and instead use deterministic file

[GitHub] spark pull request #16970: [SPARK-19497][SS]Implement streaming deduplicatio...

2017-02-17 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/16970#discussion_r101862301 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -2006,15 +2006,19 @@ class Dataset[T] private[sql

[GitHub] spark pull request #16970: [SPARK-19497][SS]Implement streaming deduplicatio...

2017-02-17 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/16970#discussion_r101834289 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationChecker.scala --- @@ -35,6 +35,9 @@ object

[GitHub] spark issue #16929: [SPARK-19595][SQL] Do not allow json array in from_json

2017-02-14 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/16929 I agree that it's wrong to truncate, but why not just fix handling of arrays rather than disallow it?

[GitHub] spark issue #15918: [SPARK-18122][SQL][WIP]Fallback to Kryo for unsupported ...

2017-02-13 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/15918 @windpiger, were you still working on this? I think it would be a useful feature if we can get the tests to pass.

[GitHub] spark pull request #16758: [SPARK-19413][SS] MapGroupsWithState for arbitrar...

2017-02-02 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/16758#discussion_r99255963 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/KeyedState.scala --- @@ -0,0 +1,134 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark issue #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener callback...

2017-02-01 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/16664 I think @sameeragarwal plans to review. I glanced and it looks fine.

[GitHub] spark pull request #16758: [SPARK-19413][SS] MapGroupsWithState for arbitrar...

2017-01-31 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/16758#discussion_r98802935 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/StateImpl.scala --- @@ -0,0 +1,70 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #16758: [SPARK-19413][SS] MapGroupsWithState for arbitrar...

2017-01-31 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/16758#discussion_r98802826 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/StateImpl.scala --- @@ -0,0 +1,70 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #16758: [SPARK-19413][SS] MapGroupsWithState for arbitrar...

2017-01-31 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/16758#discussion_r98790560 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/StateImpl.scala --- @@ -0,0 +1,70 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #16758: [SPARK-19413][SS] MapGroupsWithState for arbitrar...

2017-01-31 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/16758#discussion_r98787267 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/StateImpl.scala --- @@ -0,0 +1,70 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #16758: [SPARK-19413][SS] MapGroupsWithState for arbitrar...

2017-01-31 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/16758#discussion_r98778359 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala --- @@ -313,6 +313,25 @@ abstract class SparkStrategies extends

[GitHub] spark pull request #16758: [SPARK-19413][SS] MapGroupsWithState for arbitrar...

2017-01-31 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/16758#discussion_r98776316 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/KeyValueGroupedDataset.scala --- @@ -219,6 +219,160 @@ class KeyValueGroupedDataset[K, V] private

[GitHub] spark pull request #16758: [SPARK-19413][SS] MapGroupsWithState for arbitrar...

2017-01-31 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/16758#discussion_r98774221 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/KeyValueGroupedDataset.scala --- @@ -219,6 +219,160 @@ class KeyValueGroupedDataset[K, V] private

[GitHub] spark pull request #16758: [SPARK-19413][SS] MapGroupsWithState for arbitrar...

2017-01-31 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/16758#discussion_r98779817 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StatefulAggregate.scala --- @@ -235,3 +240,86 @@ case class StateStoreSaveExec

[GitHub] spark pull request #16758: [SPARK-19413][SS] MapGroupsWithState for arbitrar...

2017-01-31 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/16758#discussion_r98778114 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/State.scala --- @@ -0,0 +1,101 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] spark pull request #16758: [SPARK-19413][SS] MapGroupsWithState for arbitrar...

2017-01-31 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/16758#discussion_r98779663 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StatefulAggregate.scala --- @@ -235,3 +240,86 @@ case class StateStoreSaveExec

[GitHub] spark pull request #16758: [SPARK-19413][SS] MapGroupsWithState for arbitrar...

2017-01-31 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/16758#discussion_r98775275 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/KeyValueGroupedDataset.scala --- @@ -219,6 +219,160 @@ class KeyValueGroupedDataset[K, V] private

[GitHub] spark pull request #16758: [SPARK-19413][SS] MapGroupsWithState for arbitrar...

2017-01-31 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/16758#discussion_r98778548 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/StateImpl.scala --- @@ -0,0 +1,70 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #16758: [SPARK-19413][SS] MapGroupsWithState for arbitrar...

2017-01-31 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/16758#discussion_r98779439 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StatefulAggregate.scala --- @@ -235,3 +240,86 @@ case class StateStoreSaveExec
