[GitHub] spark pull request #15307: [SPARK-17731][SQL][STREAMING] Metrics for structu...

2016-10-12 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15307#discussion_r83082993 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala --- @@ -516,12 +568,127 @@ class StreamExecution

[GitHub] spark pull request #15307: [SPARK-17731][SQL][STREAMING] Metrics for structu...

2016-10-12 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15307#discussion_r83083859 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingQuerySuite.scala --- @@ -17,22 +17,78 @@ package

[GitHub] spark pull request #15307: [SPARK-17731][SQL][STREAMING] Metrics for structu...

2016-10-12 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15307#discussion_r83079156 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala --- @@ -278,8 +315,14 @@ class StreamExecution

[GitHub] spark pull request #15307: [SPARK-17731][SQL][STREAMING] Metrics for structu...

2016-10-12 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15307#discussion_r83060899 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StatefulAggregate.scala --- @@ -86,7 +93,13 @@ case class StateStoreSaveExec

[GitHub] spark pull request #9766: [SPARK-11775][PYSPARK][SQL] Allow PySpark to regis...

2016-10-12 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/9766#discussion_r83057793 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala --- @@ -414,6 +418,84 @@ class UDFRegistration private[sql] (functionRegistry

[GitHub] spark pull request #9766: [SPARK-11775][PYSPARK][SQL] Allow PySpark to regis...

2016-10-12 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/9766#discussion_r83056944 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala --- @@ -414,6 +418,84 @@ class UDFRegistration private[sql] (functionRegistry

[GitHub] spark pull request #9766: [SPARK-11775][PYSPARK][SQL] Allow PySpark to regis...

2016-10-12 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/9766#discussion_r83056792 --- Diff: sql/core/src/main/java/org/apache/spark/sql/test/JavaStringLength.java --- @@ -0,0 +1,30 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #9766: [SPARK-11775][PYSPARK][SQL] Allow PySpark to regis...

2016-10-12 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/9766#discussion_r83056607 --- Diff: python/pyspark/sql/context.py --- @@ -202,6 +202,32 @@ def registerFunction(self, name, f, returnType=StringType

[GitHub] spark pull request #9766: [SPARK-11775][PYSPARK][SQL] Allow PySpark to regis...

2016-10-12 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/9766#discussion_r83057132 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala --- @@ -414,6 +418,84 @@ class UDFRegistration private[sql] (functionRegistry

[GitHub] spark pull request #9766: [SPARK-11775][PYSPARK][SQL] Allow PySpark to regis...

2016-10-11 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/9766#discussion_r82876529 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala --- @@ -412,6 +419,63 @@ class UDFRegistration private[sql] (functionRegistry

[GitHub] spark pull request #9766: [SPARK-11775][PYSPARK][SQL] Allow PySpark to regis...

2016-10-11 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/9766#discussion_r82876442 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala --- @@ -412,6 +419,63 @@ class UDFRegistration private[sql] (functionRegistry

[GitHub] spark pull request #9766: [SPARK-11775][PYSPARK][SQL] Allow PySpark to regis...

2016-10-11 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/9766#discussion_r82876419 --- Diff: python/pyspark/sql/context.py --- @@ -202,6 +202,26 @@ def registerFunction(self, name, f, returnType=StringType

[GitHub] spark pull request #9766: [SPARK-11775][PYSPARK][SQL] Allow PySpark to regis...

2016-10-11 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/9766#discussion_r82876509 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala --- @@ -17,9 +17,15 @@ package org.apache.spark.sql

[GitHub] spark issue #15422: [SPARK-17850][Core]HadoopRDD should not catch EOFExcepti...

2016-10-11 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/15422 I agree that the data that was already read is probably good. I also think that this is a pretty big behavior change where there are legitimate cases (i.e. tons of data and it is fine to miss

spark git commit: [SPARK-17830] Annotate spark.sql package with InterfaceStability

2016-10-10 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 4bafacaa5 -> 689de9200 [SPARK-17830] Annotate spark.sql package with InterfaceStability ## What changes were proposed in this pull request? This patch annotates the InterfaceStability level for top level classes in o.a.spark.sql and

[GitHub] spark issue #15392: [SPARK-17830] Annotate spark.sql package with InterfaceS...

2016-10-10 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/15392 LGTM, merging to master --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled

[GitHub] spark issue #9766: [SPARK-11775][PYSPARK][SQL] Allow PySpark to register Jav...

2016-10-07 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/9766 +1 to this functionality, but also to the request to add more tests and documentation. It would also be good to comment on the idea of using SQL as a more general way to implement

[GitHub] spark pull request #9766: [SPARK-11775][PYSPARK][SQL] Allow PySpark to regis...

2016-10-07 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/9766#discussion_r82450939 --- Diff: python/pyspark/sql/context.py --- @@ -202,6 +202,10 @@ def registerFunction(self, name, f, returnType=StringType

[GitHub] spark pull request #15354: [SPARK-17764][SQL] Add `to_json` supporting to co...

2016-10-07 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15354#discussion_r82443073 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/JsonExpressionsSuite.scala --- @@ -343,4 +343,23 @@ class

spark git commit: [SPARK-16411][SQL][STREAMING] Add textFile to Structured Streaming.

2016-10-07 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master aa3a6841e -> bb1aaf28e [SPARK-16411][SQL][STREAMING] Add textFile to Structured Streaming. ## What changes were proposed in this pull request? Adds the textFile API which exists in DataFrameReader and serves the same purpose. ## How was this

[GitHub] spark issue #14087: [SPARK-16411][SQL][STREAMING] Add textFile to Structured...

2016-10-07 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/14087 Thanks, merging to master.

[GitHub] spark issue #15367: [SPARK-17346][SQL][test-maven]Add Kafka source for Struc...

2016-10-06 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/15367 No, if we backport this I would plan to continue to backport changes (that are safe) until the next release. Either way this should not affect what goes into master.

[GitHub] spark pull request #14087: [SPARK-16411][SQL][STREAMING] Add textFile to Str...

2016-10-06 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/14087#discussion_r82293680 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamReader.scala --- @@ -311,6 +311,37 @@ final class DataStreamReader private[sql

[GitHub] spark pull request #14087: [SPARK-16411][SQL][STREAMING] Add textFile to Str...

2016-10-06 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/14087#discussion_r82293331 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala --- @@ -378,6 +378,24 @@ class FileStreamSourceSuite extends

[GitHub] spark issue #15367: [SPARK-17346][SQL][test-maven]Add Kafka source for Struc...

2016-10-06 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/15367 We should definitely vet this PR carefully to make sure it's safe. One thing that is missing from that guide, that I do believe is accepted practice, is more leeway when the feature is marked

[GitHub] spark issue #15380: Backport [SPARK-15062][SQL] fix list type infer serializ...

2016-10-06 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/15380 Merged, can you close this?

spark git commit: [SPARK-15062][SQL] Backport fix list type infer serializer issue

2016-10-06 Thread marmbrus
Repository: spark Updated Branches: refs/heads/branch-1.6 376545e4d -> d3890deb7 [SPARK-15062][SQL] Backport fix list type infer serializer issue This backports https://github.com/apache/spark/commit/733cbaa3c0ff617a630a9d6937699db37ad2943b to Branch 1.6. It's a pretty simple patch, and

spark git commit: [SPARK-17780][SQL] Report Throwable to user in StreamExecution

2016-10-06 Thread marmbrus
Repository: spark Updated Branches: refs/heads/branch-2.0 225372adf -> a2bf09588 [SPARK-17780][SQL] Report Throwable to user in StreamExecution ## What changes were proposed in this pull request? When using an incompatible source for structured streaming, it may throw NoClassDefFoundError.

spark git commit: [SPARK-17780][SQL] Report Throwable to user in StreamExecution

2016-10-06 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 79accf45a -> 9a48e60e6 [SPARK-17780][SQL] Report Throwable to user in StreamExecution ## What changes were proposed in this pull request? When using an incompatible source for structured streaming, it may throw NoClassDefFoundError. It's

[GitHub] spark issue #15352: [SPARK-17780][SQL]Report Throwable to user in StreamExec...

2016-10-06 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/15352 LGTM, I'm going to merge this.

[GitHub] spark pull request #15352: [SPARK-17780][SQL]Report Throwable to user in Str...

2016-10-06 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15352#discussion_r82269155 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala --- @@ -207,13 +207,18 @@ class StreamExecution

[GitHub] spark issue #15380: Backport [SPARK-15062][SQL] fix list type infer serializ...

2016-10-06 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/15380 LGTM

[GitHub] spark issue #15362: [SPARK-17643] Remove comparable requirement from Offset ...

2016-10-05 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/15362 LGTM

[GitHub] spark pull request #15307: [SPARK-17731][SQL][STREAMING] Metrics for structu...

2016-10-04 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15307#discussion_r81889941 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala --- @@ -525,8 +645,62 @@ class StreamExecution

[GitHub] spark pull request #15307: [WIP][SPARK-17731][SQL][STREAMING] Metrics for st...

2016-10-04 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15307#discussion_r81873789 --- Diff: project/MimaExcludes.scala --- @@ -53,7 +53,14 @@ object MimaExcludes { ProblemFilters.exclude[ReversedMissingMethodProblem

[GitHub] spark pull request #15307: [WIP][SPARK-17731][SQL][STREAMING] Metrics for st...

2016-10-04 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15307#discussion_r81873207 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamMetrics.scala --- @@ -0,0 +1,244 @@ +/* + * Licensed

[GitHub] spark pull request #15307: [WIP][SPARK-17731][SQL][STREAMING] Metrics for st...

2016-10-04 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15307#discussion_r81882888 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/StreamingQueryInfo.scala --- @@ -30,8 +30,15 @@ import

[GitHub] spark pull request #15307: [WIP][SPARK-17731][SQL][STREAMING] Metrics for st...

2016-10-04 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15307#discussion_r81872358 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingQuerySuite.scala --- @@ -100,28 +110,138 @@ class StreamingQuerySuite extends

[GitHub] spark pull request #15307: [WIP][SPARK-17731][SQL][STREAMING] Metrics for st...

2016-10-04 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15307#discussion_r81872395 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingQuerySuite.scala --- @@ -100,28 +110,138 @@ class StreamingQuerySuite extends

[GitHub] spark pull request #15307: [WIP][SPARK-17731][SQL][STREAMING] Metrics for st...

2016-10-04 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15307#discussion_r81873307 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala --- @@ -511,12 +572,71 @@ class StreamExecution

[GitHub] spark pull request #15307: [WIP][SPARK-17731][SQL][STREAMING] Metrics for st...

2016-10-04 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15307#discussion_r81874159 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala --- @@ -136,16 +145,30 @@ class StreamExecution

[GitHub] spark pull request #15307: [WIP][SPARK-17731][SQL][STREAMING] Metrics for st...

2016-10-04 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15307#discussion_r81872770 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/StreamingQueryInfo.scala --- @@ -30,8 +30,15 @@ import

[GitHub] spark pull request #15307: [WIP][SPARK-17731][SQL][STREAMING] Metrics for st...

2016-10-04 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15307#discussion_r81875491 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala --- @@ -525,8 +645,62 @@ class StreamExecution

[GitHub] spark pull request #15307: [WIP][SPARK-17731][SQL][STREAMING] Metrics for st...

2016-10-04 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15307#discussion_r81873537 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala --- @@ -525,8 +645,62 @@ class StreamExecution

[GitHub] spark pull request #15307: [WIP][SPARK-17731][SQL][STREAMING] Metrics for st...

2016-10-04 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15307#discussion_r81872893 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/SourceStatus.scala --- @@ -26,9 +26,13 @@ import

[GitHub] spark pull request #15307: [WIP][SPARK-17731][SQL][STREAMING] Metrics for st...

2016-10-04 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15307#discussion_r81882933 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamMetrics.scala --- @@ -0,0 +1,244 @@ +/* + * Licensed

[GitHub] spark pull request #15307: [WIP][SPARK-17731][SQL][STREAMING] Metrics for st...

2016-10-04 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15307#discussion_r81874295 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala --- @@ -105,11 +111,14 @@ class StreamExecution

[GitHub] spark pull request #15102: [SPARK-17346][SQL] Add Kafka source for Structure...

2016-10-04 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15102#discussion_r81844299 --- Diff: docs/structured-streaming-kafka-integration.md --- @@ -0,0 +1,231 @@ +--- +layout: global +title: Structured Streaming + Kafka

[GitHub] spark pull request #15102: [SPARK-17346][SQL] Add Kafka source for Structure...

2016-10-04 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15102#discussion_r81840127 --- Diff: docs/structured-streaming-kafka-integration.md --- @@ -0,0 +1,231 @@ +--- +layout: global +title: Structured Streaming + Kafka

[GitHub] spark issue #15333: [SPARK-17761][SQL] Remove MutableRow

2016-10-04 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/15333 This seems like a reasonable simplification to me. A little bit of history (though this has diverged significantly, so don't take this as authoritative): I think this complexity all stems from

[GitHub] spark pull request #15102: [SPARK-17346][SQL] Add Kafka source for Structure...

2016-10-04 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15102#discussion_r81834388 --- Diff: docs/structured-streaming-kafka-integration.md --- @@ -0,0 +1,231 @@ +--- +layout: global +title: Structured Streaming + Kafka

[GitHub] spark pull request #15102: [SPARK-17346][SQL] Add Kafka source for Structure...

2016-10-04 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15102#discussion_r81836478 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala --- @@ -530,3 +530,8 @@ object StreamExecution

[GitHub] spark pull request #15102: [SPARK-17346][SQL] Add Kafka source for Structure...

2016-10-04 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15102#discussion_r81836317 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamTest.scala --- @@ -469,29 +469,49 @@ trait StreamTest extends QueryTest

[GitHub] spark pull request #15102: [SPARK-17346][SQL] Add Kafka source for Structure...

2016-10-04 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15102#discussion_r81833814 --- Diff: docs/structured-streaming-kafka-integration.md --- @@ -0,0 +1,231 @@ +--- +layout: global +title: Structured Streaming + Kafka

[GitHub] spark pull request #15102: [SPARK-17346][SQL] Add Kafka source for Structure...

2016-10-04 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15102#discussion_r81835883 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceProvider.scala --- @@ -0,0 +1,282 @@ +/* + * Licensed

[GitHub] spark pull request #14087: [SPARK-16411][SQL][STREAMING] Add textFile to Str...

2016-10-03 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/14087#discussion_r81610590 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamReader.scala --- @@ -21,13 +21,13 @@ import scala.collection.JavaConverters

[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-09-29 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/15102 I spent a while playing around with this today on a real cluster, and overall it is pretty cool! I have a few suggestions we should implement in the long run, but these can probably be done

[GitHub] spark issue #15274: [SPARK-17699] Support for parsing JSON string columns

2016-09-29 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/15274 @HyukjinKwon absolutely. I actually changed the name from `json_parser` to `from_json` in anticipation of adding `to_json` :)

[GitHub] spark issue #15274: [SPARK-17699] Support for parsing JSON string columns

2016-09-28 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/15274 Emailed the list. Seems like a popular feature so far :)

[GitHub] spark pull request #15274: [SPARK-17699] Support for parsing JSON string col...

2016-09-27 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15274#discussion_r80837055 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala --- @@ -467,3 +469,26 @@ case class JsonTuple

[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-09-27 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/15102 FYI: #15274 adds support for parsing JSON from the key/value into a Spark SQL `StructType`

[GitHub] spark pull request #15274: [SPARK-17699] Support for parsing JSON string col...

2016-09-27 Thread marmbrus
GitHub user marmbrus opened a pull request: https://github.com/apache/spark/pull/15274 [SPARK-17699] Support for parsing JSON string columns Spark SQL has great support for reading text files that contain JSON data. However, in many cases the JSON data is just one column amongst

[GitHub] spark pull request #15102: [SPARK-17346][SQL] Add Kafka source for Structure...

2016-09-26 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15102#discussion_r80599802 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSource.scala --- @@ -0,0 +1,344 @@ +/* + * Licensed

[GitHub] spark pull request #15102: [SPARK-17346][SQL] Add Kafka source for Structure...

2016-09-26 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15102#discussion_r80567253 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSource.scala --- @@ -0,0 +1,344 @@ +/* + * Licensed

[GitHub] spark pull request #15102: [SPARK-17346][SQL] Add Kafka source for Structure...

2016-09-26 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15102#discussion_r80564114 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceProvider.scala --- @@ -0,0 +1,263 @@ +/* + * Licensed

[GitHub] spark pull request #15102: [SPARK-17346][SQL] Add Kafka source for Structure...

2016-09-26 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15102#discussion_r80563908 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceProvider.scala --- @@ -0,0 +1,263 @@ +/* + * Licensed

[GitHub] spark pull request #15102: [SPARK-17346][SQL] Add Kafka source for Structure...

2016-09-26 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15102#discussion_r80568269 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSource.scala --- @@ -0,0 +1,344 @@ +/* + * Licensed

[GitHub] spark pull request #15102: [SPARK-17346][SQL] Add Kafka source for Structure...

2016-09-26 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15102#discussion_r80568479 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSource.scala --- @@ -0,0 +1,344 @@ +/* + * Licensed

[GitHub] spark pull request #15102: [SPARK-17346][SQL] Add Kafka source for Structure...

2016-09-26 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15102#discussion_r80568553 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceProvider.scala --- @@ -0,0 +1,263 @@ +/* + * Licensed

[GitHub] spark pull request #15102: [SPARK-17346][SQL] Add Kafka source for Structure...

2016-09-26 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15102#discussion_r80568624 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceProvider.scala --- @@ -0,0 +1,263 @@ +/* + * Licensed

[GitHub] spark pull request #15102: [SPARK-17346][SQL] Add Kafka source for Structure...

2016-09-26 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15102#discussion_r80564169 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceProvider.scala --- @@ -0,0 +1,263 @@ +/* + * Licensed

[GitHub] spark pull request #15102: [SPARK-17346][SQL] Add Kafka source for Structure...

2016-09-26 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15102#discussion_r80584097 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSource.scala --- @@ -0,0 +1,344 @@ +/* + * Licensed

[GitHub] spark pull request #15102: [SPARK-17346][SQL] Add Kafka source for Structure...

2016-09-26 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15102#discussion_r80563435 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSource.scala --- @@ -0,0 +1,344 @@ +/* + * Licensed

[GitHub] spark pull request #15102: [SPARK-17346][SQL] Add Kafka source for Structure...

2016-09-26 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15102#discussion_r80562234 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSource.scala --- @@ -0,0 +1,344 @@ +/* + * Licensed

[GitHub] spark pull request #15102: [SPARK-17346][SQL] Add Kafka source for Structure...

2016-09-26 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15102#discussion_r80567477 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSource.scala --- @@ -0,0 +1,344 @@ +/* + * Licensed

[GitHub] spark pull request #15102: [SPARK-17346][SQL] Add Kafka source for Structure...

2016-09-26 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15102#discussion_r80563104 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSource.scala --- @@ -0,0 +1,344 @@ +/* + * Licensed

[GitHub] spark pull request #15102: [SPARK-17346][SQL] Add Kafka source for Structure...

2016-09-26 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15102#discussion_r80563033 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSource.scala --- @@ -0,0 +1,344 @@ +/* + * Licensed

[GitHub] spark pull request #15102: [SPARK-17346][SQL] Add Kafka source for Structure...

2016-09-26 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15102#discussion_r80565318 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceProvider.scala --- @@ -0,0 +1,263 @@ +/* + * Licensed

[GitHub] spark pull request #15102: [SPARK-17346][SQL] Add Kafka source for Structure...

2016-09-26 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15102#discussion_r80568036 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSource.scala --- @@ -0,0 +1,344 @@ +/* + * Licensed

[GitHub] spark pull request #15102: [SPARK-17346][SQL] Add Kafka source for Structure...

2016-09-26 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15102#discussion_r80563543 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSource.scala --- @@ -0,0 +1,446 @@ +/* + * Licensed

spark git commit: [SPARK-17153][SQL] Should read partition data when reading new files in filestream without globbing

2016-09-26 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master bde85f8b7 -> 8135e0e5e [SPARK-17153][SQL] Should read partition data when reading new files in filestream without globbing ## What changes were proposed in this pull request? When reading a file stream with a non-globbing path, the results

[GitHub] spark issue #14803: [SPARK-17153][SQL] Should read partition data when readi...

2016-09-26 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/14803 Thanks, I'm going to merge this to master.

[GitHub] spark issue #15195: [SPARK-17632][SQL]make console sink and other sinks work...

2016-09-26 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/15195 I'm sorry, I don't understand the goal of this patch. Recovering from a checkpoint only makes sense if your sink is stateful. What is the use case you are trying to support?

[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-09-22 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/15102 > "I want to be able to add a topicpartition mid stream, but I don't want to start it from the beginning." I see, I was thinking only of new topics that appear that match your

[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-09-22 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/15102 Comparable requirement removed in #15207. > I think in the absence of prior information about the position in a topicpartition, you start a new batch on topic B starting from where

[GitHub] spark pull request #15207: [SPARK-17643] Remove comparable requirement from ...

2016-09-22 Thread marmbrus
GitHub user marmbrus opened a pull request: https://github.com/apache/spark/pull/15207 [SPARK-17643] Remove comparable requirement from Offset For some sources, it is difficult to provide a global ordering based only on the data in the offset. Since we don't use comparison

[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-09-22 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/15102 For streaming you already know what the global order is, because you know when you asked for A and B. I agree that we should probably remove the comparable requirement from `Offset` in favor

[GitHub] spark issue #15197: [SPARK-17631] [SQL] Add HttpStreamSink for structured st...

2016-09-22 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/15197 Thanks for working on this, it does seem like it could be useful. I'm not sure if this should go into Spark or into a separate package. It really depends on how many people want this feature

[GitHub] spark issue #15201: [SPARK-17638][Streaming]Stop JVM StreamingContext when t...

2016-09-22 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/15201 LGTM

[GitHub] spark issue #14803: [SPARK-17153][SQL] Should read partition data when readi...

2016-09-22 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/14803 Mostly looks good, I've also asked @tdas to take a look since he wrote this initially. A few more cases came to mind while I was rephrasing your documentation. Specifically

[GitHub] spark pull request #14803: [SPARK-17153][SQL] Should read partition data whe...

2016-09-22 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/14803#discussion_r80120376 --- Diff: docs/structured-streaming-programming-guide.md --- @@ -512,6 +512,10 @@ csvDF = spark \ These examples generate streaming DataFrames

[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-09-21 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/15102 I asked @koeninger to clarify the specific suggestions he is referring to above, here's my response: > [Comments here and on JIRA relating to concerns with the `Offset` implementat

[GitHub] spark issue #15188: [SPARK-17627] Mark Streaming Providers Experimental

2016-09-21 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/15188 /cc @rxin

[GitHub] spark pull request #15188: [SPARK-17627] Mark Streaming Providers Experiment...

2016-09-21 Thread marmbrus
GitHub user marmbrus opened a pull request: https://github.com/apache/spark/pull/15188 [SPARK-17627] Mark Streaming Providers Experimental All of structured streaming is experimental in its first release. We missed the annotation on two of the APIs. You can merge this pull

[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-09-20 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/15102 > Either way, who are you to presume that a user doesn't know what she is doing when she configured a consumer to start at a particular position for an added partition? I feel l

[GitHub] spark pull request #15102: [SPARK-17346][SQL] Add Kafka source for Structure...

2016-09-20 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15102#discussion_r79690215 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSource.scala --- @@ -0,0 +1,446 @@ +/* + * Licensed

[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-09-20 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/15102 > My bigger concern is that it looks like you guys are continuing to hack in a particular direction, without addressing my points or answering whether you're willing to let me help w

[GitHub] spark pull request #15102: [SPARK-17346][SQL] Add Kafka source for Structure...

2016-09-20 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15102#discussion_r79676581 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSource.scala --- @@ -0,0 +1,446 @@ +/* + * Licensed

[GitHub] spark pull request #14553: [SPARK-16963] [STREAMING] [SQL] Changes to Source...

2016-09-19 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/14553#discussion_r79493142 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/Source.scala --- @@ -30,16 +30,37 @@ trait Source { /** Returns

[GitHub] spark pull request #14553: [SPARK-16963] [STREAMING] [SQL] Changes to Source...

2016-09-19 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/14553#discussion_r79498410 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/Source.scala --- @@ -30,16 +30,37 @@ trait Source { /** Returns
