[GitHub] spark issue #23156: [SPARK-24063][SS] Add maximum epoch queue threshold for ...

2018-12-06 Thread arunmahadevan
Github user arunmahadevan commented on the issue: https://github.com/apache/spark/pull/23156 @gaborgsomogyi what I meant was rather than exposing a config to control the internal queue sizes, we could have a higher level config like the max pending epochs. This would act as a back

[GitHub] spark issue #23156: [SPARK-24063][SS] Add maximum epoch queue threshold for ...

2018-12-05 Thread arunmahadevan
Github user arunmahadevan commented on the issue: https://github.com/apache/spark/pull/23156 Rather than controlling the queue sizes it would be better to limit the max epoch backlog and fail the query once that threshold is reached. There already seems to be patch that attempted

[GitHub] spark issue #22598: [SPARK-25501][SS] Add kafka delegation token support.

2018-11-06 Thread arunmahadevan
Github user arunmahadevan commented on the issue: https://github.com/apache/spark/pull/22598 +1, LGTM. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark pull request #22598: [SPARK-25501][SS] Add kafka delegation token supp...

2018-10-26 Thread arunmahadevan
Github user arunmahadevan commented on a diff in the pull request: https://github.com/apache/spark/pull/22598#discussion_r228681974 --- Diff: core/src/main/scala/org/apache/spark/deploy/security/HadoopDelegationTokenManager.scala --- @@ -66,7 +66,8 @@ private[spark] class

[GitHub] spark pull request #22598: [SPARK-25501][SS] Add kafka delegation token supp...

2018-10-26 Thread arunmahadevan
Github user arunmahadevan commented on a diff in the pull request: https://github.com/apache/spark/pull/22598#discussion_r228671263 --- Diff: core/src/main/scala/org/apache/spark/deploy/security/HadoopDelegationTokenManager.scala --- @@ -66,7 +66,8 @@ private[spark] class

[GitHub] spark issue #22824: [SPARK-25834] [Structured Streaming]Update Mode should n...

2018-10-26 Thread arunmahadevan
Github user arunmahadevan commented on the issue: https://github.com/apache/spark/pull/22824 I mean its easy to miss if a new "case" is added and "update" mode is not supported. Even now how about LeftSemi, Lef

[GitHub] spark pull request #22598: [SPARK-25501][SS] Add kafka delegation token supp...

2018-10-26 Thread arunmahadevan
Github user arunmahadevan commented on a diff in the pull request: https://github.com/apache/spark/pull/22598#discussion_r228583252 --- Diff: core/src/main/scala/org/apache/spark/deploy/security/KafkaDelegationTokenProvider.scala --- @@ -0,0 +1,65 @@ +/* + * Licensed

[GitHub] spark pull request #22598: [SPARK-25501][SS] Add kafka delegation token supp...

2018-10-25 Thread arunmahadevan
Github user arunmahadevan commented on a diff in the pull request: https://github.com/apache/spark/pull/22598#discussion_r228339774 --- Diff: core/src/main/scala/org/apache/spark/deploy/security/KafkaDelegationTokenProvider.scala --- @@ -0,0 +1,65 @@ +/* + * Licensed

[GitHub] spark pull request #22598: [SPARK-25501][SS] Add kafka delegation token supp...

2018-10-25 Thread arunmahadevan
Github user arunmahadevan commented on a diff in the pull request: https://github.com/apache/spark/pull/22598#discussion_r228321793 --- Diff: core/src/main/scala/org/apache/spark/deploy/security/HadoopDelegationTokenManager.scala --- @@ -66,7 +66,8 @@ private[spark] class

[GitHub] spark pull request #22598: [SPARK-25501][SS] Add kafka delegation token supp...

2018-10-25 Thread arunmahadevan
Github user arunmahadevan commented on a diff in the pull request: https://github.com/apache/spark/pull/22598#discussion_r228320944 --- Diff: core/src/main/scala/org/apache/spark/internal/config/package.scala --- @@ -647,4 +647,42 @@ package object config { .stringConf

[GitHub] spark issue #22824: [SPARK-25834] [Structured Streaming]Update Mode should n...

2018-10-25 Thread arunmahadevan
Github user arunmahadevan commented on the issue: https://github.com/apache/spark/pull/22824 It may be enough to do the check just once than repeating similar checks for inner, leftOuter and rightOuter. For example have a single check before the `joinType match {` clause

[GitHub] spark issue #22482: WIP - [SPARK-10816][SS] Support session window natively

2018-09-20 Thread arunmahadevan
Github user arunmahadevan commented on the issue: https://github.com/apache/spark/pull/22482 +1 for the idea to provide native session window support. On the approach, it would be ideal if all windowing aggregations can be handled via single plan and state store (v/s

[GitHub] spark pull request #22299: [SPARK-24748][SS][FOLLOWUP] Switch custom metrics...

2018-09-05 Thread arunmahadevan
Github user arunmahadevan closed the pull request at: https://github.com/apache/spark/pull/22299 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark issue #21721: [SPARK-24748][SS] Support for reporting custom metrics v...

2018-08-31 Thread arunmahadevan
Github user arunmahadevan commented on the issue: https://github.com/apache/spark/pull/21721 >It seems like its life cycle should be bound to an epoch, but unfortunately we don't have such an interface in continuous streaming to represent an epoch. Is it possible that we may end

[GitHub] spark issue #21721: [SPARK-24748][SS] Support for reporting custom metrics v...

2018-08-31 Thread arunmahadevan
Github user arunmahadevan commented on the issue: https://github.com/apache/spark/pull/21721 I created a follow up PR to move CustomMetrics (and a few other streaming specific interfaces in that package) to 'streaming' and mark the interfaces as Unstable here - https://github.com

[GitHub] spark pull request #22299: [SPARK-24748][SS][FOLLOWUP] Switch custom metrics...

2018-08-31 Thread arunmahadevan
GitHub user arunmahadevan opened a pull request: https://github.com/apache/spark/pull/22299 [SPARK-24748][SS][FOLLOWUP] Switch custom metrics to Unstable APIs - Mark custom metrics related APIs as unstable - Move CustomMetrics (and a few other streaming interfaces in parent

[GitHub] spark issue #21721: [SPARK-24748][SS] Support for reporting custom metrics v...

2018-08-30 Thread arunmahadevan
Github user arunmahadevan commented on the issue: https://github.com/apache/spark/pull/21721 @rxin its for streaming sources and sinks as explained in the [doc]( https://github.com/apache/spark/blob/master/sql/core/src/main/java/org/apache/spark/sql/sources/v2/CustomMetrics.java

[GitHub] spark issue #21721: [SPARK-24748][SS] Support for reporting custom metrics v...

2018-08-30 Thread arunmahadevan
Github user arunmahadevan commented on the issue: https://github.com/apache/spark/pull/21721 @HyukjinKwon yes we can mark it unstable. Like I mentioned multiple times in previous comments the traits added here like CustomMetrics, SupportsCustomReaderMetrics etc have nothing specific

[GitHub] spark issue #21721: [SPARK-24748][SS] Support for reporting custom metrics v...

2018-08-30 Thread arunmahadevan
Github user arunmahadevan commented on the issue: https://github.com/apache/spark/pull/21721 There are many unknowns to be figured out for continuous mode. Though the way to capture the metrics would be different for continuous execution, the interface of whats reported

[GitHub] spark issue #21721: [SPARK-24748][SS] Support for reporting custom metrics v...

2018-08-30 Thread arunmahadevan
Github user arunmahadevan commented on the issue: https://github.com/apache/spark/pull/21721 The CustomMetrics are traits which can be mixed in if necessary. (see https://github.com/apache/spark/pull/21721#issuecomment-403878383) and does not affect any other API as such. When query

[GitHub] spark issue #21721: [SPARK-24748][SS] Support for reporting custom metrics v...

2018-08-30 Thread arunmahadevan
Github user arunmahadevan commented on the issue: https://github.com/apache/spark/pull/21721 @zsxwing @gatorsmile , this PR does not add new APIs as such. It builds on the existing StreamingQueryProgress and adds custom metrics to it. StreamingQueryProgress as such is not reported

[GitHub] spark issue #21721: [SPARK-24748][SS] Support for reporting custom metrics v...

2018-08-30 Thread arunmahadevan
Github user arunmahadevan commented on the issue: https://github.com/apache/spark/pull/21721 For continuous queries, the progress could still be reported by posting QueryProgressEvent to the listener for each epoch (instead of micro-batch). The `StreamingQueryProgress` also could

[GitHub] spark pull request #22251: [SPARK-25260][SQL] Fix namespace handling in Sche...

2018-08-28 Thread arunmahadevan
Github user arunmahadevan commented on a diff in the pull request: https://github.com/apache/spark/pull/22251#discussion_r213407599 --- Diff: external/avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala --- @@ -1099,6 +1098,27 @@ class AvroSuite extends QueryTest

[GitHub] spark pull request #22251: [SPARK-25260][SQL] Fix namespace handling in Sche...

2018-08-28 Thread arunmahadevan
Github user arunmahadevan commented on a diff in the pull request: https://github.com/apache/spark/pull/22251#discussion_r213407441 --- Diff: external/avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala --- @@ -1099,6 +1098,27 @@ class AvroSuite extends QueryTest

[GitHub] spark issue #22251: [SPARK-25260][SQL] Fix namespace handling in SchemaConve...

2018-08-28 Thread arunmahadevan
Github user arunmahadevan commented on the issue: https://github.com/apache/spark/pull/22251 cc @gengliangwang @dongjoon-hyun --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark pull request #22251: [SPARK-25260][SQL] Fix namespace handling in Sche...

2018-08-28 Thread arunmahadevan
GitHub user arunmahadevan opened a pull request: https://github.com/apache/spark/pull/22251 [SPARK-25260][SQL] Fix namespace handling in SchemaConverters.toAvroType ## What changes were proposed in this pull request? `toAvroType` converts spark data type to avro schema

[GitHub] spark pull request #22238: [SPARK-25245][DOCS][SS] Explain regarding limitin...

2018-08-27 Thread arunmahadevan
Github user arunmahadevan commented on a diff in the pull request: https://github.com/apache/spark/pull/22238#discussion_r213049895 --- Diff: docs/structured-streaming-programming-guide.md --- @@ -2812,6 +2812,12 @@ See [Input Sources](#input-sources) and [Output Sinks](#output

[GitHub] spark pull request #22121: [SPARK-25133][SQL][Doc]Avro data source guide

2018-08-22 Thread arunmahadevan
Github user arunmahadevan commented on a diff in the pull request: https://github.com/apache/spark/pull/22121#discussion_r212031015 --- Diff: docs/avro-data-source-guide.md --- @@ -0,0 +1,377 @@ +--- +layout: global +title: Apache Avro Data Source Guide

[GitHub] spark pull request #22143: [SPARK-24647][SS] Report KafkaStreamWriter's writ...

2018-08-20 Thread arunmahadevan
Github user arunmahadevan commented on a diff in the pull request: https://github.com/apache/spark/pull/22143#discussion_r211339535 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaWriteTask.scala --- @@ -61,12 +63,30 @@ private[kafka010] class

[GitHub] spark pull request #22143: [SPARK-24647][SS] Report KafkaStreamWriter's writ...

2018-08-20 Thread arunmahadevan
Github user arunmahadevan commented on a diff in the pull request: https://github.com/apache/spark/pull/22143#discussion_r211336988 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaStreamWriter.scala --- @@ -116,3 +133,66 @@ class

[GitHub] spark pull request #22143: [SPARK-24647][SS] Report KafkaStreamWriter's writ...

2018-08-20 Thread arunmahadevan
Github user arunmahadevan commented on a diff in the pull request: https://github.com/apache/spark/pull/22143#discussion_r211336368 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaStreamWriter.scala --- @@ -19,18 +19,23 @@ package

[GitHub] spark issue #21919: [SPARK-24933][SS] Report numOutputRows in SinkProgress

2018-08-16 Thread arunmahadevan
Github user arunmahadevan commented on the issue: https://github.com/apache/spark/pull/21919 LGTM overall except one minor comment. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark pull request #21919: [SPARK-24933][SS] Report numOutputRows in SinkPro...

2018-08-16 Thread arunmahadevan
Github user arunmahadevan commented on a diff in the pull request: https://github.com/apache/spark/pull/21919#discussion_r210707152 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/progress.scala --- @@ -254,3 +259,10 @@ class SinkProgress protected[sql

[GitHub] spark issue #21819: [SPARK-24863][SS] Report Kafka offset lag as a custom me...

2018-08-16 Thread arunmahadevan
Github user arunmahadevan commented on the issue: https://github.com/apache/spark/pull/21819 @HyukjinKwon , can you take it forward? Appreciate your effort and thanks in advance. --- - To unsubscribe, e-mail

[GitHub] spark pull request #21819: [SPARK-24863][SS] Report Kafka offset lag as a cu...

2018-08-13 Thread arunmahadevan
Github user arunmahadevan commented on a diff in the pull request: https://github.com/apache/spark/pull/21819#discussion_r209699459 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/JsonUtils.scala --- @@ -95,4 +95,20 @@ private object JsonUtils

[GitHub] spark pull request #21819: [SPARK-24863][SS] Report Kafka offset lag as a cu...

2018-08-13 Thread arunmahadevan
Github user arunmahadevan commented on a diff in the pull request: https://github.com/apache/spark/pull/21819#discussion_r209699010 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/JsonUtils.scala --- @@ -95,4 +95,20 @@ private object JsonUtils

[GitHub] spark issue #21819: [SPARK-24863][SS] Report Kafka offset lag as a custom me...

2018-08-10 Thread arunmahadevan
Github user arunmahadevan commented on the issue: https://github.com/apache/spark/pull/21819 @HeartSaVioR @HyukjinKwon @jose-torres @tdas would you mind taking a look? --- - To unsubscribe, e-mail: reviews-unsubscr

[GitHub] spark issue #21199: [SPARK-24127][SS] Continuous text socket source

2018-08-07 Thread arunmahadevan
Github user arunmahadevan commented on the issue: https://github.com/apache/spark/pull/21199 @HyukjinKwon this has been open for a while, would you mind taking this forward? --- - To unsubscribe, e-mail: reviews

[GitHub] spark pull request #21721: [SPARK-24748][SS] Support for reporting custom me...

2018-08-06 Thread arunmahadevan
Github user arunmahadevan commented on a diff in the pull request: https://github.com/apache/spark/pull/21721#discussion_r208106031 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/ProgressReporter.scala --- @@ -196,6 +237,18 @@ trait ProgressReporter

[GitHub] spark issue #21199: [SPARK-24127][SS] Continuous text socket source

2018-08-06 Thread arunmahadevan
Github user arunmahadevan commented on the issue: https://github.com/apache/spark/pull/21199 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark issue #21721: [SPARK-24748][SS] Support for reporting custom metrics v...

2018-08-06 Thread arunmahadevan
Github user arunmahadevan commented on the issue: https://github.com/apache/spark/pull/21721 @HyukjinKwon , the master code changed and I had to rebase and fix issues. Can you take it forward ? There seems to be unrelated test failures in Kafka 0.10 integration suite

[GitHub] spark issue #21199: [SPARK-24127][SS] Continuous text socket source

2018-08-03 Thread arunmahadevan
Github user arunmahadevan commented on the issue: https://github.com/apache/spark/pull/21199 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark issue #21721: [SPARK-24748][SS] Support for reporting custom metrics v...

2018-08-03 Thread arunmahadevan
Github user arunmahadevan commented on the issue: https://github.com/apache/spark/pull/21721 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark issue #21919: [SPARK-24933][SS] Report numOutputRows in SinkProgress v...

2018-08-02 Thread arunmahadevan
Github user arunmahadevan commented on the issue: https://github.com/apache/spark/pull/21919 `numOutputRows` makes sense for all sinks, but I agree the counting should be done at the framework and not by individual sinks. For metrics that does not apply to all sinks, they could

[GitHub] spark issue #21721: [SPARK-24748][SS] Support for reporting custom metrics v...

2018-08-02 Thread arunmahadevan
Github user arunmahadevan commented on the issue: https://github.com/apache/spark/pull/21721 The tests keeps failing and looks unrelated. @HyukjinKwon Let me know if you think theres something I should look

[GitHub] spark issue #21199: [SPARK-24127][SS] Continuous text socket source

2018-08-02 Thread arunmahadevan
Github user arunmahadevan commented on the issue: https://github.com/apache/spark/pull/21199 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark pull request #21199: [SPARK-24127][SS] Continuous text socket source

2018-08-02 Thread arunmahadevan
Github user arunmahadevan commented on a diff in the pull request: https://github.com/apache/spark/pull/21199#discussion_r207237894 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousTextSocketSource.scala --- @@ -0,0 +1,295

[GitHub] spark pull request #21721: [SPARK-24748][SS] Support for reporting custom me...

2018-08-02 Thread arunmahadevan
Github user arunmahadevan commented on a diff in the pull request: https://github.com/apache/spark/pull/21721#discussion_r207232187 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/progress.scala --- @@ -163,7 +163,27 @@ class SourceProgress protected[sql

[GitHub] spark issue #21199: [SPARK-24127][SS] Continuous text socket source

2018-08-01 Thread arunmahadevan
Github user arunmahadevan commented on the issue: https://github.com/apache/spark/pull/21199 @HeartSaVioR , Addressed your comments. Let me know if I missed something. Also rebased and had to change more code to use the new interfaces. I hope if we can speed up the review

[GitHub] spark pull request #21199: [SPARK-24127][SS] Continuous text socket source

2018-08-01 Thread arunmahadevan
Github user arunmahadevan commented on a diff in the pull request: https://github.com/apache/spark/pull/21199#discussion_r207078242 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousTextSocketSource.scala --- @@ -0,0 +1,295

[GitHub] spark pull request #21199: [SPARK-24127][SS] Continuous text socket source

2018-08-01 Thread arunmahadevan
Github user arunmahadevan commented on a diff in the pull request: https://github.com/apache/spark/pull/21199#discussion_r207078263 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousTextSocketSource.scala --- @@ -0,0 +1,295

[GitHub] spark pull request #21199: [SPARK-24127][SS] Continuous text socket source

2018-08-01 Thread arunmahadevan
Github user arunmahadevan commented on a diff in the pull request: https://github.com/apache/spark/pull/21199#discussion_r207078232 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousTextSocketSource.scala --- @@ -0,0 +1,295

[GitHub] spark pull request #21199: [SPARK-24127][SS] Continuous text socket source

2018-08-01 Thread arunmahadevan
Github user arunmahadevan commented on a diff in the pull request: https://github.com/apache/spark/pull/21199#discussion_r207078197 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousTextSocketSource.scala --- @@ -0,0 +1,295

[GitHub] spark pull request #21721: [SPARK-24748][SS] Support for reporting custom me...

2018-08-01 Thread arunmahadevan
Github user arunmahadevan commented on a diff in the pull request: https://github.com/apache/spark/pull/21721#discussion_r207042893 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/progress.scala --- @@ -163,7 +163,8 @@ class SourceProgress protected[sql

[GitHub] spark issue #21721: [SPARK-24748][SS] Support for reporting custom metrics v...

2018-08-01 Thread arunmahadevan
Github user arunmahadevan commented on the issue: https://github.com/apache/spark/pull/21721 @HyukjinKwon , have addressed the comments and modified SourceProgress and SinkProgress to take String instead of JValue so that this can be easily used from Java. Regarding the default value

[GitHub] spark pull request #21721: [SPARK-24748][SS] Support for reporting custom me...

2018-07-30 Thread arunmahadevan
Github user arunmahadevan commented on a diff in the pull request: https://github.com/apache/spark/pull/21721#discussion_r206347147 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/streaming/SupportsCustomReaderMetrics.java --- @@ -0,0 +1,45

[GitHub] spark issue #21199: [SPARK-24127][SS] Continuous text socket source

2018-07-30 Thread arunmahadevan
Github user arunmahadevan commented on the issue: https://github.com/apache/spark/pull/21199 @HeartSaVioR , rebased with master. ping @jose-torres @tdas @zsxwing for review. --- - To unsubscribe, e

[GitHub] spark issue #21721: [SPARK-24748][SS] Support for reporting custom metrics v...

2018-07-30 Thread arunmahadevan
Github user arunmahadevan commented on the issue: https://github.com/apache/spark/pull/21721 @HeartSaVioR thanks for taking time to review. Addressed the comments, can you take a look again? Regarding the mixin interface, would like to take feedback from others

[GitHub] spark pull request #21721: [SPARK-24748][SS] Support for reporting custom me...

2018-07-30 Thread arunmahadevan
Github user arunmahadevan commented on a diff in the pull request: https://github.com/apache/spark/pull/21721#discussion_r206241038 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/streaming/SupportsCustomWriterMetrics.java --- @@ -0,0 +1,45

[GitHub] spark pull request #21721: [SPARK-24748][SS] Support for reporting custom me...

2018-07-30 Thread arunmahadevan
Github user arunmahadevan commented on a diff in the pull request: https://github.com/apache/spark/pull/21721#discussion_r206237610 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/ProgressReporter.scala --- @@ -143,18 +150,50 @@ trait ProgressReporter

[GitHub] spark pull request #21819: [SPARK-24863][SS] Report Kafka offset lag as a cu...

2018-07-19 Thread arunmahadevan
GitHub user arunmahadevan opened a pull request: https://github.com/apache/spark/pull/21819 [SPARK-24863][SS] Report Kafka offset lag as a custom metrics ## What changes were proposed in this pull request? This builds on top of SPARK-24748 to report 'offset lag' as a custom

[GitHub] spark issue #21721: [SPARK-24748][SS] Support for reporting custom metrics v...

2018-07-19 Thread arunmahadevan
Github user arunmahadevan commented on the issue: https://github.com/apache/spark/pull/21721 @jose-torres, addressed initial comments. @tdas, can you also take a look when possible ? --- - To unsubscribe, e-mail

[GitHub] spark issue #21733: [SPARK-24763][SS] Remove redundant key data from value i...

2018-07-11 Thread arunmahadevan
Github user arunmahadevan commented on the issue: https://github.com/apache/spark/pull/21733 @HeartSaVioR , the results looks promising. I am wondering if theres a way to make this default option than introducing new configs. Since this is internal details anyway theres no need

[GitHub] spark issue #21721: [SPARK-24748][SS] Support for reporting custom metrics v...

2018-07-11 Thread arunmahadevan
Github user arunmahadevan commented on the issue: https://github.com/apache/spark/pull/21721 @jose-torres I have removed the Kafka lag metrics out of this PR and added writer metrics and the number of rows in the memory sink as an example

[GitHub] spark issue #21673: [SPARK-24697][SS] Fix the reported start offsets in stre...

2018-07-11 Thread arunmahadevan
Github user arunmahadevan commented on the issue: https://github.com/apache/spark/pull/21673 @tdas Closing this in favor of #21744 . --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark pull request #21673: [SPARK-24697][SS] Fix the reported start offsets ...

2018-07-11 Thread arunmahadevan
Github user arunmahadevan closed the pull request at: https://github.com/apache/spark/pull/21673 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark issue #21744: [SPARK-24697][SS] Fix the reported start offsets in stre...

2018-07-11 Thread arunmahadevan
Github user arunmahadevan commented on the issue: https://github.com/apache/spark/pull/21744 @tdas logically this is similar to https://github.com/apache/spark/pull/21673. Yes it makes the control flow better and LGTM. Overall the progress reporter is still tightly coupled

[GitHub] spark issue #21721: [SPARK-24748][SS] Support for reporting custom metrics v...

2018-07-06 Thread arunmahadevan
Github user arunmahadevan commented on the issue: https://github.com/apache/spark/pull/21721 @jose-torres @HeartSaVioR , Addressed the initial comments. Will add Writer support for custom metrics and add MemorySink as an example. I am ok to move out Kafka custom metrics

[GitHub] spark pull request #21721: [SPARK-24748][SS] Support for reporting custom me...

2018-07-06 Thread arunmahadevan
Github user arunmahadevan commented on a diff in the pull request: https://github.com/apache/spark/pull/21721#discussion_r200730270 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/JsonUtils.scala --- @@ -95,4 +95,25 @@ private object JsonUtils

[GitHub] spark pull request #21721: [SPARK-24748][SS] Support for reporting custom me...

2018-07-06 Thread arunmahadevan
Github user arunmahadevan commented on a diff in the pull request: https://github.com/apache/spark/pull/21721#discussion_r200730240 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/progress.scala --- @@ -178,12 +180,18 @@ class SourceProgress protected[sql

[GitHub] spark issue #21721: [SPARK-24748][SS] Support for reporting custom metrics v...

2018-07-05 Thread arunmahadevan
Github user arunmahadevan commented on the issue: https://github.com/apache/spark/pull/21721 @tdas @jose-torres @HeartSaVioR --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark pull request #21721: [SPARK-24748][SS] Support for reporting custom me...

2018-07-05 Thread arunmahadevan
GitHub user arunmahadevan opened a pull request: https://github.com/apache/spark/pull/21721 [SPARK-24748][SS] Support for reporting custom metrics via StreamingQuery Progress ## What changes were proposed in this pull request? Currently the Structured Streaming sources

[GitHub] spark issue #21673: [SPARK-24697][SS] Fix the reported start offsets in stre...

2018-07-05 Thread arunmahadevan
Github user arunmahadevan commented on the issue: https://github.com/apache/spark/pull/21673 @tdas , thanks for your comments. Yes theres problem with the current abstraction, and I didn't consider refactoring it since there have been multiple changes to this class without changing

[GitHub] spark issue #21673: [SPARK-24697][SS] Fix the reported start offsets in stre...

2018-07-05 Thread arunmahadevan
Github user arunmahadevan commented on the issue: https://github.com/apache/spark/pull/21673 @HeartSaVioR , thanks for the inputs. Please check again. --- - To unsubscribe, e-mail: reviews-unsubscr

[GitHub] spark issue #21673: SPARK-24697: Fix the reported start offsets in streaming...

2018-06-29 Thread arunmahadevan
Github user arunmahadevan commented on the issue: https://github.com/apache/spark/pull/21673 @tdas @jose-torres @HeartSaVioR --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark pull request #21673: SPARK-24697: Fix the reported start offsets in st...

2018-06-29 Thread arunmahadevan
GitHub user arunmahadevan opened a pull request: https://github.com/apache/spark/pull/21673 SPARK-24697: Fix the reported start offsets in streaming query progress ## What changes were proposed in this pull request? Streaming query reports progress during each trigger (e.g

[GitHub] spark issue #21622: [SPARK-24637][SS] Add metrics regarding state and waterm...

2018-06-26 Thread arunmahadevan
Github user arunmahadevan commented on the issue: https://github.com/apache/spark/pull/21622 Looks good overall, a couple of minor comments. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

[GitHub] spark pull request #21622: [SPARK-24637][SS] Add metrics regarding state and...

2018-06-26 Thread arunmahadevan
Github user arunmahadevan commented on a diff in the pull request: https://github.com/apache/spark/pull/21622#discussion_r198248243 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MetricsReporter.scala --- @@ -39,6 +42,23 @@ class MetricsReporter

[GitHub] spark pull request #21560: [SPARK-24386][SS] coalesce(1) aggregates in conti...

2018-06-26 Thread arunmahadevan
Github user arunmahadevan commented on a diff in the pull request: https://github.com/apache/spark/pull/21560#discussion_r198222671 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousCoalesceRDD.scala --- @@ -0,0 +1,108

[GitHub] spark pull request #21560: [SPARK-24386][SS] coalesce(1) aggregates in conti...

2018-06-25 Thread arunmahadevan
Github user arunmahadevan commented on a diff in the pull request: https://github.com/apache/spark/pull/21560#discussion_r197985638 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousCoalesceRDD.scala --- @@ -0,0 +1,108

[GitHub] spark pull request #21617: [SPARK-24634][SS] Add a new metric regarding numb...

2018-06-25 Thread arunmahadevan
Github user arunmahadevan commented on a diff in the pull request: https://github.com/apache/spark/pull/21617#discussion_r197984227 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/progress.scala --- @@ -48,12 +49,13 @@ class StateOperatorProgress private[sql

[GitHub] spark pull request #21617: [SPARK-24634][SS] Add a new metric regarding numb...

2018-06-25 Thread arunmahadevan
Github user arunmahadevan commented on a diff in the pull request: https://github.com/apache/spark/pull/21617#discussion_r197980605 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/progress.scala --- @@ -48,12 +49,13 @@ class StateOperatorProgress private[sql

[GitHub] spark pull request #21560: [SPARK-24386][SS] coalesce(1) aggregates in conti...

2018-06-22 Thread arunmahadevan
Github user arunmahadevan commented on a diff in the pull request: https://github.com/apache/spark/pull/21560#discussion_r197526872 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousCoalesceRDD.scala --- @@ -0,0 +1,108

[GitHub] spark pull request #21560: [SPARK-24386][SS] coalesce(1) aggregates in conti...

2018-06-22 Thread arunmahadevan
Github user arunmahadevan commented on a diff in the pull request: https://github.com/apache/spark/pull/21560#discussion_r197564080 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousCoalesceRDD.scala --- @@ -0,0 +1,108

[GitHub] spark pull request #21560: [SPARK-24386][SS] coalesce(1) aggregates in conti...

2018-06-22 Thread arunmahadevan
Github user arunmahadevan commented on a diff in the pull request: https://github.com/apache/spark/pull/21560#discussion_r197561226 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousCoalesceRDD.scala --- @@ -0,0 +1,108

[GitHub] spark pull request #21560: [SPARK-24386][SS] coalesce(1) aggregates in conti...

2018-06-22 Thread arunmahadevan
Github user arunmahadevan commented on a diff in the pull request: https://github.com/apache/spark/pull/21560#discussion_r197523482 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationChecker.scala --- @@ -349,6 +349,17 @@ object

[GitHub] spark pull request #21560: [SPARK-24386][SS] coalesce(1) aggregates in conti...

2018-06-22 Thread arunmahadevan
Github user arunmahadevan commented on a diff in the pull request: https://github.com/apache/spark/pull/21560#discussion_r197520939 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationChecker.scala --- @@ -349,6 +349,17 @@ object

[GitHub] spark issue #21504: [SPARK-24479][SS] Added config for registering streaming...

2018-06-12 Thread arunmahadevan
Github user arunmahadevan commented on the issue: https://github.com/apache/spark/pull/21504 @HyukjinKwon , addressed comments. Can you take it forward? --- - To unsubscribe, e-mail: reviews-unsubscr

[GitHub] spark pull request #21504: [SPARK-24479][SS] Added config for registering st...

2018-06-12 Thread arunmahadevan
Github user arunmahadevan commented on a diff in the pull request: https://github.com/apache/spark/pull/21504#discussion_r194817758 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/StaticSQLConf.scala --- @@ -96,6 +96,14 @@ object StaticSQLConf

[GitHub] spark pull request #21469: [SPARK-24441][SS] Expose total estimated size of ...

2018-06-11 Thread arunmahadevan
Github user arunmahadevan commented on a diff in the pull request: https://github.com/apache/spark/pull/21469#discussion_r194592510 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/statefulOperators.scala --- @@ -112,14 +122,19 @@ trait StateStoreWriter

[GitHub] spark pull request #21469: [SPARK-24441][SS] Expose total estimated size of ...

2018-06-11 Thread arunmahadevan
Github user arunmahadevan commented on a diff in the pull request: https://github.com/apache/spark/pull/21469#discussion_r194483603 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/statefulOperators.scala --- @@ -112,14 +122,19 @@ trait StateStoreWriter

[GitHub] spark pull request #21469: [SPARK-24441][SS] Expose total estimated size of ...

2018-06-11 Thread arunmahadevan
Github user arunmahadevan commented on a diff in the pull request: https://github.com/apache/spark/pull/21469#discussion_r194480087 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/HDFSBackedStateStoreProvider.scala --- @@ -247,6 +253,14 @@ private

[GitHub] spark pull request #21504: [SPARK-24479][SS] Added config for registering st...

2018-06-08 Thread arunmahadevan
Github user arunmahadevan commented on a diff in the pull request: https://github.com/apache/spark/pull/21504#discussion_r194101270 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/StreamingQueryManager.scala --- @@ -55,6 +57,19 @@ class StreamingQueryManager

[GitHub] spark pull request #21504: [SPARK-24479][SS] Added config for registering st...

2018-06-08 Thread arunmahadevan
Github user arunmahadevan commented on a diff in the pull request: https://github.com/apache/spark/pull/21504#discussion_r194100709 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/StreamingQueryManager.scala --- @@ -55,6 +57,19 @@ class StreamingQueryManager

[GitHub] spark pull request #21504: [SPARK-24479][SS] Added config for registering st...

2018-06-07 Thread arunmahadevan
Github user arunmahadevan commented on a diff in the pull request: https://github.com/apache/spark/pull/21504#discussion_r193923588 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/StreamingQueryManager.scala --- @@ -55,6 +56,11 @@ class StreamingQueryManager

[GitHub] spark issue #21504: [SPARK-24479][SS] Added config for registering streaming...

2018-06-07 Thread arunmahadevan
Github user arunmahadevan commented on the issue: https://github.com/apache/spark/pull/21504 @HyukjinKwon , thanks for reviewing. Addressed comments. --- - To unsubscribe, e-mail: reviews-unsubscr

[GitHub] spark issue #21500: Scalable Memory option for HDFSBackedStateStore

2018-06-07 Thread arunmahadevan
Github user arunmahadevan commented on the issue: https://github.com/apache/spark/pull/21500 Clearing the map after each commit might make things worse, since the maps needs to be loaded from the snapshot + delta files for the next micro-batch. Setting

[GitHub] spark pull request #21504: SPARK-24480: Added config for registering streami...

2018-06-06 Thread arunmahadevan
GitHub user arunmahadevan opened a pull request: https://github.com/apache/spark/pull/21504 SPARK-24480: Added config for registering streamingQueryListeners ## What changes were proposed in this pull request? Currently a "StreamingQueryListener" can only be

[GitHub] spark issue #21504: SPARK-24480: Added config for registering streamingQuery...

2018-06-06 Thread arunmahadevan
Github user arunmahadevan commented on the issue: https://github.com/apache/spark/pull/21504 ping @tdas @jose-torres @HeartSaVioR --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

2018-06-06 Thread arunmahadevan
Github user arunmahadevan commented on the issue: https://github.com/apache/spark/pull/21469 Nice, LGTM. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

  1   2   >