[GitHub] spark issue #21145: [SPARK-24073][SQL]: Rename DataReaderFactory to ReadTask...

2018-05-04 Thread jose-torres
Github user jose-torres commented on the issue: https://github.com/apache/spark/pull/21145 I don't see the problem with the name ReadTask. In RDDs, we call the serializable representation of a partition for distribution to executors just Partition, and I've always found this pretty

[GitHub] spark pull request #21200: [SPARK-24039][SS] Do continuous processing writes...

2018-05-04 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/21200#discussion_r186123363 --- Diff: sql/core/pom.xml --- @@ -146,6 +146,11 @@ parquet-avro test + + org.mockito

[GitHub] spark issue #20936: [SPARK-23503][SS] Enforce sequencing of committed epochs...

2018-05-06 Thread jose-torres
Github user jose-torres commented on the issue: https://github.com/apache/spark/pull/20936 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark issue #20936: [SPARK-23503][SS] Enforce sequencing of committed epochs...

2018-05-06 Thread jose-torres
Github user jose-torres commented on the issue: https://github.com/apache/spark/pull/20936 (I agree that your PR isn't responsible here, there's a known problem with that suite.) --- - To unsubscribe, e-mail

[GitHub] spark issue #21220: [SPARK-24157][SS] Enabled no-data batches in MicroBatchE...

2018-05-04 Thread jose-torres
Github user jose-torres commented on the issue: https://github.com/apache/spark/pull/21220 LGTM. Obviously shouldn't block this PR, but MicroBatchExecution is structured in a way that makes it hard to review changes like this. It seems like changing the condition under which

[GitHub] spark pull request #21239: [SPARK-24040][SS] Support single partition aggreg...

2018-05-04 Thread jose-torres
GitHub user jose-torres opened a pull request: https://github.com/apache/spark/pull/21239 [SPARK-24040][SS] Support single partition aggregates in continuous processing. ## What changes were proposed in this pull request? Support aggregates with exactly 1 partition

[GitHub] spark issue #21222: [SPARK-24161][SS] Enable debug package feature on struct...

2018-05-04 Thread jose-torres
Github user jose-torres commented on the issue: https://github.com/apache/spark/pull/21222 LGTM, but I'm not familiar with this debug package so I can't evaluate if it's being used correctly. --- - To unsubscribe

[GitHub] spark issue #21199: [SPARK-24127][SS] Continuous text socket source

2018-05-14 Thread jose-torres
Github user jose-torres commented on the issue: https://github.com/apache/spark/pull/21199 I think that's unavoidable if we want to have a socket source. The microbatch socket source has the same thing going on. I'd expect most people looking into implementation details of data

[GitHub] spark issue #21199: [SPARK-24127][SS] Continuous text socket source

2018-05-08 Thread jose-torres
Github user jose-torres commented on the issue: https://github.com/apache/spark/pull/21199 I won't be able to look at this in detail until next week. In general, I think this is a great source to have available. I wonder if it'd be worthwhile to try and abstract the record

[GitHub] spark issue #21145: [SPARK-24073][SQL]: Rename DataReaderFactory to InputPar...

2018-05-09 Thread jose-torres
Github user jose-torres commented on the issue: https://github.com/apache/spark/pull/21145 LGTM. I can own cleaning up the names of the streaming classes, probably wrapping that into the broader task of getting a design doc for the streaming reader API

[GitHub] spark pull request #21293: [SPARK-24237][SS] Continuous shuffle dependency a...

2018-05-10 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/21293#discussion_r187366150 --- Diff: core/src/main/scala/org/apache/spark/Dependency.scala --- @@ -65,15 +65,17 @@ abstract class NarrowDependency[T](_rdd: RDD[T]) extends

[GitHub] spark pull request #21293: [SPARK-24237][SS] Continuous shuffle dependency a...

2018-05-10 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/21293#discussion_r187367770 --- Diff: core/src/main/scala/org/apache/spark/MapOutputTracker.scala --- @@ -233,6 +239,28 @@ private[spark] class MapOutputTrackerMasterEndpoint

[GitHub] spark pull request #21293: [SPARK-24237][SS] Continuous shuffle dependency a...

2018-05-10 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/21293#discussion_r187371707 --- Diff: core/src/main/scala/org/apache/spark/MapOutputTracker.scala --- @@ -769,6 +796,43 @@ private[spark] class MapOutputTrackerWorker(conf

[GitHub] spark pull request #21293: [SPARK-24237][SS] Continuous shuffle dependency a...

2018-05-10 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/21293#discussion_r187367699 --- Diff: core/src/main/scala/org/apache/spark/MapOutputTracker.scala --- @@ -233,6 +239,28 @@ private[spark] class MapOutputTrackerMasterEndpoint

[GitHub] spark pull request #21293: [SPARK-24237][SS] Continuous shuffle dependency a...

2018-05-10 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/21293#discussion_r187371412 --- Diff: core/src/main/scala/org/apache/spark/scheduler/ContinuousShuffleMapTask.scala --- @@ -0,0 +1,139 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #21293: [SPARK-24237][SS] Continuous shuffle dependency a...

2018-05-10 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/21293#discussion_r187366974 --- Diff: core/src/main/scala/org/apache/spark/Dependency.scala --- @@ -88,14 +90,53 @@ class ShuffleDependency[K: ClassTag, V: ClassTag, C: ClassTag

[GitHub] spark pull request #21293: [SPARK-24237][SS] Continuous shuffle dependency a...

2018-05-10 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/21293#discussion_r187372104 --- Diff: core/src/main/scala/org/apache/spark/scheduler/ContinuousShuffleMapTask.scala --- @@ -0,0 +1,139 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #21293: [SPARK-24237][SS] Continuous shuffle dependency a...

2018-05-10 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/21293#discussion_r187368178 --- Diff: core/src/main/scala/org/apache/spark/SparkEnv.scala --- @@ -227,6 +228,7 @@ object SparkEnv extends Logging

[GitHub] spark pull request #21293: [SPARK-24237][SS] Continuous shuffle dependency a...

2018-05-11 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/21293#discussion_r187639422 --- Diff: core/src/main/scala/org/apache/spark/Dependency.scala --- @@ -88,14 +90,53 @@ class ShuffleDependency[K: ClassTag, V: ClassTag, C: ClassTag

[GitHub] spark pull request #21293: [SPARK-24237][SS] Continuous shuffle dependency a...

2018-05-11 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/21293#discussion_r187641941 --- Diff: core/src/main/scala/org/apache/spark/MapOutputTracker.scala --- @@ -769,6 +796,43 @@ private[spark] class MapOutputTrackerWorker(conf

[GitHub] spark pull request #21293: [SPARK-24237][SS] Continuous shuffle dependency a...

2018-05-11 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/21293#discussion_r187646713 --- Diff: core/src/main/scala/org/apache/spark/scheduler/ContinuousShuffleMapTask.scala --- @@ -0,0 +1,139 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #21293: [SPARK-24237][SS] Continuous shuffle dependency a...

2018-05-11 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/21293#discussion_r187638899 --- Diff: core/src/main/scala/org/apache/spark/Dependency.scala --- @@ -65,15 +65,17 @@ abstract class NarrowDependency[T](_rdd: RDD[T]) extends

[GitHub] spark pull request #21293: [SPARK-24237][SS] Continuous shuffle dependency a...

2018-05-11 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/21293#discussion_r187641292 --- Diff: core/src/main/scala/org/apache/spark/MapOutputTracker.scala --- @@ -233,6 +239,28 @@ private[spark] class MapOutputTrackerMasterEndpoint

[GitHub] spark issue #21118: SPARK-23325: Use InternalRow when reading with DataSourc...

2018-05-10 Thread jose-torres
Github user jose-torres commented on the issue: https://github.com/apache/spark/pull/21118 LGTM, although as you mentioned I think it'd definitely be valuable to follow up and understand why some operators insist on UnsafeRow even though this isn't what SparkPlan declares as the row

[GitHub] spark pull request #21239: [SPARK-24040][SS] Support single partition aggreg...

2018-05-07 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/21239#discussion_r186467958 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/continuous/ContinuousAggregationSuite.scala --- @@ -0,0 +1,81

[GitHub] spark pull request #21239: [SPARK-24040][SS] Support single partition aggreg...

2018-05-07 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/21239#discussion_r186462620 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/EpochTracker.scala --- @@ -0,0 +1,51 @@ +/* + * Licensed

[GitHub] spark pull request #21239: [SPARK-24040][SS] Support single partition aggreg...

2018-05-07 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/21239#discussion_r186476292 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/continuous/ContinuousAggregationSuite.scala --- @@ -0,0 +1,81

[GitHub] spark issue #21353: [SPARK-24304][SS] Scheduler changes for continuous proce...

2018-05-17 Thread jose-torres
Github user jose-torres commented on the issue: https://github.com/apache/spark/pull/21353 As I've mentioned elsewhere, stages are currently submitted sequentially. That is, for a stage X, all the stage dependencies of X are completed before the tasks within X start. This change

[GitHub] spark issue #21337: [SPARK-24234][SS] Reader for continuous processing shuff...

2018-05-15 Thread jose-torres
Github user jose-torres commented on the issue: https://github.com/apache/spark/pull/21337 @HeartSaVioR @arunmahadevan @xuanyuanking --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark issue #21332: [SPARK-24236][SS] Continuous replacement for ShuffleExch...

2018-05-15 Thread jose-torres
Github user jose-torres commented on the issue: https://github.com/apache/spark/pull/21332 We're definitely agreed on the broad idea of having a ContinuousShuffleExchangeExec which reuses most of the existing ShuffleExchangeExec. As discussed in the other PR, I'm not sure

[GitHub] spark pull request #21337: [SPARK-24234][SS] Reader for continuous processin...

2018-05-15 Thread jose-torres
GitHub user jose-torres opened a pull request: https://github.com/apache/spark/pull/21337 [SPARK-24234][SS] Reader for continuous processing shuffle ## What changes were proposed in this pull request? Read RDD for continuous processing shuffle, as well as the initial RPC

[GitHub] spark pull request #21253: [SPARK-24158][SS] Enable no-data batches for stre...

2018-05-16 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/21253#discussion_r188703325 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingJoinSuite.scala --- @@ -568,14 +567,16 @@ class StreamingOuterJoinSuite

[GitHub] spark pull request #21253: [SPARK-24158][SS] Enable no-data batches for stre...

2018-05-16 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/21253#discussion_r188703005 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamingSymmetricHashJoinExec.scala --- @@ -187,6 +187,12 @@ case class

[GitHub] spark pull request #21253: [SPARK-24158][SS] Enable no-data batches for stre...

2018-05-16 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/21253#discussion_r188703161 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingJoinSuite.scala --- @@ -568,14 +567,16 @@ class StreamingOuterJoinSuite

[GitHub] spark issue #21253: [SPARK-24158][SS] Enable no-data batches for streaming j...

2018-05-16 Thread jose-torres
Github user jose-torres commented on the issue: https://github.com/apache/spark/pull/21253 lgtm --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark issue #21385: [SPARK-24234][SS] Support multiple row writers in contin...

2018-05-21 Thread jose-torres
Github user jose-torres commented on the issue: https://github.com/apache/spark/pull/21385 @HeartSaVioR @arunmahadevan @xuanyuanking @tdas --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

[GitHub] spark pull request #21384: [SPARK-23416][SS] Add a specific stop method for ...

2018-05-21 Thread jose-torres
GitHub user jose-torres opened a pull request: https://github.com/apache/spark/pull/21384 [SPARK-23416][SS] Add a specific stop method for ContinuousExecution. ## What changes were proposed in this pull request? Add a specific stop method for ContinuousExecution

[GitHub] spark issue #21384: [SPARK-23416][SS] Add a specific stop method for Continu...

2018-05-21 Thread jose-torres
Github user jose-torres commented on the issue: https://github.com/apache/spark/pull/21384 @zsxwing @dongjoon-hyun --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark pull request #21385: [SPARK-24234][SS] Support multiple row writers in...

2018-05-21 Thread jose-torres
GitHub user jose-torres opened a pull request: https://github.com/apache/spark/pull/21385 [SPARK-24234][SS] Support multiple row writers in continuous processing shuffle reader. ## What changes were proposed in this pull request? https://docs.google.com/document/d

[GitHub] spark issue #21385: [SPARK-24234][SS] Support multiple row writers in contin...

2018-05-21 Thread jose-torres
Github user jose-torres commented on the issue: https://github.com/apache/spark/pull/21385 In the above run Zookeeper couldn't get set up for some reason. It's surely unrelated, we aren't touching any currently executed codepath here

[GitHub] spark issue #21385: [SPARK-24234][SS] Support multiple row writers in contin...

2018-05-21 Thread jose-torres
Github user jose-torres commented on the issue: https://github.com/apache/spark/pull/21385 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark pull request #21384: [SPARK-23416][SS] Add a specific stop method for ...

2018-05-22 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/21384#discussion_r189970144 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousExecution.scala --- @@ -356,6 +356,22 @@ class

[GitHub] spark pull request #21392: [SPARK-24063][SS] Control maximum epoch backlog f...

2018-05-22 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/21392#discussion_r190112179 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/EpochCoordinator.scala --- @@ -45,6 +45,11 @@ private[sql] case

[GitHub] spark pull request #21392: [SPARK-24063][SS] Control maximum epoch backlog f...

2018-05-22 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/21392#discussion_r190112873 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousExecution.scala --- @@ -233,9 +235,15 @@ class

[GitHub] spark pull request #21385: [SPARK-24234][SS] Support multiple row writers in...

2018-05-22 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/21385#discussion_r190080895 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/continuous/shuffle/ContinuousShuffleReadSuite.scala --- @@ -161,13 +189,15 @@ class

[GitHub] spark pull request #21385: [SPARK-24234][SS] Support multiple row writers in...

2018-05-22 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/21385#discussion_r190088531 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/shuffle/UnsafeRowReceiver.scala --- @@ -56,20 +62,46 @@ private

[GitHub] spark pull request #21385: [SPARK-24234][SS] Support multiple row writers in...

2018-05-22 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/21385#discussion_r190088467 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/shuffle/UnsafeRowReceiver.scala --- @@ -56,20 +62,46 @@ private

[GitHub] spark issue #21385: [SPARK-24234][SS] Support multiple row writers in contin...

2018-05-22 Thread jose-torres
Github user jose-torres commented on the issue: https://github.com/apache/spark/pull/21385 addressed comments --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark pull request #21428: [SPARK-24235][SS] Implement continuous shuffle wr...

2018-05-25 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/21428#discussion_r191027072 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/shuffle/ContinuousShuffleWriter.scala --- @@ -0,0 +1,27

[GitHub] spark pull request #21428: [SPARK-24235][SS] Implement continuous shuffle wr...

2018-05-25 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/21428#discussion_r191026443 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/shuffle/ContinuousShuffleWriteRDD.scala --- @@ -0,0 +1,60

[GitHub] spark pull request #21428: [SPARK-24235][SS] Implement continuous shuffle wr...

2018-05-25 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/21428#discussion_r191027151 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/shuffle/UnsafeRowWriter.scala --- @@ -0,0 +1,54

[GitHub] spark issue #21428: [SPARK-24235][SS] Implement continuous shuffle writer fo...

2018-05-25 Thread jose-torres
Github user jose-torres commented on the issue: https://github.com/apache/spark/pull/21428 addressed comments --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark issue #21428: [SPARK-24235][SS] Implement continuous shuffle write RDD...

2018-05-25 Thread jose-torres
Github user jose-torres commented on the issue: https://github.com/apache/spark/pull/21428 @HeartSaVioR @arunmahadevan @xuanyuanking --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark pull request #21308: SPARK-24253: Add DeleteSupport mix-in for DataSou...

2018-05-24 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/21308#discussion_r190699855 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/DeleteSupport.java --- @@ -0,0 +1,51 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #21385: [SPARK-24234][SS] Support multiple row writers in...

2018-05-24 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/21385#discussion_r190684685 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/continuous/shuffle/ContinuousShuffleReadSuite.scala --- @@ -160,25 +170,122 @@ class

[GitHub] spark pull request #21305: [SPARK-24251][SQL] Add AppendData logical plan.

2018-05-24 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/21305#discussion_r190695611 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala --- @@ -344,6 +344,36 @@ case class Join

[GitHub] spark pull request #21305: [SPARK-24251][SQL] Add AppendData logical plan.

2018-05-24 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/21305#discussion_r190696654 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala --- @@ -240,21 +238,27 @@ final class DataFrameWriter[T] private[sql](ds

[GitHub] spark pull request #21428: [SPARK-24235][SS] Implement continuous shuffle wr...

2018-05-24 Thread jose-torres
GitHub user jose-torres opened a pull request: https://github.com/apache/spark/pull/21428 [SPARK-24235][SS] Implement continuous shuffle write RDD for single reader partition. ## What changes were proposed in this pull request? Implement continuous shuffle write RDD

[GitHub] spark issue #21355: [SPARK-24308][SQL] Handle DataReaderFactory to InputPart...

2018-05-18 Thread jose-torres
Github user jose-torres commented on the issue: https://github.com/apache/spark/pull/21355 LGTM. Thanks for handling this! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands

[GitHub] spark pull request #21199: [SPARK-24127][SS] Continuous text socket source

2018-05-17 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/21199#discussion_r189062212 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/ContinuousRecordEndpoint.scala --- @@ -0,0 +1,61

[GitHub] spark pull request #21199: [SPARK-24127][SS] Continuous text socket source

2018-05-17 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/21199#discussion_r189063593 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/sources/TextSocketStreamSuite.scala --- @@ -256,7 +258,101 @@ class

[GitHub] spark pull request #21337: [SPARK-24234][SS] Reader for continuous processin...

2018-05-15 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/21337#discussion_r188456856 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/shuffle/ContinuousShuffleReadRDD.scala --- @@ -0,0 +1,64

[GitHub] spark pull request #21337: [SPARK-24234][SS] Reader for continuous processin...

2018-05-15 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/21337#discussion_r188456692 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/shuffle/ContinuousShuffleReadRDD.scala --- @@ -0,0 +1,64

[GitHub] spark issue #21337: [SPARK-24234][SS] Reader for continuous processing shuff...

2018-05-16 Thread jose-torres
Github user jose-torres commented on the issue: https://github.com/apache/spark/pull/21337 @tdas --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark pull request #21337: [SPARK-24234][SS] Reader for continuous processin...

2018-05-16 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/21337#discussion_r188829894 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/shuffle/ContinuousShuffleReadRDD.scala --- @@ -0,0 +1,64

[GitHub] spark pull request #21337: [SPARK-24234][SS] Reader for continuous processin...

2018-05-16 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/21337#discussion_r188829815 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/shuffle/ContinuousShuffleReadRDD.scala --- @@ -0,0 +1,64

[GitHub] spark pull request #21337: [SPARK-24234][SS] Reader for continuous processin...

2018-05-16 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/21337#discussion_r188829780 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/shuffle/UnsafeRowReceiver.scala --- @@ -0,0 +1,56

[GitHub] spark pull request #21337: [SPARK-24234][SS] Reader for continuous processin...

2018-05-16 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/21337#discussion_r188831887 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/continuous/shuffle/ContinuousShuffleReadSuite.scala --- @@ -0,0 +1,122

[GitHub] spark pull request #21385: [SPARK-24234][SS] Support multiple row writers in...

2018-05-23 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/21385#discussion_r190305870 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/shuffle/UnsafeRowReceiver.scala --- @@ -56,20 +69,73 @@ private

[GitHub] spark issue #21400: [SPARK-24351][SS]offsetLog/commitLog purge thresholdBatc...

2018-05-23 Thread jose-torres
Github user jose-torres commented on the issue: https://github.com/apache/spark/pull/21400 Nice catch! Consider adding a unit test for this. I'm not sure how easy it'd be off the top of my head to test what the offset log is retaining, and I think it's a simple enough change

[GitHub] spark pull request #21392: [SPARK-24063][SS] Control maximum epoch backlog f...

2018-05-23 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/21392#discussion_r190316039 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/EpochCoordinator.scala --- @@ -45,6 +45,11 @@ private[sql] case

[GitHub] spark pull request #21385: [SPARK-24234][SS] Support multiple row writers in...

2018-05-23 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/21385#discussion_r190324822 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/shuffle/UnsafeRowReceiver.scala --- @@ -41,11 +50,15 @@ private

[GitHub] spark pull request #21385: [SPARK-24234][SS] Support multiple row writers in...

2018-05-23 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/21385#discussion_r190328335 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/shuffle/UnsafeRowReceiver.scala --- @@ -56,20 +69,71 @@ private

[GitHub] spark issue #21385: [SPARK-24234][SS] Support multiple row writers in contin...

2018-05-23 Thread jose-torres
Github user jose-torres commented on the issue: https://github.com/apache/spark/pull/21385 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark issue #21337: [SPARK-24234][SS] Reader for continuous processing shuff...

2018-05-17 Thread jose-torres
Github user jose-torres commented on the issue: https://github.com/apache/spark/pull/21337 Actually, now that I think about it, there's a lot of value in having that interface, both to make sure we don't accidentally leak across it and possibly allow debugging mode in the future

[GitHub] spark pull request #21337: [SPARK-24234][SS] Reader for continuous processin...

2018-05-17 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/21337#discussion_r189133964 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/shuffle/UnsafeRowReceiver.scala --- @@ -0,0 +1,64

[GitHub] spark issue #20936: [SPARK-23503][SS] Enforce sequencing of committed epochs...

2018-05-17 Thread jose-torres
Github user jose-torres commented on the issue: https://github.com/apache/spark/pull/20936 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark issue #20936: [SPARK-23503][SS] Enforce sequencing of committed epochs...

2018-05-17 Thread jose-torres
Github user jose-torres commented on the issue: https://github.com/apache/spark/pull/20936 ugh that flaky kafka test. It's already reported, and I've been looking into it this week albeit with little luck

[GitHub] spark issue #21490: [SPARK-24462][SS] Initialize the offsets correctly when ...

2018-06-09 Thread jose-torres
Github user jose-torres commented on the issue: https://github.com/apache/spark/pull/21490 (sorry, I've been busy with Spark Summit) The problem I see is that fault tolerance might not be cleanly separable from query stop tolerance. If a user stops the query at the wrong time

[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

2018-06-09 Thread jose-torres
Github user jose-torres commented on the issue: https://github.com/apache/spark/pull/21469 (Sorry to comment after so long with such a minor change - I've been busy with spark summit) metricProviderLoaderMapSize should be metricProviderLoaderMapSizeBytes, both for clarity

[GitHub] spark pull request #21469: [SPARK-24441][SS] Expose total estimated size of ...

2018-06-09 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/21469#discussion_r194232690 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingQueryListenerSuite.scala --- @@ -231,7 +231,7 @@ class

[GitHub] spark pull request #21506: [SPARK-24485][SS] Measure and log elapsed time fo...

2018-06-10 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/21506#discussion_r194266060 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/HDFSBackedStateStoreProvider.scala --- @@ -280,38 +278,49 @@ private

[GitHub] spark pull request #21506: [SPARK-24485][SS] Measure and log elapsed time fo...

2018-06-10 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/21506#discussion_r194266029 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/HDFSBackedStateStoreProvider.scala --- @@ -280,38 +278,49 @@ private

[GitHub] spark pull request #21428: [SPARK-24235][SS] Implement continuous shuffle wr...

2018-06-12 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/21428#discussion_r194950212 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/shuffle/RPCContinuousShuffleWriter.scala --- @@ -0,0 +1,54

[GitHub] spark issue #21497: [SPARK-24466][SS] Fix TextSocketMicroBatchReader to be c...

2018-06-12 Thread jose-torres
Github user jose-torres commented on the issue: https://github.com/apache/spark/pull/21497 lgtm --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark pull request #21428: [SPARK-24235][SS] Implement continuous shuffle wr...

2018-06-12 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/21428#discussion_r194950709 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/shuffle/RPCContinuousShuffleReader.scala --- @@ -79,10 +77,10

[GitHub] spark issue #21504: [SPARK-24479][SS] Added config for registering streaming...

2018-06-12 Thread jose-torres
Github user jose-torres commented on the issue: https://github.com/apache/spark/pull/21504 (Don't block on me - I won't have time to review in detail unless needed, but broadly the PR looks fine

[GitHub] spark pull request #21428: [SPARK-24235][SS] Implement continuous shuffle wr...

2018-06-12 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/21428#discussion_r194952174 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/shuffle/RPCContinuousShuffleReader.scala --- @@ -68,7 +66,7

[GitHub] spark issue #21506: [SPARK-24485][SS] Measure and log elapsed time for files...

2018-06-12 Thread jose-torres
Github user jose-torres commented on the issue: https://github.com/apache/spark/pull/21506 lgtm --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark pull request #21503: [SPARK-24478][SQL] Move projection and filter pus...

2018-06-12 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/21503#discussion_r194953803 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala --- @@ -17,15 +17,56 @@ package

[GitHub] spark pull request #21428: [SPARK-24235][SS] Implement continuous shuffle wr...

2018-06-12 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/21428#discussion_r194962429 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/shuffle/RPCContinuousShuffleReader.scala --- @@ -68,7 +66,7

[GitHub] spark pull request #21428: [SPARK-24235][SS] Implement continuous shuffle wr...

2018-06-12 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/21428#discussion_r194962850 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/shuffle/ContinuousShuffleReadRDD.scala --- @@ -34,8 +34,10

[GitHub] spark issue #21428: [SPARK-24235][SS] Implement continuous shuffle writer fo...

2018-06-13 Thread jose-torres
Github user jose-torres commented on the issue: https://github.com/apache/spark/pull/21428 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark pull request #21559: [SPARK-24525][SS] Provide an option to limit numb...

2018-06-14 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/21559#discussion_r195516533 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/memory.scala --- @@ -294,6 +333,16 @@ class MemorySink(val schema

[GitHub] spark pull request #21559: [SPARK-24525][SS] Provide an option to limit numb...

2018-06-14 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/21559#discussion_r195516859 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/sources/memoryV2.scala --- @@ -110,40 +126,61 @@ class MemorySinkV2 extends

[GitHub] spark issue #21559: [SPARK-24525][SS] Provide an option to limit number of r...

2018-06-14 Thread jose-torres
Github user jose-torres commented on the issue: https://github.com/apache/spark/pull/21559 lgtm --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark issue #21560: [SPARK-24386][SS] coalesce(1) aggregates in continuous p...

2018-06-13 Thread jose-torres
Github user jose-torres commented on the issue: https://github.com/apache/spark/pull/21560 @HeartSaVioR @arunmahadevan @xuanyuanking @tdas @zsxwing --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

[GitHub] spark pull request #21560: [SPARK-24386][SS] coalesce(1) aggregates in conti...

2018-06-13 Thread jose-torres
GitHub user jose-torres opened a pull request: https://github.com/apache/spark/pull/21560 [SPARK-24386][SS] coalesce(1) aggregates in continuous processing ## What changes were proposed in this pull request? Provide a continuous processing implementation of coalesce(1

[GitHub] spark pull request #21400: [SPARK-24351][SS]offsetLog/commitLog purge thresh...

2018-05-30 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/21400#discussion_r191278428 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/continuous/ContinuousSuite.scala --- @@ -34,7 +34,8 @@ class ContinuousSuiteBase

[GitHub] spark pull request #21400: [SPARK-24351][SS]offsetLog/commitLog purge thresh...

2018-05-30 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/21400#discussion_r191278390 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/continuous/ContinuousSuite.scala --- @@ -225,6 +226,40 @@ class ContinuousSuite

  1   2   3   4   5   6   >