[GitHub] spark pull request #19448: [SPARK-22217] [SQL] ParquetFileFormat to support ...

2017-10-11 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/19448#discussion_r144088592 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala --- @@ -138,6 +138,10 @@ class

[GitHub] spark issue #19424: [SPARK-22197][SQL] push down operators to data source be...

2017-10-10 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/19424 What are the guarantees made by the previous batches in the optimizer? The work done by `FilterAndProject` seems redundant to me because the optimizer should already push filters below projection

[GitHub] spark issue #19269: [SPARK-22026][SQL][WIP] data source v2 write path

2017-10-10 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/19269 > There is no restriction to let the output of data writers be visible to other writers, so it's possible to launch a write task just for cleaning up the data of other writers. Agr

[GitHub] spark issue #19269: [SPARK-22026][SQL][WIP] data source v2 write path

2017-10-09 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/19269 > The only contract Spark needs is: data written/committed by tasks should not be visible to data source readers until the job-level commitment. But they can be visible to others like other writ

[GitHub] spark pull request #19424: [SPARK-22197][SQL] push down operators to data so...

2017-10-09 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/19424#discussion_r143597547 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/PushDownOperatorsToDataSource.scala --- @@ -0,0 +1,104

[GitHub] spark pull request #19424: [SPARK-22197][SQL] push down operators to data so...

2017-10-09 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/19424#discussion_r143593605 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/PushDownOperatorsToDataSource.scala --- @@ -0,0 +1,104

[GitHub] spark pull request #19424: [SPARK-22197][SQL] push down operators to data so...

2017-10-09 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/19424#discussion_r143597669 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/PushDownOperatorsToDataSource.scala --- @@ -0,0 +1,104

[GitHub] spark pull request #19424: [SPARK-22197][SQL] push down operators to data so...

2017-10-09 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/19424#discussion_r143591559 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/PushDownOperatorsToDataSource.scala --- @@ -0,0 +1,104

[GitHub] spark issue #19394: [SPARK-22170][SQL] Reduce memory consumption in broadcas...

2017-10-09 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/19394 Thanks for reviewing, @gatorsmile! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e

[GitHub] spark pull request #19269: [SPARK-22026][SQL][WIP] data source v2 write path

2017-10-09 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/19269#discussion_r143524057 --- Diff: sql/core/src/test/java/test/org/apache/spark/sql/sources/v2/JavaSimpleWritableDataSource.java --- @@ -0,0 +1,297 @@ +/* + * Licensed

[GitHub] spark pull request #19394: [SPARK-22170][SQL] Reduce memory consumption in b...

2017-10-09 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/19394#discussion_r143517522 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlan.scala --- @@ -274,19 +274,26 @@ abstract class SparkPlan extends QueryPlan

[GitHub] spark pull request #19394: [SPARK-22170][SQL] Reduce memory consumption in b...

2017-10-09 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/19394#discussion_r143517490 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlan.scala --- @@ -274,19 +274,26 @@ abstract class SparkPlan extends QueryPlan

[GitHub] spark pull request #19394: [SPARK-22170][SQL] Reduce memory consumption in b...

2017-10-06 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/19394#discussion_r143317742 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/ConfigBehaviorSuite.scala --- @@ -58,7 +58,7 @@ class ConfigBehaviorSuite extends QueryTest

[GitHub] spark pull request #19394: [SPARK-22170][SQL] Reduce memory consumption in b...

2017-10-06 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/19394#discussion_r143317686 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlan.scala --- @@ -280,13 +280,20 @@ abstract class SparkPlan extends QueryPlan

[GitHub] spark pull request #19394: [SPARK-22170][SQL] Reduce memory consumption in b...

2017-10-06 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/19394#discussion_r143317518 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlan.scala --- @@ -280,13 +280,20 @@ abstract class SparkPlan extends QueryPlan

[GitHub] spark issue #19448: [SPARK-22217] [SQL] ParquetFileFormat to support arbitra...

2017-10-06 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/19448 +1 I completely agree that using a ParquetOutputCommitter should be optional.

[GitHub] spark issue #19394: [SPARK-22170][SQL] Reduce memory consumption in broadcas...

2017-10-06 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/19394 Anyone have a clue what the python error could be? It doesn't look related.

[GitHub] spark pull request #19394: [SPARK-22170][SQL] Reduce memory consumption in b...

2017-10-06 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/19394#discussion_r143286130 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/BroadcastExchangeExec.scala --- @@ -73,25 +73,37 @@ case class

[GitHub] spark pull request #19394: [SPARK-22170][SQL] Reduce memory consumption in b...

2017-10-06 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/19394#discussion_r143286104 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/BroadcastExchangeExec.scala --- @@ -73,25 +73,37 @@ case class

[GitHub] spark pull request #19394: [SPARK-22170][SQL] Reduce memory consumption in b...

2017-10-06 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/19394#discussion_r143285737 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/ConfigBehaviorSuite.scala --- @@ -58,7 +58,7 @@ class ConfigBehaviorSuite extends QueryTest

[GitHub] spark pull request #19394: [SPARK-22170][SQL] Reduce memory consumption in b...

2017-10-06 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/19394#discussion_r143228054 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/BroadcastExchangeExec.scala --- @@ -73,25 +73,37 @@ case class

[GitHub] spark pull request #19394: [SPARK-22170][SQL] Reduce memory consumption in b...

2017-10-06 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/19394#discussion_r143226823 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/BroadcastExchangeExec.scala --- @@ -73,25 +73,37 @@ case class

[GitHub] spark pull request #19394: [SPARK-22170][SQL] Reduce memory consumption in b...

2017-10-06 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/19394#discussion_r143226714 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/BroadcastExchangeExec.scala --- @@ -73,25 +73,37 @@ case class

[GitHub] spark issue #19394: [SPARK-22170][SQL] Reduce memory consumption in broadcas...

2017-10-05 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/19394 Here's the error message: TestFailedException: 347.5272 was not greater than 1000

[GitHub] spark issue #19394: [SPARK-22170][SQL] Reduce memory consumption in broadcas...

2017-10-05 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/19394 Yes, we've been running this in production for a few weeks now.

[GitHub] spark pull request #19394: [SPARK-22170][SQL] Reduce memory consumption in b...

2017-10-05 Thread rdblue
GitHub user rdblue reopened a pull request: https://github.com/apache/spark/pull/19394 [SPARK-22170][SQL] Reduce memory consumption in broadcast joins. ## What changes were proposed in this pull request? This updates the broadcast join code path to lazily decompress pages

[GitHub] spark issue #19394: [SPARK-22170][SQL] Reduce memory consumption in broadcas...

2017-10-05 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/19394 @rxin, any idea why this would fail ConfigBehaviorSuite? I don't think the failure is related because that test doesn't use a broadcast join. Should I rebase on master

[GitHub] spark pull request #19394: [SPARK-22170][SQL] Reduce memory consumption in b...

2017-10-05 Thread rdblue
Github user rdblue closed the pull request at: https://github.com/apache/spark/pull/19394

[GitHub] spark pull request #19394: [SPARK-22170][SQL] Reduce memory consumption in b...

2017-10-05 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/19394#discussion_r143060736 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/metric/SQLMetricsSuite.scala --- @@ -228,7 +228,7 @@ class SQLMetricsSuite extends

[GitHub] spark pull request #19394: [SPARK-22170][SQL] Reduce memory consumption in b...

2017-10-03 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/19394#discussion_r142474766 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlan.scala --- @@ -280,13 +280,20 @@ abstract class SparkPlan extends QueryPlan

[GitHub] spark issue #19394: [SPARK-22170][SQL] Reduce memory consumption in broadcas...

2017-09-29 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/19394 Ideally, this would also use a TaskMemoryManager so the driver can spill results to disk instead of dying with an OOM. Is there any plan to add a memory manager for the driver

[GitHub] spark pull request #19394: SPARK-22170: Reduce memory consumption in broadca...

2017-09-29 Thread rdblue
GitHub user rdblue opened a pull request: https://github.com/apache/spark/pull/19394 SPARK-22170: Reduce memory consumption in broadcast joins. This updates the broadcast join code path to lazily decompress pages and iterate through UnsafeRows to prevent all rows from being held

[GitHub] spark pull request #19269: [SPARK-22026][SQL][WIP] data source v2 write path

2017-09-28 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/19269#discussion_r141679351 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/DataSourceV2Writer.java --- @@ -0,0 +1,81 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #19269: [SPARK-22026][SQL][WIP] data source v2 write path

2017-09-28 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/19269#discussion_r141679290 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/DataSourceV2Writer.java --- @@ -0,0 +1,71 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #19269: [SPARK-22026][SQL][WIP] data source v2 write path

2017-09-27 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/19269#discussion_r141460252 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/DataSourceV2Writer.java --- @@ -0,0 +1,81 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #19269: [SPARK-22026][SQL][WIP] data source v2 write path

2017-09-21 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/19269#discussion_r140389681 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/DataSourceV2Writer.java --- @@ -0,0 +1,71 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #19269: [SPARK-22026][SQL][WIP] data source v2 write path

2017-09-21 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/19269#discussion_r140371372 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/DataSourceV2Writer.java --- @@ -0,0 +1,71 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #19269: [SPARK-22026][SQL][WIP] data source v2 write path

2017-09-20 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/19269#discussion_r140018805 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/DataWriter.java --- @@ -0,0 +1,38 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #19269: [SPARK-22026][SQL][WIP] data source v2 write path

2017-09-20 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/19269#discussion_r140017014 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/WriteToDataSourceV2Command.scala --- @@ -0,0 +1,114

[GitHub] spark pull request #19269: [SPARK-22026][SQL][WIP] data source v2 write path

2017-09-19 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/19269#discussion_r139841435 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/WriteToDataSourceV2Command.scala --- @@ -0,0 +1,114

[GitHub] spark pull request #19269: [SPARK-22026][SQL][WIP] data source v2 write path

2017-09-19 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/19269#discussion_r139839037 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/DataWriter.java --- @@ -0,0 +1,38 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #19269: [SPARK-22026][SQL][WIP] data source v2 write path

2017-09-19 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/19269#discussion_r139838908 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/DataSourceV2Writer.java --- @@ -0,0 +1,71 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #19269: [SPARK-22026][SQL][WIP] data source v2 write path

2017-09-19 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/19269#discussion_r139838459 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/DataSourceV2Writer.java --- @@ -0,0 +1,71 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #19269: [SPARK-22026][SQL][WIP] data source v2 write path

2017-09-19 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/19269#discussion_r139836068 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/DataSourceV2Writer.java --- @@ -0,0 +1,71 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #19269: [SPARK-22026][SQL][WIP] data source v2 write path

2017-09-19 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/19269#discussion_r139835603 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/DataSourceV2Writer.java --- @@ -0,0 +1,71 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #19269: [SPARK-22026][SQL][WIP] data source v2 write path

2017-09-19 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/19269#discussion_r139834571 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/ReadSupport.java --- @@ -30,9 +30,8 @@ /** * Creates a {@link

[GitHub] spark pull request #19269: [SPARK-22026][SQL][WIP] data source v2 write path

2017-09-19 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/19269#discussion_r139832973 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/DataSourceV2Writer.java --- @@ -0,0 +1,71 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #19136: [SPARK-15689][SQL] data source v2 read path

2017-09-13 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/19136#discussion_r138681562 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/DataReader.java --- @@ -0,0 +1,36 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #19136: [DO NOT MERGE][SPARK-15689][SQL] data source v2

2017-09-11 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/19136#discussion_r138123207 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala --- @@ -0,0 +1,95 @@ +/* + * Licensed

[GitHub] spark pull request #19136: [DO NOT MERGE][SPARK-15689][SQL] data source v2

2017-09-08 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/19136#discussion_r137829674 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Relation.scala --- @@ -0,0 +1,39 @@ +/* + * Licensed

[GitHub] spark pull request #19136: [DO NOT MERGE][SPARK-15689][SQL] data source v2

2017-09-07 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/19136#discussion_r137597910 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Relation.scala --- @@ -0,0 +1,39 @@ +/* + * Licensed

[GitHub] spark pull request #19136: [DO NOT MERGE][SPARK-15689][SQL] data source v2

2017-09-07 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/19136#discussion_r137591425 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Relation.scala --- @@ -0,0 +1,39 @@ +/* + * Licensed

[GitHub] spark pull request #19136: [DO NOT MERGE][SPARK-15689][SQL] data source v2

2017-09-07 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/19136#discussion_r137587367 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/SchemaRequiredDataSourceV2.java --- @@ -0,0 +1,42 @@ +/* + * Licensed

[GitHub] spark issue #19136: [DO NOT MERGE][SPARK-15689][SQL] data source v2

2017-09-06 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/19136 Thanks for pinging me. I left comments on the older PR, since other discussion was already there. If you'd prefer comments here, just let me know

[GitHub] incubator-toree pull request #128: [TOREE-425] Force sparkContext initializa...

2017-07-17 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/incubator-toree/pull/128#discussion_r127858852 --- Diff: kernel/src/main/scala/org/apache/toree/boot/layer/ComponentInitialization.scala --- @@ -94,6 +98,18 @@ trait

[GitHub] incubator-toree pull request #128: [TOREE-425] Force sparkContext initializa...

2017-07-17 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/incubator-toree/pull/128#discussion_r127858615 --- Diff: kernel/src/main/scala/org/apache/toree/kernel/api/Kernel.scala --- @@ -414,13 +417,15 @@ class Kernel ( Await.result

[GitHub] spark issue #18450: [SPARK-21238][SQL] allow nested SQL execution

2017-06-28 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/18450 I think it is good that this would no longer throw exceptions at runtime. Is the purpose of not allowing nested executions to minimize the queries shown in the UI? If that's the only purpose then I

[GitHub] spark pull request #18419: [SPARK-20213][SQL][follow-up] introduce SQLExecut...

2017-06-27 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/18419#discussion_r124331858 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SQLExecution.scala --- @@ -100,7 +105,9 @@ object SQLExecution { // all

[GitHub] spark issue #18419: [SPARK-20213][SQL][follow-up] introduce SQLExecution.ign...

2017-06-26 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/18419 One minor comment, otherwise +1. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled

[GitHub] spark pull request #18419: [SPARK-20213][SQL][follow-up] introduce SQLExecut...

2017-06-26 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/18419#discussion_r124058801 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SQLExecution.scala --- @@ -100,7 +105,9 @@ object SQLExecution { // all

[GitHub] incubator-toree pull request #104: TOREE-380: Allow interpreters to format o...

2017-06-15 Thread rdblue
Github user rdblue closed the pull request at: https://github.com/apache/incubator-toree/pull/104

[GitHub] incubator-toree pull request #104: TOREE-380: Allow interpreters to format o...

2017-06-14 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/incubator-toree/pull/104#discussion_r122049346 --- Diff: kernel-api/src/main/scala/org/apache/toree/interpreter/broker/BrokerTransformer.scala --- @@ -43,7 +43,7 @@ class BrokerTransformer

[GitHub] incubator-toree pull request #104: TOREE-380: Allow interpreters to format o...

2017-06-14 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/incubator-toree/pull/104#discussion_r122048356 --- Diff: kernel/src/test/scala/org/apache/toree/kernel/protocol/v5/relay/ExecuteRequestRelaySpec.scala --- @@ -88,7 +89,7 @@ class

[GitHub] incubator-toree pull request #104: TOREE-380: Allow interpreters to format o...

2017-06-14 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/incubator-toree/pull/104#discussion_r122047867 --- Diff: kernel/src/test/scala/org/apache/toree/kernel/protocol/v5/stream/KernelInputStreamSpec.scala --- @@ -65,6 +65,9 @@ class

[GitHub] incubator-toree pull request #104: TOREE-380: Allow interpreters to format o...

2017-06-14 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/incubator-toree/pull/104#discussion_r122046211 --- Diff: kernel-api/src/main/scala/org/apache/toree/interpreter/broker/BrokerTransformer.scala --- @@ -43,7 +43,7 @@ class BrokerTransformer

[GitHub] incubator-toree pull request #104: TOREE-380: Allow interpreters to format o...

2017-06-13 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/incubator-toree/pull/104#discussion_r121765250 --- Diff: scala-interpreter/build.sbt --- @@ -18,3 +18,4 @@ import sbt.Tests.{Group, SubProcess} */ libraryDependencies

[GitHub] incubator-toree pull request #104: TOREE-380: Allow interpreters to format o...

2017-06-13 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/incubator-toree/pull/104#discussion_r121765086 --- Diff: scala-interpreter/src/main/scala/org/apache/toree/kernel/interpreter/scala/ScalaDisplayers.scala --- @@ -0,0 +1,207

[GitHub] incubator-toree pull request #104: TOREE-380: Allow interpreters to format o...

2017-06-13 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/incubator-toree/pull/104#discussion_r121757635 --- Diff: scala-interpreter/src/main/scala/org/apache/toree/kernel/interpreter/scala/ScalaInterpreter.scala --- @@ -18,30 +18,34 @@ package

[GitHub] incubator-toree pull request #104: TOREE-380: Allow interpreters to format o...

2017-06-13 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/incubator-toree/pull/104#discussion_r121757522 --- Diff: scala-interpreter/src/main/scala/org/apache/toree/kernel/interpreter/scala/ScalaDisplayers.scala --- @@ -0,0 +1,207

[GitHub] incubator-toree pull request #104: TOREE-380: Allow interpreters to format o...

2017-06-13 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/incubator-toree/pull/104#discussion_r121757204 --- Diff: scala-interpreter/build.sbt --- @@ -18,3 +18,4 @@ import sbt.Tests.{Group, SubProcess} */ libraryDependencies

[GitHub] incubator-toree pull request #104: TOREE-380: Allow interpreters to format o...

2017-06-13 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/incubator-toree/pull/104#discussion_r121755733 --- Diff: kernel/src/test/scala/org/apache/toree/kernel/protocol/v5/stream/KernelInputStreamSpec.scala --- @@ -65,6 +65,9 @@ class

[GitHub] incubator-toree pull request #124: TOREE-407: Add support for hdfs and s3 to...

2017-06-10 Thread rdblue
Github user rdblue closed the pull request at: https://github.com/apache/incubator-toree/pull/124

[GitHub] incubator-toree pull request #124: TOREE-407: Add support for hdfs and s3 to...

2017-06-10 Thread rdblue
GitHub user rdblue opened a pull request: https://github.com/apache/incubator-toree/pull/124 TOREE-407: Add support for hdfs and s3 to AddJar. You can merge this pull request into a Git repository by running: $ git pull https://github.com/rdblue/incubator-toree TOREE-407-add

[GitHub] incubator-toree pull request #104: TOREE-380: Allow interpreters to format o...

2017-06-10 Thread rdblue
GitHub user rdblue reopened a pull request: https://github.com/apache/incubator-toree/pull/104 TOREE-380: Allow interpreters to format output. This branch has the changes that I made to our Toree distribution to allow interpreters to format output. The goal is to enable better

[GitHub] spark issue #18162: [SPARK-20923] turn tracking of TaskMetrics._updatedBlock...

2017-06-08 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/18162 @tgravescs, I deployed this to our production environment (based on 2.0.0) a few days ago and haven't hit any problems with it. I think this is good to go, unless something has been added recently

[GitHub] spark issue #18064: [SPARK-20213][SQL] Fix DataFrameWriter operations in SQL...

2017-06-08 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/18064 Refactor? I thought that was the problem with the [original PR](https://github.com/apache/spark/pull/17540); that PR was too narrow and didn't unify the physical plans to get all metrics

[GitHub] spark issue #18181: [SPARK-20958][SQL] Roll back parquet-mr 1.8.2 to 1.8.1

2017-06-02 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/18181 It sounds like we should plan on a 1.8.3 and a 1.9.1 soon in the Parquet community. I'll start this up. On Fri, Jun 2, 2017 at 9:47 AM, Michael Allman <notificati...@github.com>

[GitHub] spark issue #18181: [SPARK-20958][SQL] Roll back parquet-mr 1.8.2 to 1.8.1

2017-06-02 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/18181 -1, with comments on the JIRA issue. I think it is better to include the Parquet fixes in 1.8.2 since Parquet doesn't pull in Avro 1.8.1 - that happens when users declare their own dependency

[GitHub] spark issue #18064: [SPARK-20213][SQL] Fix DataFrameWriter operations in SQL...

2017-05-25 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/18064 @cloud-fan, can you summarize how this differs from the original PR #17540? I have time to pick this up again, but I thought that the other PR only needed two changes: * Merge your

[GitHub] spark issue #18064: [SPARK-20213][SQL] Fix DataFrameWriter operations in SQL...

2017-05-22 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/18064 Agreed, sorry I haven't updated it. I was out most of last week. I'll get this fixed up as soon as I can. Thanks for all your help!

[GitHub] spark issue #17680: [SPARK-20364][SQL] Support Parquet predicate pushdown on...

2017-05-18 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/17680 There's an open PR ([#361](https://github.com/apache/parquet-mr/pull/361)) to support quoted column names, but the discussion on the merits of it is ongoing. I don't see a huge benefit

[GitHub] spark issue #17680: [SPARK-20364][SQL] Support Parquet predicate pushdown on...

2017-05-18 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/17680 @gatorsmile, sorry for not responding, I was on vacation for a few days. Should I still review this even though it is merged?

[GitHub] spark issue #12313: [SPARK-14543] [SQL] Improve InsertIntoTable column resol...

2017-05-11 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/12313 We were trying to get this in just before the 2.0 release, which was a bad time. We've just been maintaining it in our version, but I'm going to be rebasing it on to 2.1 soon so I'll see what needs

[GitHub] spark issue #13206: [SPARK-15420] [SQL] Add repartition and sort to prepare ...

2017-05-11 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/13206 @HyukjinKwon, that addresses part of what this patch does, but only for writes that go through FileFormatWriter. This patch works for Hive and adds an optimizer rule to add the sort instead

[GitHub] spark issue #17540: [SPARK-20213][SQL][UI] Fix DataFrameWriter operations in...

2017-05-05 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/17540 @cloud-fan, @zsxwing, tests are passing now. Should we commit this so we can start fixing the metrics?

[GitHub] spark issue #17540: [SPARK-20213][SQL][UI] Fix DataFrameWriter operations in...

2017-05-04 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/17540 @cloud-fan, all of the `RunnableCommand` instances that are currently run through `ExecutedCommandExec` need to be fixed so that there is only one physical plan. But the scope of those changes

[GitHub] spark issue #17540: [SPARK-20213][SQL][UI] Fix DataFrameWriter operations in...

2017-05-03 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/17540 I'm not an expert on the metrics path, but I think we should be able to join up the actual physical plans well enough to display everything. I doubt it will be a long-term regression, but I don't

[GitHub] spark issue #17540: [SPARK-20213][SQL][UI] Fix DataFrameWriter operations in...

2017-05-03 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/17540 @zsxwing, you don't think there's a way to fix metrics? I don't know exactly how to fix the UI to show two plans' worth of metrics, but it seems like it can be done. What about also updating

[GitHub] spark issue #17540: [SPARK-20213][SQL][UI] Fix DataFrameWriter operations in...

2017-05-03 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/17540 @zsxwing, I don't know. Sounds like we should fix the underlying problem that there are 2 physical plans.

[GitHub] spark issue #17540: [SPARK-20213][SQL][UI] Fix DataFrameWriter operations in...

2017-05-03 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/17540 @zsxwing, there should be a fix for the metrics without waiting for all of the bad plans to be fixed (which is to basically eliminate the use of `ExecutedCommandExec`). The metrics

[GitHub] spark issue #17540: [SPARK-20213][SQL][UI] Fix DataFrameWriter operations in...

2017-05-02 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/17540 @cloud-fan, @carsonwang pointed out that the logic for nested SQL executions was backward, so it would warn during tests and probably fail at runtime. I fixed that and now there are more test

[GitHub] spark pull request #17540: [SPARK-20213][SQL][UI] Fix DataFrameWriter operat...

2017-05-01 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/17540#discussion_r114235126 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SQLExecution.scala --- @@ -73,21 +99,35 @@ object SQLExecution { } r

[GitHub] spark issue #17540: [SPARK-20213][SQL][UI] Fix DataFrameWriter operations in...

2017-05-01 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/17540 @carsonwang, the plan for when we notice queries that don't appear in the SQL tab is to add a call to `checkSQLExecutionId`, which will cause tests to fail when that operation isn't wrapped

[GitHub] spark issue #17540: [SPARK-20213][SQL][UI] Fix DataFrameWriter operations in...

2017-05-01 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/17540 Rebased. I'll check if tests pass later tonight.

[GitHub] spark issue #17813: [SPARK-20540][CORE] Fix unstable executor requests.

2017-05-01 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/17813 Thanks!

[GitHub] spark issue #17813: [SPARK-20540][CORE] Fix unstable executor requests.

2017-05-01 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/17813 @vanzin, I fixed your review comments and tests are passing.

[GitHub] spark pull request #17813: [SPARK-20540][CORE] Fix unstable executor request...

2017-05-01 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/17813#discussion_r114166764 --- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala --- @@ -589,8 +605,18 @@ class

[GitHub] spark pull request #17813: [SPARK-20540][CORE] Fix unstable executor request...

2017-05-01 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/17813#discussion_r114166338 --- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala --- @@ -589,8 +605,18 @@ class

[GitHub] spark issue #17813: [SPARK-20540][CORE] Fix unstable executor requests.

2017-04-30 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/17813 @vanzin, can you take a look at this? It is a dynamic allocation bug.

[GitHub] spark pull request #17813: [SPARK-20540][CORE] Fix unstable executor request...

2017-04-30 Thread rdblue
GitHub user rdblue opened a pull request: https://github.com/apache/spark/pull/17813 [SPARK-20540][CORE] Fix unstable executor requests. There are two problems fixed in this commit. First, the ExecutorAllocationManager sets a timeout to avoid requesting executors too often
