[GitHub] spark pull request #21608: [SPARK-24626] [SQL] Improve location size calcula...

2018-08-07 Thread Achuth17
Github user Achuth17 commented on a diff in the pull request: https://github.com/apache/spark/pull/21608#discussion_r208423190 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceUtils.scala --- @@ -49,4 +51,11 @@ object DataSourceUtils

[GitHub] spark pull request #21608: [SPARK-24626] [SQL] Improve location size calcula...

2018-08-07 Thread Achuth17
Github user Achuth17 commented on a diff in the pull request: https://github.com/apache/spark/pull/21608#discussion_r208295767 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceUtils.scala --- @@ -49,4 +51,11 @@ object DataSourceUtils

[GitHub] spark issue #21608: [SPARK-24626] [SQL] Improve location size calculation in...

2018-08-06 Thread Achuth17
Github user Achuth17 commented on the issue: https://github.com/apache/spark/pull/21608 @maropu @gatorsmile, are any other changes required? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

[GitHub] spark issue #21608: [SPARK-24626] [SQL] Improve location size calculation in...

2018-08-03 Thread Achuth17
Github user Achuth17 commented on the issue: https://github.com/apache/spark/pull/21608 retest this please.

[GitHub] spark issue #21608: [SPARK-24626] [SQL] Improve location size calculation in...

2018-08-03 Thread Achuth17
Github user Achuth17 commented on the issue: https://github.com/apache/spark/pull/21608 @maropu @gatorsmile The tests are failing in places unrelated to my changes. How can I resolve this?

[GitHub] spark pull request #21608: [SPARK-24626] [SQL] Improve location size calcula...

2018-08-01 Thread Achuth17
Github user Achuth17 commented on a diff in the pull request: https://github.com/apache/spark/pull/21608#discussion_r207090805 --- Diff: docs/sql-programming-guide.md --- @@ -1872,6 +1872,8 @@ working with timestamps in `pandas_udf`s to get the best performance, see

[GitHub] spark pull request #21608: [SPARK-24626] [SQL] Improve location size calcula...

2018-08-01 Thread Achuth17
Github user Achuth17 commented on a diff in the pull request: https://github.com/apache/spark/pull/21608#discussion_r207090543 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -1449,6 +1449,15 @@ object SQLConf { .intConf

[GitHub] spark pull request #21608: [SPARK-24626] [SQL] Improve location size calcula...

2018-08-01 Thread Achuth17
Github user Achuth17 commented on a diff in the pull request: https://github.com/apache/spark/pull/21608#discussion_r207090467 --- Diff: docs/sql-programming-guide.md --- @@ -1872,6 +1872,8 @@ working with timestamps in `pandas_udf`s to get the best performance, see

[GitHub] spark issue #21608: [SPARK-24626] [SQL] Improve location size calculation in...

2018-08-01 Thread Achuth17
Github user Achuth17 commented on the issue: https://github.com/apache/spark/pull/21608 @gatorsmile, I have addressed the comments. Any other fix required?

[GitHub] spark pull request #21608: [SPARK-24626] [SQL] Improve location size calcula...

2018-08-01 Thread Achuth17
Github user Achuth17 commented on a diff in the pull request: https://github.com/apache/spark/pull/21608#discussion_r207035320 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/CommandUtils.scala --- @@ -78,7 +93,8 @@ object CommandUtils extends Logging

[GitHub] spark pull request #21608: [SPARK-24626] [SQL] Improve location size calcula...

2018-08-01 Thread Achuth17
Github user Achuth17 commented on a diff in the pull request: https://github.com/apache/spark/pull/21608#discussion_r207035227 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -1449,6 +1449,13 @@ object SQLConf { .intConf

[GitHub] spark pull request #21608: [SPARK-24626] [SQL] Improve location size calcula...

2018-08-01 Thread Achuth17
Github user Achuth17 commented on a diff in the pull request: https://github.com/apache/spark/pull/21608#discussion_r207035140 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -1449,6 +1449,13 @@ object SQLConf { .intConf

[GitHub] spark issue #21608: [SPARK-24626] [SQL] Improve location size calculation in...

2018-08-01 Thread Achuth17
Github user Achuth17 commented on the issue: https://github.com/apache/spark/pull/21608 @gatorsmile, I have made the changes.

[GitHub] spark issue #21608: [SPARK-24626] [SQL] Improve location size calculation in...

2018-07-27 Thread Achuth17
Github user Achuth17 commented on the issue: https://github.com/apache/spark/pull/21608 retest this please.

[GitHub] spark pull request #21608: [SPARK-24626] [SQL] Improve location size calcula...

2018-07-27 Thread Achuth17
Github user Achuth17 commented on a diff in the pull request: https://github.com/apache/spark/pull/21608#discussion_r205916878 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/CommandUtils.scala --- @@ -47,15 +48,23 @@ object CommandUtils extends Logging

[GitHub] spark pull request #21608: [SPARK-24626] [SQL] Improve location size calcula...

2018-07-27 Thread Achuth17
Github user Achuth17 commented on a diff in the pull request: https://github.com/apache/spark/pull/21608#discussion_r205916762 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/CommandUtils.scala --- @@ -47,15 +48,23 @@ object CommandUtils extends Logging

[GitHub] spark pull request #21608: [SPARK-24626] [SQL] Improve location size calcula...

2018-07-26 Thread Achuth17
Github user Achuth17 commented on a diff in the pull request: https://github.com/apache/spark/pull/21608#discussion_r205546260 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/CommandUtils.scala --- @@ -47,15 +48,23 @@ object CommandUtils extends Logging

[GitHub] spark pull request #21608: [SPARK-24626] [SQL] Improve location size calcula...

2018-07-26 Thread Achuth17
Github user Achuth17 commented on a diff in the pull request: https://github.com/apache/spark/pull/21608#discussion_r205545098 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/CommandUtils.scala --- @@ -47,15 +48,23 @@ object CommandUtils extends Logging

[GitHub] spark pull request #21608: [SPARK-24626] [SQL] Improve location size calcula...

2018-07-26 Thread Achuth17
Github user Achuth17 commented on a diff in the pull request: https://github.com/apache/spark/pull/21608#discussion_r205544840 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/CommandUtils.scala --- @@ -47,15 +48,23 @@ object CommandUtils extends Logging

[GitHub] spark issue #21608: [SPARK-24626] [SQL] Improve location size calculation in...

2018-07-26 Thread Achuth17
Github user Achuth17 commented on the issue: https://github.com/apache/spark/pull/21608 @gatorsmile Can you review this please? Thanks.

[GitHub] spark pull request #21608: [SPARK-24626] [SQL] Improve location size calcula...

2018-07-24 Thread Achuth17
Github user Achuth17 commented on a diff in the pull request: https://github.com/apache/spark/pull/21608#discussion_r204823567 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/CommandUtils.scala --- @@ -47,15 +48,26 @@ object CommandUtils extends Logging

[GitHub] spark pull request #21608: [SPARK-24626] [SQL] Improve location size calcula...

2018-07-24 Thread Achuth17
Github user Achuth17 commented on a diff in the pull request: https://github.com/apache/spark/pull/21608#discussion_r204823533 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/CommandUtils.scala --- @@ -47,15 +48,26 @@ object CommandUtils extends Logging

[GitHub] spark pull request #21608: [SPARK-24626] [SQL] Improve location size calcula...

2018-07-24 Thread Achuth17
Github user Achuth17 commented on a diff in the pull request: https://github.com/apache/spark/pull/21608#discussion_r204823422 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/StatisticsSuite.scala --- @@ -148,6 +148,19 @@ class StatisticsSuite extends

[GitHub] spark pull request #21608: [SPARK-24626] [SQL] Improve location size calcula...

2018-07-24 Thread Achuth17
Github user Achuth17 commented on a diff in the pull request: https://github.com/apache/spark/pull/21608#discussion_r204823485 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningAwareFileIndex.scala --- @@ -59,14 +59,15 @@ abstract class

[GitHub] spark pull request #21608: [SPARK-24626] [SQL] Improve location size calcula...

2018-07-24 Thread Achuth17
Github user Achuth17 commented on a diff in the pull request: https://github.com/apache/spark/pull/21608#discussion_r204823137 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/StatisticsSuite.scala --- @@ -148,6 +149,25 @@ class StatisticsSuite extends

[GitHub] spark issue #21608: [SPARK-24626] [SQL] Improve location size calculation in...

2018-07-24 Thread Achuth17
Github user Achuth17 commented on the issue: https://github.com/apache/spark/pull/21608 retest this please.

[GitHub] spark pull request #21608: [SPARK-24626] [SQL] Improve location size calcula...

2018-07-23 Thread Achuth17
Github user Achuth17 commented on a diff in the pull request: https://github.com/apache/spark/pull/21608#discussion_r204618663 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/StatisticsSuite.scala --- @@ -148,6 +148,19 @@ class StatisticsSuite extends

[GitHub] spark pull request #21608: [SPARK-24626] [SQL] Improve location size calcula...

2018-07-23 Thread Achuth17
Github user Achuth17 commented on a diff in the pull request: https://github.com/apache/spark/pull/21608#discussion_r204618589 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Utils.scala --- @@ -55,4 +57,11 @@ private[sql] object

[GitHub] spark issue #21608: [SPARK-24626] [SQL] Improve location size calculation in...

2018-07-23 Thread Achuth17
Github user Achuth17 commented on the issue: https://github.com/apache/spark/pull/21608 @gatorsmile @maropu ping.

[GitHub] spark pull request #21608: [SPARK-24626] [SQL] Improve location size calcula...

2018-07-20 Thread Achuth17
Github user Achuth17 commented on a diff in the pull request: https://github.com/apache/spark/pull/21608#discussion_r204200662 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/CommandUtils.scala --- @@ -47,15 +48,27 @@ object CommandUtils extends Logging

[GitHub] spark pull request #21608: [SPARK-24626] [SQL] Improve location size calcula...

2018-07-20 Thread Achuth17
Github user Achuth17 commented on a diff in the pull request: https://github.com/apache/spark/pull/21608#discussion_r204121451 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/CommandUtils.scala --- @@ -47,15 +48,27 @@ object CommandUtils extends Logging

[GitHub] spark issue #21608: [SPARK-24626] [SQL] Improve location size calculation in...

2018-07-19 Thread Achuth17
Github user Achuth17 commented on the issue: https://github.com/apache/spark/pull/21608 retest this please.

[GitHub] spark pull request #21608: [SPARK-24626] [SQL] Improve location size calcula...

2018-07-19 Thread Achuth17
Github user Achuth17 commented on a diff in the pull request: https://github.com/apache/spark/pull/21608#discussion_r203613041 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/StatisticsSuite.scala --- @@ -148,6 +148,19 @@ class StatisticsSuite extends

[GitHub] spark pull request #21608: [SPARK-24626] [SQL] Improve location size calcula...

2018-07-18 Thread Achuth17
Github user Achuth17 commented on a diff in the pull request: https://github.com/apache/spark/pull/21608#discussion_r203456566 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/CommandUtils.scala --- @@ -47,15 +48,27 @@ object CommandUtils extends Logging

[GitHub] spark issue #21608: [SPARK-24626] [SQL] Improve location size calculation in...

2018-07-17 Thread Achuth17
Github user Achuth17 commented on the issue: https://github.com/apache/spark/pull/21608 @maropu @gatorsmile Can you review this? Thanks!

[GitHub] spark issue #18193: [SPARK-15616] [SQL] CatalogRelation should fallback to H...

2018-07-17 Thread Achuth17
Github user Achuth17 commented on the issue: https://github.com/apache/spark/pull/18193 This fix is useful. Is there any update on this?

[GitHub] spark issue #21608: [SPARK-24626] [SQL] Improve location size calculation in...

2018-07-13 Thread Achuth17
Github user Achuth17 commented on the issue: https://github.com/apache/spark/pull/21608 Ping @gatorsmile

[GitHub] spark issue #21608: [SPARK-24626] [SQL] Improve location size calculation in...

2018-07-12 Thread Achuth17
Github user Achuth17 commented on the issue: https://github.com/apache/spark/pull/21608 @maropu I have added a simple test. Can you check it out?

[GitHub] spark issue #21608: [SPARK-24626] [SQL] Improve location size calculation in...

2018-07-11 Thread Achuth17
Github user Achuth17 commented on the issue: https://github.com/apache/spark/pull/21608 @maropu, the method `calculateTotalSize` is already tested in `StatisticsSuite` in a few places. One such test is "analyze Hive serde tables" where we check to make sure the calcu

[GitHub] spark pull request #21608: [SPARK-24626] [SQL] Improve location size calcula...

2018-07-11 Thread Achuth17
Github user Achuth17 commented on a diff in the pull request: https://github.com/apache/spark/pull/21608#discussion_r201820832 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InMemoryFileIndex.scala --- @@ -162,7 +162,7 @@ object InMemoryFileIndex

[GitHub] spark pull request #21608: [SPARK-24626] [SQL] Improve location size calcula...

2018-07-11 Thread Achuth17
Github user Achuth17 commented on a diff in the pull request: https://github.com/apache/spark/pull/21608#discussion_r201820630 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/CommandUtils.scala --- @@ -47,15 +48,29 @@ object CommandUtils extends Logging

[GitHub] spark issue #21608: [SPARK-24626] [SQL] Improve location size calculation in...

2018-07-11 Thread Achuth17
Github user Achuth17 commented on the issue: https://github.com/apache/spark/pull/21608 @maropu regarding tests, since I didn't make any changes to `InMemoryFileIndex`, wouldn't it be better if I added tests in StatisticsCollectionSuite/StatisticsSuite? Also, since we just

[GitHub] spark pull request #21608: [SPARK-24626] [SQL] Improve location size calcula...

2018-07-07 Thread Achuth17
Github user Achuth17 commented on a diff in the pull request: https://github.com/apache/spark/pull/21608#discussion_r200824025 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/CommandUtils.scala --- @@ -47,15 +48,26 @@ object CommandUtils extends Logging

[GitHub] spark pull request #21608: [SPARK-24626] [SQL] Improve location size calcula...

2018-07-06 Thread Achuth17
Github user Achuth17 commented on a diff in the pull request: https://github.com/apache/spark/pull/21608#discussion_r200666542 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/CommandUtils.scala --- @@ -47,15 +48,26 @@ object CommandUtils extends Logging

[GitHub] spark pull request #21608: [SPARK-24626] [SQL] Improve location size calcula...

2018-07-05 Thread Achuth17
Github user Achuth17 commented on a diff in the pull request: https://github.com/apache/spark/pull/21608#discussion_r200544127 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/CommandUtils.scala --- @@ -47,15 +48,26 @@ object CommandUtils extends Logging

[GitHub] spark issue #21608: [SPARK-24626] [SQL] Improve location size calculation in...

2018-07-04 Thread Achuth17
Github user Achuth17 commented on the issue: https://github.com/apache/spark/pull/21608 @maropu I have made the changes. What are the next steps?

[GitHub] spark issue #21608: [SPARK-24626] [SQL] Improve Analyze Table command

2018-06-22 Thread Achuth17
Github user Achuth17 commented on the issue: https://github.com/apache/spark/pull/21608 Yes, in the case where the data is stored in S3, I noticed a significant difference. Some rough numbers: when done serially for a table in S3 with 1000 partitions, the calculateTotalSize

[GitHub] spark pull request #21608: [SPARK-24626] [SQL] Improve Analyze Table command

2018-06-21 Thread Achuth17
GitHub user Achuth17 opened a pull request: https://github.com/apache/spark/pull/21608 [SPARK-24626] [SQL] Improve Analyze Table command ## What changes were proposed in this pull request? Currently, Analyze table calculates table size sequentially for each partition. We
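
The thread contrasts computing partition sizes one at a time with doing the same work concurrently. The sketch below only illustrates that general idea and is not the pull request's code: `total_size_sequential`, `total_size_parallel`, `fake_size_of`, and the `/warehouse/t/part=N` paths are all hypothetical names, and the in-memory dictionary stands in for a real storage lookup such as an S3 listing.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence


def total_size_sequential(paths: Sequence[str], size_of: Callable[[str], int]) -> int:
    """Sum the size of each partition location, one lookup at a time."""
    return sum(size_of(path) for path in paths)


def total_size_parallel(
    paths: Sequence[str], size_of: Callable[[str], int], max_workers: int = 8
) -> int:
    """Sum the same sizes, but issue the lookups from a thread pool.

    Threads help when each lookup is dominated by I/O latency
    (an assumption about remote object stores, not a measurement).
    """
    if not paths:
        return 0
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return sum(pool.map(size_of, paths))


# Hypothetical stand-in for a storage call: 1000 partitions of 100..1099 bytes.
fake_sizes = {f"/warehouse/t/part={i}": 100 + i for i in range(1000)}


def fake_size_of(path: str) -> int:
    return fake_sizes[path]


paths = list(fake_sizes)
sequential_total = total_size_sequential(paths, fake_size_of)
parallel_total = total_size_parallel(paths, fake_size_of)
print(sequential_total, parallel_total)
```

Both functions return the same total; only the way the per-partition lookups are scheduled differs, so any speed-up depends on how slow each real lookup is.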