[GitHub] spark pull request #15539: [SPARK-17994] [SQL] Add back a file status cache ...

2016-10-19 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/15539#discussion_r84118387 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/TableFileCatalog.scala --- @@ -32,13 +32,21 @@ import

[GitHub] spark pull request #15538: [SPARK-17993][SQL] Fix Parquet log output redirec...

2016-10-19 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/15538#discussion_r84114928 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala --- @@ -56,6 +56,7 @@ class ParquetFileFormat

[GitHub] spark issue #15539: [SPARK-17994] [SQL] Add back a file status cache for cat...

2016-10-18 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/15539 @ericl I took this PR for a test drive with some large-ish tables. Everything appeared to work as expected. As far as performance goes, planning a simple select on a partitioned table

[GitHub] spark pull request #15539: [SPARK-17994] [SQL] Add back a file status cache ...

2016-10-18 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/15539#discussion_r83990858 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveTablePerfStatsSuite.scala --- @@ -103,11 +92,84 @@ class HiveDataFrameSuite extends

[GitHub] spark pull request #15539: [SPARK-17994] [SQL] Add back a file status cache ...

2016-10-18 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/15539#discussion_r83988599 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/ListingFileCatalog.scala --- @@ -38,14 +38,16 @@ class ListingFileCatalog

[GitHub] spark pull request #15538: [SPARK-17993][SQL] Fix Parquet log output redirec...

2016-10-18 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/15538#discussion_r83988497 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala --- @@ -56,6 +56,7 @@ class ParquetFileFormat

[GitHub] spark pull request #15539: [SPARK-17994] [SQL] Add back a file status cache ...

2016-10-18 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/15539#discussion_r83987399 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/ListingFileCatalog.scala --- @@ -38,14 +38,16 @@ class ListingFileCatalog

[GitHub] spark pull request #15539: [SPARK-17994] [SQL] Add back a file status cache ...

2016-10-18 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/15539#discussion_r83987017 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningAwareFileCatalog.scala --- @@ -276,15 +290,15 @@ object

[GitHub] spark pull request #15538: [SPARK-17993][SQL] Fix Parquet log output redirec...

2016-10-18 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/15538#discussion_r83984654 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala --- @@ -56,6 +56,7 @@ class ParquetFileFormat

[GitHub] spark pull request #15539: [SPARK-17994] [SQL] Add back a file status cache ...

2016-10-18 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/15539#discussion_r83980885 --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala --- @@ -1486,10 +1485,10 @@ private[spark] object Utils extends Logging { val

[GitHub] spark pull request #15538: [SPARK-17993][SQL] Fix Parquet log output redirec...

2016-10-18 Thread mallman
GitHub user mallman opened a pull request: https://github.com/apache/spark/pull/15538 [SPARK-17993][SQL] Fix Parquet log output redirection (Link to Jira issue: https://issues.apache.org/jira/browse/SPARK-17993) ## What changes were proposed in this pull request

[GitHub] spark pull request #15518: [SPARK-17974] Refactor FileCatalog classes to sim...

2016-10-17 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/15518#discussion_r83741391 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetPartitionDiscoverySuite.scala --- @@ -626,8 +626,9 @@ class

[GitHub] spark pull request #15518: [SPARK-17974] Refactor FileCatalog classes to sim...

2016-10-17 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/15518#discussion_r83738938 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileCatalog.scala --- @@ -0,0 +1,66 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #14690: [SPARK-16980][SQL] Load only catalog table partit...

2016-10-14 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/14690#discussion_r83518291 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala --- @@ -626,6 +627,40 @@ private[spark] class HiveExternalCatalog(conf

[GitHub] spark pull request #14690: [SPARK-16980][SQL] Load only catalog table partit...

2016-10-14 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/14690#discussion_r83517834 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/ListingFileCatalog.scala --- @@ -17,32 +17,26 @@ package

[GitHub] spark issue #14690: [SPARK-16980][SQL] Load only catalog table partition met...

2016-10-14 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/14690 >> Hm, I haven't seen that with my test queries. Would adding your workaround to SparkILoopInit work? > It does not, unfortunately. I believe this impacts people with parq

[GitHub] spark issue #14690: [SPARK-16980][SQL] Load only catalog table partition met...

2016-10-14 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/14690 > Hm, I haven't seen that with my test queries. Would adding your workaround to SparkILoopInit work? It does not, unfortunately. --- If your project is set up for it, you can re

[GitHub] spark issue #14690: [SPARK-16980][SQL] Load only catalog table partition met...

2016-10-14 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/14690 > btw, what's the parquet log redirection issue? I don't see anything unusual in spark shell. Whenever I run a query on a Hive parquet table I get ``` spark-sql> sele

[GitHub] spark issue #14690: [SPARK-16980][SQL] Load only catalog table partition met...

2016-10-14 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/14690 I just pushed the rebase. It was really hairy, but I tried hard to ensure I got essentially all three branches' changes in.

[GitHub] spark issue #14690: [SPARK-16980][SQL] Load only catalog table partition met...

2016-10-14 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/14690 I'm still working on the rebase. It's very complex—there are two other commits involved. >> 1. Do we need a workaround for ORC like we made for Parquet? > 1) yes

[GitHub] spark issue #14690: [SPARK-16980][SQL] Load only catalog table partition met...

2016-10-14 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/14690 I will work on a rebase. Meanwhile, I've revisited the open issues in the PR description. To summarize: 1. Do we need a workaround for ORC like we made for Parquet? 1. What's the impact

[GitHub] spark pull request #14690: [SPARK-16980][SQL] Load only catalog table partit...

2016-10-13 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/14690#discussion_r83355030 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/parquetSuites.scala --- @@ -34,7 +34,7 @@ import org.apache.spark.util.Utils // The data

[GitHub] spark pull request #14690: [SPARK-16980][SQL] Load only catalog table partit...

2016-10-13 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/14690#discussion_r83352979 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala --- @@ -626,6 +627,40 @@ private[spark] class HiveExternalCatalog(conf

[GitHub] spark pull request #14690: [SPARK-16980][SQL] Load only catalog table partit...

2016-10-13 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/14690#discussion_r83349072 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/parquetSuites.scala --- @@ -34,7 +34,7 @@ import org.apache.spark.util.Utils // The data

[GitHub] spark issue #14690: [SPARK-16980][SQL] Load only catalog table partition met...

2016-10-13 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/14690 > Oops there is a conflict now. NP. I'm working on the rebase.

[GitHub] spark pull request #14690: [SPARK-16980][SQL] Load only catalog table partit...

2016-10-13 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/14690#discussion_r83325529 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala --- @@ -225,13 +225,19 @@ case class FileSourceScanExec

[GitHub] spark pull request #14690: [SPARK-16980][SQL] Load only catalog table partit...

2016-10-13 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/14690#discussion_r83325088 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PruneFileSourcePartitions.scala --- @@ -0,0 +1,72 @@ +/* + * Licensed

[GitHub] spark pull request #14690: [SPARK-16980][SQL] Load only catalog table partit...

2016-10-13 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/14690#discussion_r83318096 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala --- @@ -616,6 +617,44 @@ private[spark] class HiveExternalCatalog(conf

[GitHub] spark issue #14690: [SPARK-16980][SQL] Load only catalog table partition met...

2016-10-13 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/14690 > Btw, I noticed that this suite was failing in jenkins only. > > [info] - partitioned pruned table reports only selected files *** FAILED *** (610 milliseconds) > >

[GitHub] spark issue #14690: [SPARK-16980][SQL] Load only catalog table partition met...

2016-10-13 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/14690 > This patch fails MiMa tests. I've never seen this before. What does this mean?

[GitHub] spark issue #14690: [SPARK-16980][SQL] Load only catalog table partition met...

2016-10-13 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/14690 I've pushed an update to `ParquetMetastoreSuite` that illustrates the bug (or "limitation") WRT support for mixed-case partition columns I discovered yesterday. To

[GitHub] spark pull request #14690: [SPARK-16980][SQL] Load only catalog table partit...

2016-10-12 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/14690#discussion_r83141382 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala --- @@ -616,6 +617,44 @@ private[spark] class HiveExternalCatalog(conf

[GitHub] spark pull request #14690: [SPARK-16980][SQL] Load only catalog table partit...

2016-10-12 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/14690#discussion_r83131630 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala --- @@ -199,59 +197,30 @@ private[hive] class HiveMetastoreCatalog

[GitHub] spark pull request #14690: [SPARK-16980][SQL] Load only catalog table partit...

2016-10-12 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/14690#discussion_r83115827 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala --- @@ -199,59 +197,30 @@ private[hive] class HiveMetastoreCatalog

[GitHub] spark pull request #14690: [SPARK-16980][SQL] Load only catalog table partit...

2016-10-12 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/14690#discussion_r83086945 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PruneFileSourcePartitions.scala --- @@ -0,0 +1,72 @@ +/* + * Licensed

[GitHub] spark pull request #14690: [SPARK-16980][SQL] Load only catalog table partit...

2016-10-12 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/14690#discussion_r83085289 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala --- @@ -616,6 +617,44 @@ private[spark] class HiveExternalCatalog(conf

[GitHub] spark pull request #14690: [SPARK-16980][SQL] Load only catalog table partit...

2016-10-12 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/14690#discussion_r83085625 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/TableFileCatalog.scala --- @@ -0,0 +1,103 @@ +/* + * Licensed

[GitHub] spark issue #14690: [SPARK-16980][SQL] Load only catalog table partition met...

2016-10-12 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/14690 I determined the performance regression was introduced by a commit I hadn't pushed to this PR. Sorry for the false alarm. 😞 Needless to say, I'm not pushing that commit.

[GitHub] spark issue #14690: [SPARK-16980][SQL] Load only catalog table partition met...

2016-10-12 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/14690 >> Btw I've noticed a significant performance difference between ListingFileCatalog and TableFileCatalog's implementation of ListFiles. The difference seems to be that ListingFileC

[GitHub] spark issue #14690: [SPARK-16980][SQL] Load only catalog table partition met...

2016-10-12 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/14690 > Btw I've noticed a significant performance difference between ListingFileCatalog and TableFileCatalog's implementation of ListFiles. The difference seems to be that ListingFileCata

[GitHub] spark issue #14690: [SPARK-16980][SQL] Load only catalog table partition met...

2016-10-12 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/14690 I'm testing this patch on a couple of tables internally with on the order of 10k partitions. Performance is much slower than it should be. I'm investigating.

[GitHub] spark pull request #14690: [SPARK-16980][SQL] Load only catalog table partit...

2016-10-12 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/14690#discussion_r83036315 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PruneFileSourcePartitions.scala --- @@ -0,0 +1,72 @@ +/* + * Licensed

[GitHub] spark issue #14690: [SPARK-16980][SQL] Load only catalog table partition met...

2016-10-12 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/14690 I updated the description of this PR to reflect the workaround for the Hive/Parquet case-sensitivity issue. Do we need a similar workaround for ORC?

[GitHub] spark pull request #14690: [SPARK-16980][SQL] Load only catalog table partit...

2016-10-11 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/14690#discussion_r82902172 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala --- @@ -225,13 +225,16 @@ case class FileSourceScanExec

[GitHub] spark pull request #14690: [SPARK-16980][SQL] Load only catalog table partit...

2016-10-11 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/14690#discussion_r82900810 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala --- @@ -225,13 +225,16 @@ case class FileSourceScanExec

[GitHub] spark issue #14690: [SPARK-16980][SQL] Load only catalog table partition met...

2016-10-11 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/14690 >> Finally, this would require us to read the schema files. That's something I'm trying to avoid in this patch. > Not sure what you mean here, but the parquet change should be

[GitHub] spark issue #14690: [SPARK-16980][SQL] Load only catalog table partition met...

2016-10-11 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/14690 I believe that using a method like `TableFileCatalog.filterPartitions` to build a new file catalog restricted to some pruned partitions is a sound approach, however I'm starting to reconsider

[GitHub] spark issue #14690: [SPARK-16980][SQL] Load only catalog table partition met...

2016-10-11 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/14690 Ah cripes. I committed something I didn't want to. I'm rebasing again in a few...

[GitHub] spark pull request #14690: [SPARK-16980][SQL] Load only catalog table partit...

2016-10-11 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/14690#discussion_r82850487 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala --- @@ -225,13 +225,16 @@ case class FileSourceScanExec

[GitHub] spark issue #14690: [SPARK-16980][SQL] Load only catalog table partition met...

2016-10-11 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/14690 BTW, I'm working on a rebase to fix merge conflicts and address reviewers' feedback.

[GitHub] spark issue #14690: [SPARK-16980][SQL] Load only catalog table partition met...

2016-10-11 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/14690 I would be wary of amending our data sources to support case-insensitive field resolution. For one thing, strictly speaking it can lead to ambiguity in schema resolution. In the—potential
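
The ambiguity concern can be illustrated with a minimal sketch. The `resolve` helper and the field names below are hypothetical and are not Spark's resolver; they only assume that a case-insensitive lookup compares names with `equalsIgnoreCase`.

```scala
object CaseInsensitiveResolveDemo {
  // Hypothetical helper, not part of Spark: look up `name` among `fields`
  // ignoring case, and report an error when zero or several fields match.
  def resolve(fields: Seq[String], name: String): Either[String, String] =
    fields.filter(_.equalsIgnoreCase(name)) match {
      case Seq(single) => Right(single)
      case Seq()       => Left(s"no field matches '$name'")
      case many        => Left(s"ambiguous: '$name' matches ${many.mkString(", ")}")
    }

  def main(args: Array[String]): Unit = {
    // A case-sensitive schema may legally hold two names differing only in case.
    val fields = Seq("userId", "userid", "ts")
    println(resolve(fields, "TS"))     // resolves to the single match
    println(resolve(fields, "USERID")) // cannot choose between two candidates
  }
}
```

With a case-sensitive lookup both `userId` and `userid` stay addressable; the case-insensitive lookup has to reject the query or pick one arbitrarily.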

[GitHub] spark pull request #14690: [SPARK-16980][SQL] Load only catalog table partit...

2016-10-10 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/14690#discussion_r82713318 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala --- @@ -225,13 +225,16 @@ case class FileSourceScanExec

[GitHub] spark pull request #14690: [SPARK-16980][SQL] Load only catalog table partit...

2016-10-10 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/14690#discussion_r82713068 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala --- @@ -225,13 +225,16 @@ case class FileSourceScanExec

[GitHub] spark pull request #14690: [SPARK-16980][SQL] Load only catalog table partit...

2016-10-10 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/14690#discussion_r82712965 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/InMemoryCatalog.scala --- @@ -477,6 +478,15 @@ class InMemoryCatalog

[GitHub] spark issue #14690: [SPARK-16980][SQL] Load only catalog table partition met...

2016-10-07 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/14690 I've rebased this PR and refactored it somewhat. The main change is to move the partition pruning logic from `FileSourceStrategy` into a Catalyst optimizer rule called `PruneFileSourcePartitions

[GitHub] spark issue #14690: [SPARK-16980][SQL] Load only catalog table partition met...

2016-10-03 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/14690 > I think we can just remove the table schema reconciliation when converting hive parquet table to data source table, and fix the failed tests. @cloud-fan Sorry, I don't understand y

[GitHub] spark issue #14690: [SPARK-16980][SQL] Load only catalog table partition met...

2016-10-03 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/14690 @cloud-fan Thanks for your careful analysis. I'm just getting back to work today from two weeks off and will reply as soon as I have some time. It may be a few days. Cheers.

[GitHub] spark issue #14690: [SPARK-16980][SQL] Load only catalog table partition met...

2016-09-23 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/14690 FYI I'll be mostly away the rest of this week and off the grid entirely next week. I've continued to work on this patch on my side. Like I wrote earlier, I've been awaiting the outcome

[GitHub] spark issue #14750: [SPARK-17183][SQL] put hive serde table schema to table ...

2016-09-15 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/14750 @clockfly I can offer one answer to your question. One of the main benefits of this change is to allow us to remove the costly schema reconciliation between hive metastore schema and on-disk

[GitHub] spark issue #14388: [SPARK-16362][SQL] Support ArrayType and StructType in v...

2016-09-10 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/14388 @viirya Any progress on this?

[GitHub] spark issue #14690: [SPARK-16980][SQL] Load only catalog table partition met...

2016-09-02 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/14690 Thanks @ericl and @davies for your latest feedback. I'd like to take the opportunity to rebase this PR off of #14750 after it's merged to master before pushing another commit, however I understand

[GitHub] spark pull request #14690: [SPARK-16980][SQL] Load only catalog table partit...

2016-09-02 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/14690#discussion_r77426981 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -2531,6 +2531,8 @@ class Dataset[T] private[sql]( */ def

[GitHub] spark pull request #14690: [SPARK-16980][SQL] Load only catalog table partit...

2016-09-02 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/14690#discussion_r77426759 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/fileSourceInterfaces.scala --- @@ -346,11 +340,30 @@ trait FileCatalog

[GitHub] spark pull request #14690: [SPARK-16980][SQL] Load only catalog table partit...

2016-09-02 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/14690#discussion_r77426633 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/TableFileCatalog.scala --- @@ -0,0 +1,102 @@ +/* + * Licensed

[GitHub] spark pull request #14690: [SPARK-16980][SQL] Load only catalog table partit...

2016-09-02 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/14690#discussion_r77425878 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/TableFileCatalog.scala --- @@ -0,0 +1,102 @@ +/* + * Licensed

[GitHub] spark pull request #14690: [SPARK-16980][SQL] Load only catalog table partit...

2016-09-02 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/14690#discussion_r77425776 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala --- @@ -79,8 +79,16 @@ object FileSourceStrategy

[GitHub] spark pull request #14690: [SPARK-16980][SQL] Load only catalog table partit...

2016-09-02 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/14690#discussion_r77423958 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala --- @@ -184,7 +184,7 @@ case class FileSourceScanExec

[GitHub] spark issue #14690: [SPARK-16980][SQL] Load only catalog table partition met...

2016-09-01 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/14690 I've rebased and pushed a new commit which prunes a `HadoopFsRelation`'s `TableFileCatalog` to a `ListingFileCatalog` for a given set of partition pruning expressions. Another upside

[GitHub] spark issue #14690: [SPARK-16980][SQL] Load only catalog table partition met...

2016-09-01 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/14690 Thanks, that looks great.

[GitHub] spark issue #14388: [SPARK-16362][SQL] Support ArrayType and StructType in v...

2016-08-31 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/14388 @viirya I sent you an email with a link to a test file to your public github e-mail address.

[GitHub] spark issue #14690: [SPARK-16980][SQL] Load only catalog table partition met...

2016-08-31 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/14690 @cloud-fan Now that #14155 is merged, have you started on a follow up PR to address the column name case-insensitivity issue? I've rebased and done some more work on this PR, but I'd like

[GitHub] spark issue #14388: [SPARK-16362][SQL] Support ArrayType and StructType in v...

2016-08-30 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/14388 @viirya I'll see what I can do. If nothing else, I may be able to share a private data file over S3 if you promise not to share it with anyone else.

[GitHub] spark issue #14388: [SPARK-16362][SQL] Support ArrayType and StructType in v...

2016-08-29 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/14388 @viirya If I do a simple `select` on an array field it works, but if I add an `order by` clause which orders by the array column I get exceptions like ``` 16/08/29 21:47:01 ERROR

[GitHub] spark issue #14811: [SPARK-17231][CORE] Avoid building debug or trace log me...

2016-08-25 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/14811 Done

[GitHub] spark pull request #14811: [SPARK-17231][CORE] Avoid building debug or trace...

2016-08-25 Thread mallman
Github user mallman closed the pull request at: https://github.com/apache/spark/pull/14811

[GitHub] spark issue #14798: [SPARK-17231][CORE] Avoid building debug or trace log me...

2016-08-25 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/14798 @zsxwing PR #14811 is a backport of this PR to `branch-2.0`.

[GitHub] spark pull request #14811: [SPARK-17231][CORE] Avoid building debug or trace...

2016-08-25 Thread mallman
GitHub user mallman opened a pull request: https://github.com/apache/spark/pull/14811 [SPARK-17231][CORE] Avoid building debug or trace log messages unless This is simply a backport of #14798 to `branch-2.0`. This backport omits the change to `ExternalShuffleBlockHandler.java

[GitHub] spark issue #14798: [SPARK-17231][CORE] Avoid building debug or trace log me...

2016-08-25 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/14798 Will do

[GitHub] spark issue #14798: [SPARK-17231][CORE] Avoid building debug or trace log me...

2016-08-25 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/14798 I focused mainly on trace and debug logging. I didn't do much with errors or warnings, especially where exceptions are logged. I'm assuming these are less frequent, and the cost of building those

[GitHub] spark issue #14617: [SPARK-17019][Core] Expose on-heap and off-heap memory u...

2016-08-25 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/14617 @jerryshao The UI changes look great. I have not had a chance to scrutinize the source changes. Hopefully we can get someone else to help review.

[GitHub] spark pull request #14798: [SPARK-17231][CORE] Avoid building debug or trace...

2016-08-24 Thread mallman
GitHub user mallman opened a pull request: https://github.com/apache/spark/pull/14798 [SPARK-17231][CORE] Avoid building debug or trace log messages unless the respective log level is enabled (This PR addresses https://issues.apache.org/jira/browse/SPARK-17231) ## What
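
As a rough illustration of the idea in the PR title, the sketch below defers message construction with a Scala by-name parameter. `DemoLogger` is a hypothetical stand-in written for this example; it is not Spark's logging API and not the change the PR actually makes.

```scala
object LazyLogDemo {
  // Hypothetical minimal logger used only for illustration.
  final class DemoLogger(var debugEnabled: Boolean) {
    val lines = scala.collection.mutable.ArrayBuffer.empty[String]

    // `msg` is a by-name parameter: the argument expression is evaluated
    // only if the body actually uses it, i.e. only when debug is enabled.
    def debug(msg: => String): Unit =
      if (debugEnabled) lines += msg
  }

  def main(args: Array[String]): Unit = {
    var built = 0 // counts how often the costly message is constructed
    def expensiveMessage(): String = {
      built += 1
      "state: " + (1 to 5).mkString(",")
    }

    val log = new DemoLogger(false)
    log.debug(expensiveMessage())
    println(built) // 0: the message was never built

    log.debugEnabled = true
    log.debug(expensiveMessage())
    println(built) // 1: built once, now that debug is on
  }
}
```

In Java code, which has no by-name parameters, the usual equivalent is an explicit level check around the logging call.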

[GitHub] spark pull request #14537: [SPARK-16948][SQL] Querying empty partitioned orc...

2016-08-23 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/14537#discussion_r75904621 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala --- @@ -237,21 +237,26 @@ private[hive] class HiveMetastoreCatalog

[GitHub] spark issue #14617: [SPARK-17019][Core] Expose on-heap and off-heap memory u...

2016-08-23 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/14617 Hi @jerryshao. I think this is a great idea and fills in an important gap in the app's UI. Going by the screenshot you posted, instead of putting both on and off heap memory in a single column, how

[GitHub] spark pull request #14537: [SPARK-16948][SQL] Querying empty partitioned orc...

2016-08-23 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/14537#discussion_r75884267 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala --- @@ -237,21 +237,26 @@ private[hive] class HiveMetastoreCatalog

[GitHub] spark issue #14537: [SPARK-16948][SQL] Querying empty partitioned orc tables...

2016-08-22 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/14537 @rajeshbalamohan So for Orc 2.x files, would schema inference be unnecessary?

[GitHub] spark issue #14690: [SPARK-16980][SQL] Load only catalog table partition met...

2016-08-18 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/14690 That looks great!

[GitHub] spark issue #14690: [SPARK-16980][SQL] Load only catalog table partition met...

2016-08-18 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/14690 @cloud-fan O... how exciting! Is there a PR?

[GitHub] spark issue #14690: [SPARK-16980][SQL] Load only catalog table partition met...

2016-08-18 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/14690 It may be, but we at least need to get the unit tests working. And they use mixed case column names. :)

[GitHub] spark issue #14690: [SPARK-16980][SQL] Load only catalog table partition met...

2016-08-18 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/14690 @ericl That would be ideal, however the Hive metastore does not faithfully record column names with upper case characters. So if you save a parquet file with a column named `myCol` and define

[GitHub] spark issue #14690: [SPARK-16980][SQL] Load only catalog table partition met...

2016-08-18 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/14690 I've been thinking some more about the metastore/file schema reconciliation. As I mentioned in the PR description, this patch omits this reconciliation. This causes failures when the parquet files

[GitHub] spark issue #14690: [SPARK-16980][SQL] Load only catalog table partition met...

2016-08-18 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/14690 @cloud-fan Actually, all hive table partition metadata are retrieved for every query analysis. This is then used to compare the new metadata with the cached metadata. If they're the same, then we

[GitHub] spark pull request #14690: [SPARK-16980][SQL] Load only catalog table partit...

2016-08-17 Thread mallman
GitHub user mallman opened a pull request: https://github.com/apache/spark/pull/14690 [SPARK-16980][SQL] Load only catalog table partition metadata required to answer a query (This PR addresses https://issues.apache.org/jira/browse/SPARK-16980.) (N.B. I'm submitting

[GitHub] spark issue #14537: [SPARK-16948][SQL] Querying empty partitioned orc tables...

2016-08-12 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/14537 @rajeshbalamohan We'll need a committer to review your patch.

[GitHub] spark pull request #14551: [SPARK-16961][CORE] Fixed off-by-one error that b...

2016-08-10 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/14551#discussion_r74276138 --- Diff: core/src/test/scala/org/apache/spark/util/UtilsSuite.scala --- @@ -874,4 +874,38 @@ class UtilsSuite extends SparkFunSuite

[GitHub] spark pull request #14537: [SPARK-16948][SQL] Querying empty partitioned orc...

2016-08-09 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/14537#discussion_r74092780 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala --- @@ -287,14 +287,14 @@ private[hive] class HiveMetastoreCatalog

[GitHub] spark pull request #14537: [SPARK-16948][SQL] Querying empty partitioned orc...

2016-08-09 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/14537#discussion_r74092457 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala --- @@ -287,14 +287,14 @@ private[hive] class HiveMetastoreCatalog

[GitHub] spark issue #14537: [SPARK-16948][SQL] Querying empty partitioned orc tables...

2016-08-09 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/14537 @rajeshbalamohan, the changes to `HiveMetastoreCatalog.scala` look reasonable. This mirrors the behavior of this method before the `if (fileType.equals("parquet"))` expression was

[GitHub] spark pull request #14537: [SPARK-16948][SQL] Querying empty partitioned orc...

2016-08-09 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/14537#discussion_r74003788 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala --- @@ -294,7 +294,9 @@ private[hive] class HiveMetastoreCatalog

[GitHub] spark pull request #13818: [SPARK-15968][SQL] Nonempty partitioned metastore...

2016-08-08 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/13818#discussion_r73892580 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala --- @@ -298,6 +298,7 @@ case class InsertIntoHiveTable

[GitHub] spark issue #14064: [SPARK-15968][SQL] Nonempty partitioned metastore tables...

2016-07-06 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/14064 @cloud-fan Muchas gracias!
