[ https://issues.apache.org/jira/browse/SPARK-15691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15315357#comment-15315357 ]
Xiao Li commented on SPARK-15691:
---------------------------------

{{HiveMetastoreCatalog}} has finally been cleaned up in my private local branch. It is now purely a cache for data source tables, and it has been reduced to 90 lines of code. We need a new name!

The four Hive-specific {{Analyzer}} rules have been moved to {{HiveStrategies.scala}}. This mirrors {{DataSourceStrategy.scala}}, which contains both {{Analyzer}} rules and {{SparkPlanner}} strategies.

I also consolidated the duplicated code and refactored a few other pieces.

I will write and upload a design doc tomorrow that documents the changes in detail for further review. Thanks!

> Refactor and improve Hive support
> ---------------------------------
>
>                 Key: SPARK-15691
>                 URL: https://issues.apache.org/jira/browse/SPARK-15691
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>            Reporter: Reynold Xin
>
> Hive support is important to Spark SQL, as many Spark users use it to read
> from Hive. The current architecture is very difficult to maintain, and this
> ticket tracks progress towards getting us to a sane state.
> A number of things we want to accomplish are:
> - Move the Hive-specific catalog logic into HiveExternalCatalog.
> -- Remove HiveSessionCatalog. All Hive-related stuff should go into
> HiveExternalCatalog. This would require moving caching either into
> HiveExternalCatalog, or just into SessionCatalog.
> -- Move using properties to store data source options into
> HiveExternalCatalog.
> -- Potentially more.
> - Remove Hive's specific ScriptTransform implementation and make it more
> general so we can put it in sql/core.
> - Implement HiveTableScan (and write path) as a data source, so we don't need
> a special planner rule for HiveTableScan.
> - Remove HiveSharedState and HiveSessionState.
> One thing that is still unclear to me is how to work with Hive UDF support.
> We might still need a special planner rule there.