[GitHub] spark pull request #19136: [SPARK-15689][SQL] data source v2

j-baker Tue, 12 Sep 2017 01:24:29 -0700

Github user j-baker commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19136#discussion_r138281654
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala ---
    @@ -0,0 +1,95 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.datasources.v2
    +
    +import org.apache.spark.sql.Strategy
    +import org.apache.spark.sql.catalyst.expressions._
    +import org.apache.spark.sql.catalyst.planning.PhysicalOperation
    +import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
    +import org.apache.spark.sql.execution.{FilterExec, ProjectExec, SparkPlan}
    +import org.apache.spark.sql.execution.datasources.DataSourceStrategy
    +import org.apache.spark.sql.sources.Filter
    +import org.apache.spark.sql.sources.v2.reader.downward.{CatalystFilterPushDownSupport, ColumnPruningSupport, FilterPushDownSupport}
    +
    +object DataSourceV2Strategy extends Strategy {
    +  // TODO: write path
    +  override def apply(plan: LogicalPlan): Seq[SparkPlan] = plan match {
    +    case PhysicalOperation(projects, filters, DataSourceV2Relation(output, reader)) =>
    +      val attrMap = AttributeMap(output.zip(output))
    +
    +      val projectSet = AttributeSet(projects.flatMap(_.references))
    +      val filterSet = AttributeSet(filters.flatMap(_.references))
    +
    +      // Match original case of attributes.
    +      // TODO: nested fields pruning
    +      val requiredColumns = (projectSet ++ filterSet).toSeq.map(attrMap)
    +      reader match {
    +        case r: ColumnPruningSupport =>
    +          r.pruneColumns(requiredColumns.toStructType)
    +        case _ =>
    +      }
    +
    +      val stayUpFilters: Seq[Expression] = reader match {
    +        case r: CatalystFilterPushDownSupport =>
    +          r.pushCatalystFilters(filters.toArray)
    +
    +        case r: FilterPushDownSupport =>
    --- End diff --
    
    Considering that Catalyst filters can already be translated into `Filter`s, it's probably worth supporting _just_ the Catalyst interface, and providing users with the translator if they want to use the `Filter` approach?
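A minimal, self-contained Scala sketch of what this suggestion could look like. Every type in it (`Expr`, `Filter`, `FilterTranslator`, `CatalystFilterPushDown`, `PublicFilterPushDown`, `EqualityOnlySource`) is a hypothetical simplification and is not Spark's actual classes or this PR's API. The planner calls one Catalyst-level hook. A source that prefers the public `Filter` API mixes in an adapter that runs the translator (roughly the role `DataSourceStrategy.translateFilter` plays in Spark) and maps any rejected filters back to the original expressions.

```scala
// Hypothetical, simplified stand-ins for Catalyst expressions (not Spark's classes).
sealed trait Expr
final case class Attr(name: String) extends Expr
final case class Lit(value: Any) extends Expr
final case class EqualTo(left: Expr, right: Expr) extends Expr
final case class GreaterThan(left: Expr, right: Expr) extends Expr
final case class And(left: Expr, right: Expr) extends Expr
final case class UdfCall(name: String, args: Seq[Expr]) extends Expr // has no public Filter form

// Hypothetical stand-ins for the public, stable `Filter` API.
sealed trait Filter
final case class FEqualTo(attribute: String, value: Any) extends Filter
final case class FGreaterThan(attribute: String, value: Any) extends Filter
final case class FAnd(left: Filter, right: Filter) extends Filter

object FilterTranslator {
  /** Translates a Catalyst-style predicate into a public Filter, when one exists. */
  def translate(e: Expr): Option[Filter] = e match {
    case EqualTo(Attr(a), Lit(v))     => Some(FEqualTo(a, v))
    case GreaterThan(Attr(a), Lit(v)) => Some(FGreaterThan(a, v))
    case And(l, r)                    => for (lf <- translate(l); rf <- translate(r)) yield FAnd(lf, rf)
    case _                            => None
  }
}

/** The single (hypothetical) push-down hook the planner calls; returns filters that must stay up. */
trait CatalystFilterPushDown {
  def pushCatalystFilters(filters: Array[Expr]): Array[Expr]
}

/** Hypothetical opt-in adapter for sources that would rather work with public Filters. */
trait PublicFilterPushDown extends CatalystFilterPushDown {
  /** Receives the translated filters and returns those the source cannot evaluate. */
  def pushFilters(filters: Array[Filter]): Array[Filter]

  final override def pushCatalystFilters(filters: Array[Expr]): Array[Expr] = {
    val translated = filters.map(e => (e, FilterTranslator.translate(e)))
    // Predicates with no Filter equivalent never reach the source and always stay up.
    val untranslatable = translated.collect { case (e, None) => e }
    val pairs = translated.collect { case (e, Some(f)) => (f, e) }
    val rejected = pushFilters(pairs.map(_._1)).toSet
    // Map rejected Filters back to their original Catalyst expressions, preserving order.
    untranslatable ++ pairs.collect { case (f, e) if rejected(f) => e }
  }
}

/** Example source that can only evaluate equality predicates. */
class EqualityOnlySource extends PublicFilterPushDown {
  var pushed: List[Filter] = Nil

  override def pushFilters(filters: Array[Filter]): Array[Filter] = {
    val (accepted, rest) = filters.partition(_.isInstanceOf[FEqualTo])
    pushed = accepted.toList
    rest
  }
}

object Demo {
  def main(args: Array[String]): Unit = {
    val source = new EqualityOnlySource
    val stayUp = source.pushCatalystFilters(Array(
      EqualTo(Attr("a"), Lit(1)),
      GreaterThan(Attr("b"), Lit(2)),
      UdfCall("myUdf", Seq(Attr("c")))
    ))
    println(s"pushed: ${source.pushed}")
    println(s"stay up: ${stayUp.toList}")
  }
}
```

With this shape the strategy needs only the Catalyst branch of the `reader match`. Predicates the translator cannot handle (the `UdfCall` above) simply stay up, so a `Filter`-based source never has to deal with them.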


