Github user mallman commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22905#discussion_r230128199
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/ColumnarFileFormat.scala ---
    @@ -0,0 +1,32 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.datasources
    +
    +import org.apache.spark.sql.internal.SQLConf
    +import org.apache.spark.sql.types.StructType
    +
    +/**
    + * An optional mix-in for columnar [[FileFormat]]s. This trait provides some helpful metadata when
    + * debugging a physical query plan.
    + */
    +private[sql] trait ColumnarFileFormat {
    --- End diff --
    
    This is not meant to be exposed as an external interface for outside data
    sources. In fact, by making it private I intended to keep it hidden. It could
    possibly be used more generally, but I'm just looking for the abstraction with
    the lightest footprint that allows its use in `FileSourceScanExec` without
    referencing `ParquetFileFormat` there. Referencing a specific file format in
    `FileSourceScanExec` seems totally inappropriate.
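    To illustrate the mix-in idea, here is a minimal, self-contained sketch. Every
    name in it (`FileFormatLike`, `ColumnarFileFormatLike`, `columnCountForSchema`,
    `ParquetLikeFormat`, `CsvLikeFormat`, `scanMetadata`) is a hypothetical stand-in,
    not Spark's actual API or the body of the trait in this PR. The point it shows
    is that the scan node matches on the optional trait and never names a concrete
    format class.

```scala
// Sketch of an optional mix-in trait; all names are hypothetical stand-ins,
// not Spark's real FileFormat / FileSourceScanExec classes.
object ColumnarMixinSketch {

  // Stand-in for a generic file format abstraction (hypothetical).
  trait FileFormatLike {
    def shortName: String
  }

  // Optional mix-in: only columnar formats implement it.
  // The method name and signature are assumptions for illustration.
  trait ColumnarFileFormatLike {
    def columnCountForSchema(requestedFields: Seq[String]): Int
  }

  // A columnar format opts in by mixing in the trait.
  final class ParquetLikeFormat extends FileFormatLike with ColumnarFileFormatLike {
    val shortName = "parquet-like"
    // Assumption: one physical column per requested top-level field.
    def columnCountForSchema(requestedFields: Seq[String]): Int = requestedFields.size
  }

  // A row-oriented format simply does not mix the trait in.
  final class CsvLikeFormat extends FileFormatLike {
    val shortName = "csv-like"
  }

  // Stand-in for the scan node's debugging metadata: it matches on the
  // mix-in trait only, so it never references a specific format class.
  def scanMetadata(format: FileFormatLike, requestedFields: Seq[String]): Map[String, String] = {
    val base = Map("Format" -> format.shortName)
    format match {
      case columnar: ColumnarFileFormatLike =>
        base + ("ColumnCount" -> columnar.columnCountForSchema(requestedFields).toString)
      case _ =>
        base
    }
  }
}
```

    With this shape, adding another columnar format (ORC, say) only requires mixing
    the trait into that format; the scan code does not change.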
    
    I'm hoping @dongjoon-hyun can offer his opinion on whether this can be
    generalized to the ORC file format.
    
    I understand that I'm making an assumption about whether other file formats
    can adopt this interface. Another reason for making this interface private is
    that it stays easy to modify to support other implementations if necessary.

