[GitHub] spark pull request #20116: [SPARK-20960][SQL] make ColumnVector public

gatorsmile Wed, 03 Jan 2018 08:43:41 -0800

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20116#discussion_r159470325
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/vectorized/ColumnVector.java ---
    @@ -14,32 +14,39 @@
      * See the License for the specific language governing permissions and
      * limitations under the License.
      */
    -package org.apache.spark.sql.execution.vectorized;
    +package org.apache.spark.sql.vectorized;
     
     import org.apache.spark.sql.catalyst.util.MapData;
     import org.apache.spark.sql.types.DataType;
     import org.apache.spark.sql.types.Decimal;
     import org.apache.spark.unsafe.types.UTF8String;
     
     /**
    - * This class represents in-memory values of a column and provides the main APIs to access the data.
    - * It supports all the types and contains get APIs as well as their batched versions. The batched
    - * versions are considered to be faster and preferable whenever possible.
    + * An interface representing in-memory columnar data in Spark. This interface defines the main APIs
    + * to access the data, as well as their batched versions. The batched versions are considered to be
    + * faster and preferable whenever possible.
      *
    - * To handle nested schemas, ColumnVector has two types: Arrays and Structs. In both cases these
    - * columns have child columns. All of the data are stored in the child columns and the parent column
    - * only contains nullability. In the case of Arrays, the lengths and offsets are saved in the child
    - * column and are encoded identically to INTs.
    + * Most of the APIs take the rowId as a parameter. This is the batch local 0-based row id for values
    + * in this ColumnVector.
      *
    - * Maps are just a special case of a two field struct.
    + * ColumnVector supports all the data types including nested types. To handle nested types,
    + * ColumnVector can have children and is a tree structure. For struct type, it stores the actual
    + * data of each field in the corresponding child ColumnVector, and only store null information in
    --- End diff --
    
    `store` -> `stores`
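The javadoc quoted in this diff describes three ideas. First, accessors take a batch-local, 0-based rowId. Second, batched accessors read many values in one call. Third, nested types form a tree: a struct column keeps only its own null flags, and each field's data lives in a child column. The Java sketch below illustrates that layout under stated assumptions. `ColumnTreeSketch`, `SimpleColumn`, `IntColumn`, `StructColumn`, `getInts` and `getChild` are hypothetical names chosen for illustration. They are not Spark's actual classes or signatures.

```java
import java.util.Arrays;

// Hypothetical, simplified model (NOT Spark's real classes) of the layout the
// javadoc describes: a column is a tree, and a struct column stores only per-row
// null flags while its field values live in child columns.
public class ColumnTreeSketch {

    // Minimal read API: every accessor takes a batch-local, 0-based rowId.
    interface SimpleColumn {
        boolean isNullAt(int rowId);
    }

    // Leaf column holding int values plus a per-row null mask.
    static final class IntColumn implements SimpleColumn {
        private final int[] values;
        private final boolean[] nulls;

        IntColumn(int[] values, boolean[] nulls) {
            this.values = values;
            this.nulls = nulls;
        }

        @Override
        public boolean isNullAt(int rowId) {
            return nulls[rowId];
        }

        int getInt(int rowId) {
            return values[rowId];
        }

        // Batched accessor (hypothetical): copies `count` values starting at rowId
        // in a single call instead of `count` separate getInt calls.
        int[] getInts(int rowId, int count) {
            return Arrays.copyOfRange(values, rowId, rowId + count);
        }
    }

    // Struct column: holds only null information itself; field i's data is in children[i].
    static final class StructColumn implements SimpleColumn {
        private final boolean[] nulls;
        private final SimpleColumn[] children;

        StructColumn(boolean[] nulls, SimpleColumn... children) {
            this.nulls = nulls;
            this.children = children;
        }

        @Override
        public boolean isNullAt(int rowId) {
            return nulls[rowId];
        }

        SimpleColumn getChild(int ordinal) {
            return children[ordinal];
        }
    }

    public static void main(String[] args) {
        // struct<a: int, b: int> with 3 rows; row 1 is a null struct.
        IntColumn a = new IntColumn(new int[] {10, 0, 30}, new boolean[] {false, false, false});
        IntColumn b = new IntColumn(new int[] {1, 0, 3}, new boolean[] {false, false, true});
        StructColumn s = new StructColumn(new boolean[] {false, true, false}, a, b);

        // Row 2: the struct itself is non-null, field a = 30, field b is null.
        IntColumn fieldA = (IntColumn) s.getChild(0);
        System.out.println(s.isNullAt(2) + " " + fieldA.getInt(2) + " " + s.getChild(1).isNullAt(2));
        System.out.println(Arrays.toString(fieldA.getInts(0, 3)));
    }
}
```

Running `main` prints `false 30 true` and then `[10, 0, 30]`. Row 2 of the struct is non-null, and its field `a` is read from child 0. The null for field `b` is recorded in child 1 rather than in the parent. Because the parent keeps only null flags, a struct-level null check in this sketch never reads any field data.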


