[ https://issues.apache.org/jira/browse/PARQUET-2159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17684188#comment-17684188 ]
ASF GitHub Bot commented on PARQUET-2159:
-----------------------------------------

jatin-bhateja commented on code in PR #1011:
URL: https://github.com/apache/parquet-mr/pull/1011#discussion_r1048037258


##########
parquet-generator/src/main/resources/ByteBitPackingVectorLE:
##########
@@ -0,0 +1,3218 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.parquet.column.values.bitpacking;
+
+import jdk.incubator.vector.*;
+
+import java.nio.ByteBuffer;
+
+/**
+ * This is an auto-generated source file and should not edit it directly.
+ */
+public abstract class ByteBitPackingVectorLE {
+  private static final BytePacker[] packers = new BytePacker[33];
+
+  static {
+    packers[0] = new Packer0();
+    packers[1] = new Packer1();
+    packers[2] = new Packer2();
+    packers[3] = new Packer3();
+    packers[4] = new Packer4();
+    packers[5] = new Packer5();
+    packers[6] = new Packer6();
+    packers[7] = new Packer7();
+    packers[8] = new Packer8();
+    packers[9] = new Packer9();
+    packers[10] = new Packer10();
+    packers[11] = new Packer11();
+    packers[12] = new Packer12();
+    packers[13] = new Packer13();
+    packers[14] = new Packer14();
+    packers[15] = new Packer15();
+    packers[16] = new Packer16();
+    packers[17] = new Packer17();
+    packers[18] = new Packer18();
+    packers[19] = new Packer19();
+    packers[20] = new Packer20();
+    packers[21] = new Packer21();
+    packers[22] = new Packer22();
+    packers[23] = new Packer23();
+    packers[24] = new Packer24();
+    packers[25] = new Packer25();
+    packers[26] = new Packer26();
+    packers[27] = new Packer27();
+    packers[28] = new Packer28();
+    packers[29] = new Packer29();
+    packers[30] = new Packer30();
+    packers[31] = new Packer31();
+    packers[32] = new Packer32();
+  }
+
+  public static final BytePackerFactory factory = new BytePackerFactory() {
+    public BytePacker newBytePacker(int bitWidth) {
+      return packers[bitWidth];
+    }
+  };
+
+  private static final class Packer0 extends BytePacker {
+    private int unpackCount = 0;
+
+    private Packer0() {
+      super(0);
+    }
+
+    public int getUnpackCount() {
+      return unpackCount;
+    }
+
+    public final void pack8Values(final int[] in, final int inPos, final byte[] out, final int outPos) {
+    }
+
+    public final void pack32Values(final int[] in, final int inPos, final byte[] out, final int outPos) {
+    }
+
+    public final void unpack8Values(final byte[] in, final int inPos, final int[] out, final int outPos) {
+    }
+
+    public final void unpack8Values(final ByteBuffer in, final int inPos, final int[] out, final int outPos) {
+    }
+
+    public final void unpack32Values(final byte[] in, final int inPos, final int[] out, final int outPos) {
+    }
+
+    public final void unpack32Values(final ByteBuffer in, final int inPos, final int[] out, final int outPos) {
+    }
+
+    public final void unpackValuesVector(final byte[] input, final int inPos, final int[] output, final int outPos) {
+    }
+
+    public final void unpackValuesVector(final ByteBuffer input, final int inPos, final int[] output, final int outPos) {

   Review Comment:
   All these empty definitions can be removed if we introduce a new class, ByteVectorPacker, that inherits from the existing BytePacker.




> Parquet bit-packing de/encode optimization
> ------------------------------------------
>
>                 Key: PARQUET-2159
>                 URL: https://issues.apache.org/jira/browse/PARQUET-2159
>             Project: Parquet
>          Issue Type: Improvement
>          Components: parquet-mr
>    Affects Versions: 1.13.0
>            Reporter: Fang-Xie
>            Assignee: Fang-Xie
>            Priority: Major
>             Fix For: 1.13.0
>
>         Attachments: image-2022-06-15-22-56-08-396.png,
> image-2022-06-15-22-57-15-964.png, image-2022-06-15-22-58-01-442.png,
> image-2022-06-15-22-58-40-704.png
>
>
> Spark currently uses parquet-mr as its Parquet reader/writer library, but
> the built-in bit-packing encode/decode is not efficient enough.
> Our optimization of Parquet bit-packing encode/decode using
> jdk.incubator.vector in OpenJDK 18 brings a significant performance
> improvement.
> Because the Vector API was added to OpenJDK in version 16, this
> optimization requires JDK 16 or higher.
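
One possible reading of the review suggestion, sketched in Java: the scalar methods that a vector-only packer has no use for are stubbed once in an intermediate class, so each generated PackerN defines only its vector method. SimpleBytePacker, SimpleByteVectorPacker and SketchPacker0 are hypothetical stand-ins, not parquet-mr's BytePacker or the proposed ByteVectorPacker; throwing UnsupportedOperationException instead of leaving empty bodies, and a batch of 8 values per vector call, are also assumptions of this sketch.

```java
import java.util.Arrays;

// Hypothetical, simplified stand-in for parquet-mr's BytePacker; the real class
// declares more methods (pack8Values, pack32Values, ByteBuffer overloads, ...).
abstract class SimpleBytePacker {
  final int bitWidth;

  SimpleBytePacker(int bitWidth) {
    this.bitWidth = bitWidth;
  }

  abstract void unpack8Values(byte[] in, int inPos, int[] out, int outPos);

  abstract void unpack32Values(byte[] in, int inPos, int[] out, int outPos);
}

// Hypothetical intermediate class in the spirit of the suggested ByteVectorPacker:
// the scalar methods are stubbed once here instead of in every generated class.
abstract class SimpleByteVectorPacker extends SimpleBytePacker {
  SimpleByteVectorPacker(int bitWidth) {
    super(bitWidth);
  }

  @Override
  final void unpack8Values(byte[] in, int inPos, int[] out, int outPos) {
    throw new UnsupportedOperationException("scalar unpack not supported; use unpackValuesVector");
  }

  @Override
  final void unpack32Values(byte[] in, int inPos, int[] out, int outPos) {
    throw new UnsupportedOperationException("scalar unpack not supported; use unpackValuesVector");
  }

  // The only method a generated vector packer has to provide.
  abstract void unpackValuesVector(byte[] input, int inPos, int[] output, int outPos);
}

// A generated packer now contains only real work. With bit width 0 every value
// is 0, so it writes zeros; the batch size of 8 is an assumption of this sketch.
final class SketchPacker0 extends SimpleByteVectorPacker {
  SketchPacker0() {
    super(0);
  }

  @Override
  void unpackValuesVector(byte[] input, int inPos, int[] output, int outPos) {
    Arrays.fill(output, outPos, outPos + 8, 0);
  }
}
```

With 33 generated packers, this moves the repeated stub bodies into a single place; the trade-off is that callers must know which method family a given packer actually supports.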
>
> *Below are our test results*
>
> The functional test compares the open-source parquet-mr bit-pack decoding
> function *_public final void unpack8Values(final byte[] in, final int inPos,
> final int[] out, final int outPos)_* with our Vector API implementation
> *_public final void unpack8Values_vec(final byte[] in, final int inPos,
> final int[] out, final int outPos)_*.
>
> We tested 10 pairs of decode functions (open-source parquet-mr bit unpacking
> vs. our optimized, vectorized SIMD implementation) with bit
> width = \{1,2,3,4,5,6,7,8,9,10}. The test results are below:
>
> !image-2022-06-15-22-56-08-396.png|width=437,height=223!
>
> We integrated our bit-packing decode implementation into parquet-mr and
> tested the batch-read performance through Spark's
> VectorizedParquetRecordReader, which reads Parquet column data in batches.
> We constructed Parquet files with varying row and column counts. The column
> data type is Int32 and the maximum value is 127, which fits bit-pack
> encoding with bit width = 7. Row counts range from 10k to 100 million, and
> column counts range from 1 to 4.
>
> !image-2022-06-15-22-57-15-964.png|width=453,height=229!
>
> !image-2022-06-15-22-58-01-442.png|width=439,height=217!
>
> !image-2022-06-15-22-58-40-704.png|width=415,height=208!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
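
To make the tested behavior concrete, here is a minimal scalar sketch of Parquet's little-endian bit-packing for 8 values, plus the bit-width calculation behind "maximum value 127 means bit width 7". BitPackingSketch and its methods are hypothetical: they take the bit width as a parameter and loop over bits, rather than using one specialized class per bit width as in the generated code above, so this is a reference for checking results, not a performance baseline.

```java
// Scalar reference for Parquet's little-endian bit-packing: value i occupies bits
// [i * bitWidth, (i + 1) * bitWidth) of the byte stream, counted from the least
// significant bit of each byte. Eight values of bitWidth bits fill bitWidth bytes.
final class BitPackingSketch {
  private BitPackingSketch() {}

  // Smallest bit width that can represent every value in [0, maxValue] (maxValue >= 0).
  static int bitWidthFor(int maxValue) {
    return 32 - Integer.numberOfLeadingZeros(maxValue);
  }

  // Packs in[inPos..inPos+8) into out[outPos..outPos+bitWidth), one bit at a time.
  static void pack8Values(int[] in, int inPos, byte[] out, int outPos, int bitWidth) {
    for (int k = 0; k < bitWidth; k++) {
      out[outPos + k] = 0;
    }
    for (int i = 0; i < 8; i++) {
      for (int b = 0; b < bitWidth; b++) {
        if (((in[inPos + i] >>> b) & 1) != 0) {
          int bitIndex = i * bitWidth + b;
          out[outPos + (bitIndex >>> 3)] |= (byte) (1 << (bitIndex & 7));
        }
      }
    }
  }

  // Inverse of pack8Values: reassembles each value from its bits, LSB first.
  static void unpack8Values(byte[] in, int inPos, int[] out, int outPos, int bitWidth) {
    for (int i = 0; i < 8; i++) {
      int value = 0;
      for (int b = 0; b < bitWidth; b++) {
        int bitIndex = i * bitWidth + b;
        int bit = (in[inPos + (bitIndex >>> 3)] >>> (bitIndex & 7)) & 1;
        value |= bit << b;
      }
      out[outPos + i] = value;
    }
  }
}
```

For example, bitWidthFor(127) returns 7, so eight such values pack into 7 bytes, and with bit width 3 the values 0 through 7 pack into the bytes 0x88, 0xC6, 0xFA. A vectorized decoder has to produce exactly the same output, which is what makes a plain reference like this useful for cross-checking.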