[GitHub] carbondata pull request #2819: [CARBONDATA-3012] Added support for full scan...
Github user ravipesala closed the pull request at: https://github.com/apache/carbondata/pull/2819 ---
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2819#discussion_r228050305 --- Diff: core/src/main/java/org/apache/carbondata/core/scan/result/BlockletScannedResult.java --- @@ -72,6 +72,11 @@ */ private int[] pageFilteredRowCount; + /** + * Filtered pages to be decoded and loaded to vector. + */ + private int[] pagesFiltered; --- End diff -- ok ---
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2819#discussion_r228050191 --- Diff: core/src/main/java/org/apache/carbondata/core/scan/model/QueryModel.java --- @@ -124,6 +124,11 @@ private boolean preFetchData = true; + /** + * It fills the vector directly from decoded column page with out any staging and conversions --- End diff -- ok, added comment ---
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2819#discussion_r228049239 --- Diff: core/src/main/java/org/apache/carbondata/core/scan/executor/impl/AbstractQueryExecutor.java --- @@ -478,6 +478,17 @@ private BlockExecutionInfo getBlockExecutionInfoForBlock(QueryModel queryModel, } else { blockExecutionInfo.setPrefetchBlocklet(queryModel.isPreFetchData()); } +// In case of fg datamap it should not go to direct fill. +boolean fgDataMapPathPresent = false; +for (TableBlockInfo blockInfo : queryModel.getTableBlockInfos()) { + fgDataMapPathPresent = blockInfo.getDataMapWriterPath() != null; + if (fgDataMapPathPresent) { +break; --- End diff -- ok ---
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2819#discussion_r228048117 --- Diff: core/src/main/java/org/apache/carbondata/core/scan/collector/impl/DictionaryBasedVectorResultCollector.java --- @@ -198,4 +219,48 @@ void fillColumnVectorDetails(CarbonColumnarBatch columnarBatch, int rowCounter, } } + private void collectResultInColumnarBatchDirect(BlockletScannedResult scannedResult, --- End diff -- ok ---
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2819#discussion_r228047825 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/adaptive/AdaptiveIntegralCodec.java --- @@ -248,6 +269,143 @@ public double decodeDouble(float value) { public double decodeDouble(double value) { throw new RuntimeException("internal error: " + debugInfo()); } + +@Override +public void decodeAndFillVector(ColumnPage columnPage, ColumnVectorInfo vectorInfo) { + CarbonColumnVector vector = vectorInfo.vector; + BitSet nullBits = columnPage.getNullBits(); + DataType vectorDataType = vector.getType(); + DataType pageDataType = columnPage.getDataType(); + int pageSize = columnPage.getPageSize(); + BitSet deletedRows = vectorInfo.deletedRows; + fillVector(columnPage, vector, vectorDataType, pageDataType, pageSize, vectorInfo); + if (deletedRows == null || deletedRows.isEmpty()) { +for (int i = nullBits.nextSetBit(0); i >= 0; i = nullBits.nextSetBit(i + 1)) { + vector.putNull(i); +} + } +} + +private void fillVector(ColumnPage columnPage, CarbonColumnVector vector, +DataType vectorDataType, DataType pageDataType, int pageSize, ColumnVectorInfo vectorInfo) { + if (pageDataType == DataTypes.BOOLEAN || pageDataType == DataTypes.BYTE) { +byte[] byteData = columnPage.getBytePage(); +if (vectorDataType == DataTypes.SHORT) { + for (int i = 0; i < pageSize; i++) { +vector.putShort(i, (short) byteData[i]); + } +} else if (vectorDataType == DataTypes.INT) { + for (int i = 0; i < pageSize; i++) { +vector.putInt(i, (int) byteData[i]); + } +} else if (vectorDataType == DataTypes.LONG) { + for (int i = 0; i < pageSize; i++) { +vector.putLong(i, byteData[i]); + } +} else if (vectorDataType == DataTypes.TIMESTAMP) { + for (int i = 0; i < pageSize; i++) { +vector.putLong(i, byteData[i] * 1000); + } +} else if (vectorDataType == DataTypes.BOOLEAN) { + vector.putBytes(0, pageSize, byteData, 0); + --- End diff -- ok ---
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2819#discussion_r228047744 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/ColumnPageDecoder.java --- @@ -29,6 +31,12 @@ */ ColumnPage decode(byte[] input, int offset, int length) throws MemoryException, IOException; + /** + * Apply decoding algorithm on input byte array and fill the vector here. + */ + void decodeAndFillVector(byte[] input, int offset, int length, ColumnVectorInfo vectorInfo, + BitSet nullBits, boolean isLVEncoded) throws MemoryException, IOException; --- End diff -- Yes, this method was added following the signature of the existing `decode` method. The `isLVEncoded` flag was introduced as part of local dictionary support and is being refactored as part of Vishal's store work. @kumarvishal09 please remove `isLVEncoded` from the `decode` method; in the local dictionary case it should instead align with one of the data types used in the column page. ---
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2819#discussion_r228042421 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/EncodingFactory.java --- @@ -66,6 +66,14 @@ public abstract ColumnPageEncoder createEncoder(TableSpec.ColumnSpec columnSpec, */ public ColumnPageDecoder createDecoder(List encodings, List encoderMetas, String compressor) throws IOException { +return createDecoder(encodings, encoderMetas, compressor, false); + } + + /** + * Return new decoder based on encoder metadata read from file --- End diff -- added comment ---
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2819#discussion_r228041948 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/page/VarLengthColumnPageBase.java --- @@ -176,7 +179,7 @@ private static ColumnPage getDecimalColumnPage(TableSpec.ColumnSpec columnSpec, rowOffset.putInt(counter, offset); VarLengthColumnPageBase page; -if (unsafe) { +if (unsafe && !meta.isFillCompleteVector()) { --- End diff -- ok ---
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2819#discussion_r228040371 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/ColumnPageEncoderMeta.java --- @@ -49,6 +49,8 @@ // Make it protected for RLEEncoderMeta protected String compressorName; + private transient boolean fillCompleteVector; --- End diff -- ok ---
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2819#discussion_r228040196 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/page/SafeDecimalColumnPage.java --- @@ -193,6 +193,30 @@ public void convertValue(ColumnPageValueConverter codec) { } } + @Override public byte[] getBytePage() { --- End diff -- ok ---
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2819#discussion_r228040088 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/chunk/store/impl/safe/AbstractNonDictionaryVectorFiller.java --- @@ -0,0 +1,278 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.carbondata.core.datastore.chunk.store.impl.safe; + +import java.nio.ByteBuffer; + +import org.apache.carbondata.core.constants.CarbonCommonConstants; +import org.apache.carbondata.core.metadata.datatype.DataType; +import org.apache.carbondata.core.metadata.datatype.DataTypes; +import org.apache.carbondata.core.scan.result.vector.CarbonColumnVector; +import org.apache.carbondata.core.util.ByteUtil; +import org.apache.carbondata.core.util.DataTypeUtil; + +public abstract class AbstractNonDictionaryVectorFiller { + + protected int lengthSize; + protected int numberOfRows; + + public AbstractNonDictionaryVectorFiller(int lengthSize, int numberOfRows) { +this.lengthSize = lengthSize; +this.numberOfRows = numberOfRows; + } + + public abstract void fillVector(byte[] data, CarbonColumnVector vector, ByteBuffer buffer); + + public int getLengthFromBuffer(ByteBuffer buffer) { +return buffer.getShort(); + } +} + +class NonDictionaryVectorFillerFactory { + + public static AbstractNonDictionaryVectorFiller getVectorFiller(DataType type, int lengthSize, + int numberOfRows) { +if (type == DataTypes.STRING) { + return new StringVectorFiller(lengthSize, numberOfRows); +} else if (type == DataTypes.VARCHAR) { + return new LongStringVectorFiller(lengthSize, numberOfRows); +} else if (type == DataTypes.TIMESTAMP) { + return new TimeStampVectorFiller(lengthSize, numberOfRows); +} else if (type == DataTypes.BOOLEAN) { + return new BooleanVectorFiller(lengthSize, numberOfRows); +} else if (type == DataTypes.SHORT) { + return new ShortVectorFiller(lengthSize, numberOfRows); +} else if (type == DataTypes.INT) { + return new IntVectorFiller(lengthSize, numberOfRows); +} else if (type == DataTypes.LONG) { + return new LongVectorFiller(lengthSize, numberOfRows); +} else { + throw new UnsupportedOperationException("Not supported datatype : " + type); +} + + } + +} + +class StringVectorFiller extends AbstractNonDictionaryVectorFiller { + + public 
StringVectorFiller(int lengthSize, int numberOfRows) { +super(lengthSize, numberOfRows); + } + + @Override + public void fillVector(byte[] data, CarbonColumnVector vector, ByteBuffer buffer) { +// start position will be used to store the current data position +int startOffset = 0; +// as first position will be start from length of bytes as data is stored first in the memory +// block we need to skip first two bytes this is because first two bytes will be length of the +// data which we have to skip +int currentOffset = lengthSize; +ByteUtil.UnsafeComparer comparator = ByteUtil.UnsafeComparer.INSTANCE; +for (int i = 0; i < numberOfRows - 1; i++) { + buffer.position(startOffset); + startOffset += getLengthFromBuffer(buffer) + lengthSize; + int length = startOffset - (currentOffset); + if (comparator.equals(CarbonCommonConstants.MEMBER_DEFAULT_VAL_ARRAY, 0, + CarbonCommonConstants.MEMBER_DEFAULT_VAL_ARRAY.length, data, currentOffset, length)) { +vector.putNull(i); + } else { +vector.putByteArray(i, currentOffset, length, data); + } + currentOffset = startOffset + lengthSize; +} +// Handle last row +int length = (data.length - currentOffset); +if (comparator.equals(CarbonCommonConstants.MEMBER_DEFAULT_VAL_ARRAY, 0, +
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2819#discussion_r228039991 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/chunk/store/impl/safe/AbstractNonDictionaryVectorFiller.java --- @@ -0,0 +1,278 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.core.datastore.chunk.store.impl.safe; + +import java.nio.ByteBuffer; + +import org.apache.carbondata.core.constants.CarbonCommonConstants; +import org.apache.carbondata.core.metadata.datatype.DataType; +import org.apache.carbondata.core.metadata.datatype.DataTypes; +import org.apache.carbondata.core.scan.result.vector.CarbonColumnVector; +import org.apache.carbondata.core.util.ByteUtil; +import org.apache.carbondata.core.util.DataTypeUtil; + +public abstract class AbstractNonDictionaryVectorFiller { --- End diff -- ok ---
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2819#discussion_r228039614 --- Diff: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java --- @@ -1845,6 +1845,18 @@ public static final int CARBON_MINMAX_ALLOWED_BYTE_COUNT_MIN = 10; public static final int CARBON_MINMAX_ALLOWED_BYTE_COUNT_MAX = 1000; + /** + * When enabled complete row filters will be handled by carbon in case of vector. + * If it is disabled then only page level pruning will be done by carbon and row level filtering + * will be done by spark for vector. + * There is no change in flow for non-vector based queries. --- End diff -- It will be made false by default in another pending PR. Since this PR focuses only on full scan, many tests fail when it is disabled, which is why it defaults to true here. ---
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2819#discussion_r228039285 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/chunk/impl/VariableLengthDimensionColumnPage.java --- @@ -54,10 +75,15 @@ public VariableLengthDimensionColumnPage(byte[] dataChunks, int[] invertedIndex, } dataChunkStore = DimensionChunkStoreFactory.INSTANCE .getDimensionChunkStore(0, isExplicitSorted, numberOfRows, totalSize, dimStoreType, -dictionary); -dataChunkStore.putArray(invertedIndex, invertedIndexReverse, dataChunks); +dictionary, vectorInfo != null); +if (vectorInfo != null) { + dataChunkStore.fillVector(invertedIndex, invertedIndexReverse, dataChunks, vectorInfo); +} else { + dataChunkStore.putArray(invertedIndex, invertedIndexReverse, dataChunks); +} } + --- End diff -- ok ---
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2819#discussion_r228038986 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/chunk/impl/MeasureRawColumnChunk.java --- @@ -105,6 +106,22 @@ public ColumnPage convertToColumnPageWithOutCache(int index) { } } + /** + * Convert raw data with specified page number processed to DimensionColumnDataChunk and fill the + * vector + * + * @param pageNumber page number to decode and fill the vector + * @param vectorInfo vector to be filled with column page + */ + public void convertToColumnPageAndFillVector(int pageNumber, ColumnVectorInfo vectorInfo) { +assert pageNumber < pagesCount; +try { + chunkReader.decodeColumnPageAndFillVector(this, pageNumber, vectorInfo); +} catch (IOException | MemoryException e) { + throw new RuntimeException(e); --- End diff -- Because those are checked exceptions: throwing them directly would require every caller up the chain to declare and handle the same exceptions. ---
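The trade-off discussed here is a common Java idiom: wrapping a checked exception in an unchecked `RuntimeException` keeps the public method signature clean while preserving the original failure as the cause. A minimal, self-contained sketch (class and method names are illustrative, not the CarbonData code):

```java
import java.io.IOException;

// Sketch: a lower layer throws a checked IOException; the entry point
// wraps it so callers need not declare `throws IOException`.
public class CheckedWrapper {

    // Lower layer with a checked exception in its signature.
    static byte[] readRaw(boolean fail) throws IOException {
        if (fail) {
            throw new IOException("simulated read failure");
        }
        return new byte[] {1, 2, 3};
    }

    // Public entry point keeps a clean signature by wrapping.
    public static byte[] decodePage(boolean fail) {
        try {
            return readRaw(fail);
        } catch (IOException e) {
            // Preserve the original exception as the cause for diagnostics.
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(decodePage(false).length); // 3
        try {
            decodePage(true);
        } catch (RuntimeException e) {
            System.out.println(e.getCause() instanceof IOException); // true
        }
    }
}
```

The cause chain is what makes this acceptable in practice: the original `IOException` is still reachable via `getCause()` for logging and debugging.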
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2819#discussion_r227619567 --- Diff: core/src/main/java/org/apache/carbondata/core/scan/model/QueryModel.java --- @@ -124,6 +124,11 @@ private boolean preFetchData = true; + /** + * It fills the vector directly from decoded column page with out any staging and conversions --- End diff -- "It fills the vector" — can you give more detail about which vector, and describe how Spark/Presto integrate with this? ---
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2819#discussion_r227619641 --- Diff: core/src/main/java/org/apache/carbondata/core/scan/result/BlockletScannedResult.java --- @@ -72,6 +72,11 @@ */ private int[] pageFilteredRowCount; + /** + * Filtered pages to be decoded and loaded to vector. + */ + private int[] pagesFiltered; --- End diff -- ```suggestion private int[] pagesIdFiltered; ``` ---
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2819#discussion_r227619247 --- Diff: core/src/main/java/org/apache/carbondata/core/scan/executor/impl/AbstractQueryExecutor.java --- @@ -478,6 +478,17 @@ private BlockExecutionInfo getBlockExecutionInfoForBlock(QueryModel queryModel, } else { blockExecutionInfo.setPrefetchBlocklet(queryModel.isPreFetchData()); } +// In case of fg datamap it should not go to direct fill. +boolean fgDataMapPathPresent = false; +for (TableBlockInfo blockInfo : queryModel.getTableBlockInfos()) { + fgDataMapPathPresent = blockInfo.getDataMapWriterPath() != null; + if (fgDataMapPathPresent) { +break; --- End diff -- Is it possible to set the queryModel.setDirectVectorFill directly? ---
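jackylk's suggestion — set the query model flag directly instead of tracking a local `fgDataMapPathPresent` boolean and breaking out of the loop — can be sketched as below. `BlockInfo` and `Model` are illustrative stand-ins, not the actual `TableBlockInfo`/`QueryModel` classes:

```java
import java.util.Arrays;
import java.util.List;

// Sketch: compute the "any block has an FG datamap writer path" condition
// once and push the result into the model in a single call.
public class DirectFillCheck {

    static class BlockInfo {
        private final String dataMapWriterPath;
        BlockInfo(String p) { this.dataMapWriterPath = p; }
        String getDataMapWriterPath() { return dataMapWriterPath; }
    }

    static class Model {
        private boolean directVectorFill = true;
        void setDirectVectorFill(boolean v) { directVectorFill = v; }
        boolean isDirectVectorFill() { return directVectorFill; }
    }

    // In case of an FG datamap, the query should not use direct vector fill.
    static void configure(Model model, List<BlockInfo> blocks) {
        boolean fgDataMapPresent =
            blocks.stream().anyMatch(b -> b.getDataMapWriterPath() != null);
        model.setDirectVectorFill(!fgDataMapPresent);
    }

    public static void main(String[] args) {
        Model m = new Model();
        configure(m, Arrays.asList(new BlockInfo(null), new BlockInfo("/tmp/fg")));
        System.out.println(m.isDirectVectorFill()); // false
    }
}
```

`anyMatch` short-circuits on the first non-null path, so the early-`break` behavior of the original loop is preserved.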
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2819#discussion_r227619046 --- Diff: core/src/main/java/org/apache/carbondata/core/scan/collector/impl/DictionaryBasedVectorResultCollector.java --- @@ -198,4 +219,48 @@ void fillColumnVectorDetails(CarbonColumnarBatch columnarBatch, int rowCounter, } } + private void collectResultInColumnarBatchDirect(BlockletScannedResult scannedResult, --- End diff -- add comment for this function ---
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2819#discussion_r227618801 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/adaptive/AdaptiveIntegralCodec.java --- @@ -248,6 +269,143 @@ public double decodeDouble(float value) { public double decodeDouble(double value) { throw new RuntimeException("internal error: " + debugInfo()); } + +@Override +public void decodeAndFillVector(ColumnPage columnPage, ColumnVectorInfo vectorInfo) { + CarbonColumnVector vector = vectorInfo.vector; + BitSet nullBits = columnPage.getNullBits(); + DataType vectorDataType = vector.getType(); + DataType pageDataType = columnPage.getDataType(); + int pageSize = columnPage.getPageSize(); + BitSet deletedRows = vectorInfo.deletedRows; + fillVector(columnPage, vector, vectorDataType, pageDataType, pageSize, vectorInfo); + if (deletedRows == null || deletedRows.isEmpty()) { +for (int i = nullBits.nextSetBit(0); i >= 0; i = nullBits.nextSetBit(i + 1)) { + vector.putNull(i); +} + } +} + +private void fillVector(ColumnPage columnPage, CarbonColumnVector vector, +DataType vectorDataType, DataType pageDataType, int pageSize, ColumnVectorInfo vectorInfo) { + if (pageDataType == DataTypes.BOOLEAN || pageDataType == DataTypes.BYTE) { +byte[] byteData = columnPage.getBytePage(); +if (vectorDataType == DataTypes.SHORT) { + for (int i = 0; i < pageSize; i++) { +vector.putShort(i, (short) byteData[i]); + } +} else if (vectorDataType == DataTypes.INT) { + for (int i = 0; i < pageSize; i++) { +vector.putInt(i, (int) byteData[i]); + } +} else if (vectorDataType == DataTypes.LONG) { + for (int i = 0; i < pageSize; i++) { +vector.putLong(i, byteData[i]); + } +} else if (vectorDataType == DataTypes.TIMESTAMP) { + for (int i = 0; i < pageSize; i++) { +vector.putLong(i, byteData[i] * 1000); + } +} else if (vectorDataType == DataTypes.BOOLEAN) { + vector.putBytes(0, pageSize, byteData, 0); + --- End diff -- remove empty line ---
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2819#discussion_r227618507 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/ColumnPageDecoder.java --- @@ -29,6 +31,12 @@ */ ColumnPage decode(byte[] input, int offset, int length) throws MemoryException, IOException; + /** + * Apply decoding algorithm on input byte array and fill the vector here. + */ + void decodeAndFillVector(byte[] input, int offset, int length, ColumnVectorInfo vectorInfo, + BitSet nullBits, boolean isLVEncoded) throws MemoryException, IOException; --- End diff -- I feel it is not good to add `isLVEncoded` just for LVEncoded, can we pass a more generic parameter, since this is a common class for all Decoder ---
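One way to realize the "more generic parameter" idea is to replace the encoding-specific boolean with an enum (or a small options object), so the shared decoder interface carries intent rather than a flag. A hypothetical sketch — `DecoderHint` and `PageDecoder` are illustrative, not CarbonData APIs:

```java
// Sketch: an enum hint instead of the boolean isLVEncoded parameter.
public class GenericDecodeParam {

    enum DecoderHint { NONE, LV_ENCODED }

    // Functional stand-in for the shared decoder interface.
    interface PageDecoder {
        int decode(byte[] input, DecoderHint hint);
    }

    // Toy decoder: an LV-encoded page carries a 2-byte length prefix,
    // so the payload is shorter by that prefix when the hint is set.
    static final PageDecoder DECODER = (input, hint) ->
        hint == DecoderHint.LV_ENCODED ? input.length - 2 : input.length;

    public static void main(String[] args) {
        byte[] page = new byte[10];
        System.out.println(DECODER.decode(page, DecoderHint.NONE));       // 10
        System.out.println(DECODER.decode(page, DecoderHint.LV_ENCODED)); // 8
    }
}
```

An enum also leaves room for future variants (more encodings, more hints) without another boolean overload on every implementation.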
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2819#discussion_r227618222 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/EncodingFactory.java --- @@ -66,6 +66,14 @@ public abstract ColumnPageEncoder createEncoder(TableSpec.ColumnSpec columnSpec, */ public ColumnPageDecoder createDecoder(List encodings, List encoderMetas, String compressor) throws IOException { +return createDecoder(encodings, encoderMetas, compressor, false); + } + + /** + * Return new decoder based on encoder metadata read from file --- End diff -- In the comment, can you describe what is the behavior when `fullVectorFill` is true? ---
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2819#discussion_r227617936 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/ColumnPageEncoderMeta.java --- @@ -49,6 +49,8 @@ // Make it protected for RLEEncoderMeta protected String compressorName; + private transient boolean fillCompleteVector; --- End diff -- add comment for this variable ---
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2819#discussion_r227618017 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/page/VarLengthColumnPageBase.java --- @@ -176,7 +179,7 @@ private static ColumnPage getDecimalColumnPage(TableSpec.ColumnSpec columnSpec, rowOffset.putInt(counter, offset); VarLengthColumnPageBase page; -if (unsafe) { +if (unsafe && !meta.isFillCompleteVector()) { --- End diff -- many place check like this, can we make a function for it and make it more readable by give proper function name? ---
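Extracting the repeated `unsafe && !meta.isFillCompleteVector()` check into a named predicate, as suggested, makes each call site read as intent. A minimal sketch — `Meta` is a stand-in for `ColumnPageEncoderMeta`, and the helper name is hypothetical:

```java
// Sketch: name the repeated condition so call sites document themselves.
public class UnsafeStoreCheck {

    static class Meta {
        private final boolean fillCompleteVector;
        Meta(boolean f) { this.fillCompleteVector = f; }
        boolean isFillCompleteVector() { return fillCompleteVector; }
    }

    // Unsafe (off-heap) page storage only helps when the page is staged;
    // a direct complete-vector fill bypasses the staging copy entirely.
    static boolean useUnsafeStore(boolean unsafe, Meta meta) {
        return unsafe && !meta.isFillCompleteVector();
    }

    public static void main(String[] args) {
        System.out.println(useUnsafeStore(true, new Meta(false))); // true
        System.out.println(useUnsafeStore(true, new Meta(true)));  // false
    }
}
```

With the helper in place, sites like `getDecimalColumnPage` would branch on `useUnsafeStore(unsafe, meta)` instead of repeating the raw boolean expression.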
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2819#discussion_r227617725 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/page/SafeDecimalColumnPage.java --- @@ -193,6 +193,30 @@ public void convertValue(ColumnPageValueConverter codec) { } } + @Override public byte[] getBytePage() { --- End diff -- move Override to previous line ---
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2819#discussion_r227617413 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/chunk/store/impl/safe/AbstractNonDictionaryVectorFiller.java --- @@ -0,0 +1,278 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.carbondata.core.datastore.chunk.store.impl.safe; + +import java.nio.ByteBuffer; + +import org.apache.carbondata.core.constants.CarbonCommonConstants; +import org.apache.carbondata.core.metadata.datatype.DataType; +import org.apache.carbondata.core.metadata.datatype.DataTypes; +import org.apache.carbondata.core.scan.result.vector.CarbonColumnVector; +import org.apache.carbondata.core.util.ByteUtil; +import org.apache.carbondata.core.util.DataTypeUtil; + +public abstract class AbstractNonDictionaryVectorFiller { + + protected int lengthSize; + protected int numberOfRows; + + public AbstractNonDictionaryVectorFiller(int lengthSize, int numberOfRows) { +this.lengthSize = lengthSize; +this.numberOfRows = numberOfRows; + } + + public abstract void fillVector(byte[] data, CarbonColumnVector vector, ByteBuffer buffer); + + public int getLengthFromBuffer(ByteBuffer buffer) { +return buffer.getShort(); + } +} + +class NonDictionaryVectorFillerFactory { + + public static AbstractNonDictionaryVectorFiller getVectorFiller(DataType type, int lengthSize, + int numberOfRows) { +if (type == DataTypes.STRING) { + return new StringVectorFiller(lengthSize, numberOfRows); +} else if (type == DataTypes.VARCHAR) { + return new LongStringVectorFiller(lengthSize, numberOfRows); +} else if (type == DataTypes.TIMESTAMP) { + return new TimeStampVectorFiller(lengthSize, numberOfRows); +} else if (type == DataTypes.BOOLEAN) { + return new BooleanVectorFiller(lengthSize, numberOfRows); +} else if (type == DataTypes.SHORT) { + return new ShortVectorFiller(lengthSize, numberOfRows); +} else if (type == DataTypes.INT) { + return new IntVectorFiller(lengthSize, numberOfRows); +} else if (type == DataTypes.LONG) { + return new LongVectorFiller(lengthSize, numberOfRows); +} else { + throw new UnsupportedOperationException("Not supported datatype : " + type); +} + + } + +} + +class StringVectorFiller extends AbstractNonDictionaryVectorFiller { + + public 
StringVectorFiller(int lengthSize, int numberOfRows) { +super(lengthSize, numberOfRows); + } + + @Override + public void fillVector(byte[] data, CarbonColumnVector vector, ByteBuffer buffer) { +// start position will be used to store the current data position +int startOffset = 0; +// as first position will be start from length of bytes as data is stored first in the memory +// block we need to skip first two bytes this is because first two bytes will be length of the +// data which we have to skip +int currentOffset = lengthSize; +ByteUtil.UnsafeComparer comparator = ByteUtil.UnsafeComparer.INSTANCE; +for (int i = 0; i < numberOfRows - 1; i++) { + buffer.position(startOffset); + startOffset += getLengthFromBuffer(buffer) + lengthSize; + int length = startOffset - (currentOffset); + if (comparator.equals(CarbonCommonConstants.MEMBER_DEFAULT_VAL_ARRAY, 0, + CarbonCommonConstants.MEMBER_DEFAULT_VAL_ARRAY.length, data, currentOffset, length)) { +vector.putNull(i); + } else { +vector.putByteArray(i, currentOffset, length, data); + } + currentOffset = startOffset + lengthSize; +} +// Handle last row +int length = (data.length - currentOffset); +if (comparator.equals(CarbonCommonConstants.MEMBER_DEFAULT_VAL_ARRAY, 0, +
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2819#discussion_r227617094 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/chunk/store/impl/safe/AbstractNonDictionaryVectorFiller.java --- @@ -0,0 +1,278 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.core.datastore.chunk.store.impl.safe; + +import java.nio.ByteBuffer; + +import org.apache.carbondata.core.constants.CarbonCommonConstants; +import org.apache.carbondata.core.metadata.datatype.DataType; +import org.apache.carbondata.core.metadata.datatype.DataTypes; +import org.apache.carbondata.core.scan.result.vector.CarbonColumnVector; +import org.apache.carbondata.core.util.ByteUtil; +import org.apache.carbondata.core.util.DataTypeUtil; + +public abstract class AbstractNonDictionaryVectorFiller { --- End diff -- For public class, please add interface annotation ---
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2819#discussion_r227617004 --- Diff: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java --- @@ -1845,6 +1845,18 @@ public static final int CARBON_MINMAX_ALLOWED_BYTE_COUNT_MIN = 10; public static final int CARBON_MINMAX_ALLOWED_BYTE_COUNT_MAX = 1000; + /** + * When enabled, complete row filtering will be handled by Carbon in the vector flow. + * If it is disabled, only page-level pruning will be done by Carbon and row-level filtering + * will be done by Spark for vector reads. + * There is no change in the flow for non-vector based queries. --- End diff -- Can you also document the cases in which it is suggested to set this to false, since the default is true? ---
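The constant under discussion is a boolean switch that defaults to true. A sketch of that pattern only — the property key below is hypothetical, since the real constant name is not shown in the diff above:

```java
import java.util.Properties;

public class RowFilterFlagSketch {
  // Hypothetical key; the real name is defined in CarbonCommonConstants.
  static final String PUSH_ROW_FILTERS_KEY = "example.carbon.push.rowfilters.to.carbon";

  // Defaults to true: Carbon handles complete row filtering in the vector flow.
  // When set to false, Carbon does only page-level pruning and leaves
  // row-level filtering to Spark.
  static boolean isRowFilterHandledByCarbon(Properties props) {
    return Boolean.parseBoolean(props.getProperty(PUSH_ROW_FILTERS_KEY, "true"));
  }
}
```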
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2819#discussion_r227616799 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/chunk/impl/VariableLengthDimensionColumnPage.java --- @@ -54,10 +75,15 @@ public VariableLengthDimensionColumnPage(byte[] dataChunks, int[] invertedIndex, } dataChunkStore = DimensionChunkStoreFactory.INSTANCE .getDimensionChunkStore(0, isExplicitSorted, numberOfRows, totalSize, dimStoreType, -dictionary); -dataChunkStore.putArray(invertedIndex, invertedIndexReverse, dataChunks); +dictionary, vectorInfo != null); +if (vectorInfo != null) { + dataChunkStore.fillVector(invertedIndex, invertedIndexReverse, dataChunks, vectorInfo); +} else { + dataChunkStore.putArray(invertedIndex, invertedIndexReverse, dataChunks); +} } + --- End diff -- remove this ---
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2819#discussion_r227616722 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/chunk/impl/MeasureRawColumnChunk.java --- @@ -105,6 +106,22 @@ public ColumnPage convertToColumnPageWithOutCache(int index) { } } + /** + * Decode the raw data of the specified page number and fill the given vector + * with the result. + * + * @param pageNumber page number to decode and fill the vector + * @param vectorInfo vector to be filled with the column page + */ + public void convertToColumnPageAndFillVector(int pageNumber, ColumnVectorInfo vectorInfo) { +assert pageNumber < pagesCount; +try { + chunkReader.decodeColumnPageAndFillVector(this, pageNumber, vectorInfo); +} catch (IOException | MemoryException e) { + throw new RuntimeException(e); --- End diff -- Why not throw e directly? ---
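The question above is the classic trade-off between wrapping a checked exception and rethrowing it directly. Both variants, sketched with illustrative names (`decode` stands in for the chunk reader call):

```java
import java.io.IOException;

public class RethrowSketch {

  // Variant 1 (what the diff does): wrap, so callers need no throws clause,
  // at the cost of hiding the checked type behind RuntimeException.
  static int convertWrapping(boolean fail) {
    try {
      return decode(fail);
    } catch (IOException e) {
      throw new RuntimeException(e);
    }
  }

  // Variant 2 (throwing e directly): keeps the original exception type, but
  // the checked exception must then be declared by every caller up the chain.
  static int convertDeclaring(boolean fail) throws IOException {
    return decode(fail);
  }

  private static int decode(boolean fail) throws IOException {
    if (fail) {
      throw new IOException("page decode failed");
    }
    return 42;
  }
}
```

With wrapping, the original cause stays reachable via `getCause()`, which is usually enough for diagnostics in a read path whose interfaces do not declare checked exceptions.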
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2819#discussion_r227322577 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/adaptive/AdaptiveIntegralCodec.java --- @@ -248,6 +266,136 @@ public double decodeDouble(float value) { public double decodeDouble(double value) { throw new RuntimeException("internal error: " + debugInfo()); } + +@Override +public void decodeAndFillVector(ColumnPage columnPage, ColumnVectorInfo vectorInfo) { + CarbonColumnVector vector = vectorInfo.vector; + BitSet nullBits = columnPage.getNullBits(); + DataType dataType = vector.getType(); + DataType type = columnPage.getDataType(); + int pageSize = columnPage.getPageSize(); + BitSet deletedRows = vectorInfo.deletedRows; + fillVector(columnPage, vector, dataType, type, pageSize, vectorInfo); + if (deletedRows == null || deletedRows.isEmpty()) { +for (int i = nullBits.nextSetBit(0); i >= 0; i = nullBits.nextSetBit(i + 1)) { + vector.putNull(i); +} + } +} + +private void fillVector(ColumnPage columnPage, CarbonColumnVector vector, DataType dataType, --- End diff -- ok ---
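The `decodeAndFillVector` body quoted above applies null bits only after the bulk fill, using `BitSet.nextSetBit` so the loop visits just the null positions. The idiom in isolation (a boxed `Long[]` stands in for the column vector; names are illustrative):

```java
import java.util.BitSet;

public class NullBitsSketch {

  // Overwrite null positions in an already-filled vector, mirroring the
  // nextSetBit loop in the codec's decodeAndFillVector.
  static Long[] applyNulls(long[] decoded, BitSet nullBits) {
    Long[] vector = new Long[decoded.length];
    for (int i = 0; i < decoded.length; i++) {
      vector[i] = decoded[i];
    }
    // cost is proportional to the number of nulls, not the page size
    for (int i = nullBits.nextSetBit(0); i >= 0; i = nullBits.nextSetBit(i + 1)) {
      vector[i] = null; // the real code calls vector.putNull(i)
    }
    return vector;
  }
}
```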
Github user kunal642 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2819#discussion_r227027472 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/adaptive/AdaptiveIntegralCodec.java --- @@ -248,6 +266,136 @@ public double decodeDouble(float value) { public double decodeDouble(double value) { throw new RuntimeException("internal error: " + debugInfo()); } + +@Override +public void decodeAndFillVector(ColumnPage columnPage, ColumnVectorInfo vectorInfo) { + CarbonColumnVector vector = vectorInfo.vector; + BitSet nullBits = columnPage.getNullBits(); + DataType dataType = vector.getType(); + DataType type = columnPage.getDataType(); + int pageSize = columnPage.getPageSize(); + BitSet deletedRows = vectorInfo.deletedRows; + fillVector(columnPage, vector, dataType, type, pageSize, vectorInfo); + if (deletedRows == null || deletedRows.isEmpty()) { +for (int i = nullBits.nextSetBit(0); i >= 0; i = nullBits.nextSetBit(i + 1)) { + vector.putNull(i); +} + } +} + +private void fillVector(ColumnPage columnPage, CarbonColumnVector vector, DataType dataType, --- End diff -- For Timestamp type `vector.putLong(i, byteData[i] * 1000);` should be changed to `vector.putLong(i, (long) byteData[i] * 1000L);` otherwise it would cross integer range and give wrong results. Please handle the same for AdaptiveDeltaIntegralCodec ---
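The overflow kunal642 points out is easy to reproduce in isolation: the multiply happens in int arithmetic and wraps before the result is widened to long. A minimal demonstration (class and method names are illustrative):

```java
public class TimestampScaleSketch {

  // Seconds-to-millis done the buggy way: int * int overflows first,
  // then the already-wrapped value is widened to long.
  static long scaleWrong(int seconds) {
    return seconds * 1000;
  }

  // The fix from the review: widen first, then multiply in long arithmetic.
  static long scaleRight(int seconds) {
    return (long) seconds * 1000L;
  }
}
```

For `seconds = 3_000_000` the correct result is 3,000,000,000 ms, which exceeds `Integer.MAX_VALUE`, so the int multiply wraps to a negative value.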
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2819#discussion_r226863664 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/page/ColumnPage.java --- @@ -633,6 +622,56 @@ public boolean getBoolean(int rowId) { */ public abstract double getDouble(int rowId); + + + + + /** + * Get byte value at rowId + */ + public abstract byte[] getByteData(); --- End diff -- removed ---
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2819#discussion_r226863659 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/chunk/reader/dimension/v3/CompressedDimensionChunkFileBasedReaderV3.java --- @@ -221,49 +229,66 @@ protected DimensionRawColumnChunk getDimensionRawColumnChunk(FileReader fileRead int offset = (int) rawColumnPage.getOffSet() + dimensionChunksLength .get(rawColumnPage.getColumnIndex()) + dataChunk3.getPage_offset().get(pageNumber); // first read the data and uncompressed it -return decodeDimension(rawColumnPage, rawData, pageMetadata, offset); +return decodeDimension(rawColumnPage, rawData, pageMetadata, offset, vectorInfo); + } + + @Override + public void decodeColumnPageAndFillVector(DimensionRawColumnChunk dimensionRawColumnChunk, + int pageNumber, ColumnVectorInfo vectorInfo) throws IOException, MemoryException { +DimensionColumnPage columnPage = +decodeColumnPage(dimensionRawColumnChunk, pageNumber, vectorInfo); +columnPage.freeMemory(); } - private ColumnPage decodeDimensionByMeta(DataChunk2 pageMetadata, - ByteBuffer pageData, int offset, boolean isLocalDictEncodedPage) + private ColumnPage decodeDimensionByMeta(DataChunk2 pageMetadata, ByteBuffer pageData, int offset, + boolean isLocalDictEncodedPage, ColumnVectorInfo vectorInfo, BitSet nullBitSet) throws IOException, MemoryException { List encodings = pageMetadata.getEncoders(); List encoderMetas = pageMetadata.getEncoder_meta(); String compressorName = CarbonMetadataUtil.getCompressorNameFromChunkMeta( pageMetadata.getChunk_meta()); ColumnPageDecoder decoder = encodingFactory.createDecoder(encodings, encoderMetas, -compressorName); -return decoder -.decode(pageData.array(), offset, pageMetadata.data_page_length, isLocalDictEncodedPage); +compressorName, vectorInfo != null); +if (vectorInfo != null) { + return decoder + .decodeAndFillVector(pageData.array(), offset, pageMetadata.data_page_length, vectorInfo, + 
nullBitSet, isLocalDictEncodedPage); +} else { + return decoder + .decode(pageData.array(), offset, pageMetadata.data_page_length, isLocalDictEncodedPage); +} } protected DimensionColumnPage decodeDimension(DimensionRawColumnChunk rawColumnPage, - ByteBuffer pageData, DataChunk2 pageMetadata, int offset) + ByteBuffer pageData, DataChunk2 pageMetadata, int offset, ColumnVectorInfo vectorInfo) throws IOException, MemoryException { List encodings = pageMetadata.getEncoders(); if (CarbonUtil.isEncodedWithMeta(encodings)) { - ColumnPage decodedPage = decodeDimensionByMeta(pageMetadata, pageData, offset, - null != rawColumnPage.getLocalDictionary()); - decodedPage.setNullBits(QueryUtil.getNullBitSet(pageMetadata.presence, this.compressor)); int[] invertedIndexes = new int[0]; int[] invertedIndexesReverse = new int[0]; // in case of no dictionary measure data types, if it is included in sort columns // then inverted index to be uncompressed + boolean isExplicitSorted = + CarbonUtil.hasEncoding(pageMetadata.encoders, Encoding.INVERTED_INDEX); + int dataOffset = offset; if (encodings.contains(Encoding.INVERTED_INDEX)) { offset += pageMetadata.data_page_length; -if (CarbonUtil.hasEncoding(pageMetadata.encoders, Encoding.INVERTED_INDEX)) { +if (isExplicitSorted) { --- End diff -- ok ---
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2819#discussion_r226863656 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/chunk/reader/dimension/v3/CompressedDimensionChunkFileBasedReaderV3.java --- @@ -221,49 +229,66 @@ protected DimensionRawColumnChunk getDimensionRawColumnChunk(FileReader fileRead int offset = (int) rawColumnPage.getOffSet() + dimensionChunksLength .get(rawColumnPage.getColumnIndex()) + dataChunk3.getPage_offset().get(pageNumber); // first read the data and uncompressed it -return decodeDimension(rawColumnPage, rawData, pageMetadata, offset); +return decodeDimension(rawColumnPage, rawData, pageMetadata, offset, vectorInfo); + } + + @Override + public void decodeColumnPageAndFillVector(DimensionRawColumnChunk dimensionRawColumnChunk, + int pageNumber, ColumnVectorInfo vectorInfo) throws IOException, MemoryException { +DimensionColumnPage columnPage = +decodeColumnPage(dimensionRawColumnChunk, pageNumber, vectorInfo); +columnPage.freeMemory(); } - private ColumnPage decodeDimensionByMeta(DataChunk2 pageMetadata, - ByteBuffer pageData, int offset, boolean isLocalDictEncodedPage) + private ColumnPage decodeDimensionByMeta(DataChunk2 pageMetadata, ByteBuffer pageData, int offset, + boolean isLocalDictEncodedPage, ColumnVectorInfo vectorInfo, BitSet nullBitSet) throws IOException, MemoryException { List encodings = pageMetadata.getEncoders(); List encoderMetas = pageMetadata.getEncoder_meta(); String compressorName = CarbonMetadataUtil.getCompressorNameFromChunkMeta( pageMetadata.getChunk_meta()); ColumnPageDecoder decoder = encodingFactory.createDecoder(encodings, encoderMetas, -compressorName); -return decoder -.decode(pageData.array(), offset, pageMetadata.data_page_length, isLocalDictEncodedPage); +compressorName, vectorInfo != null); +if (vectorInfo != null) { + return decoder + .decodeAndFillVector(pageData.array(), offset, pageMetadata.data_page_length, vectorInfo, + 
nullBitSet, isLocalDictEncodedPage); +} else { + return decoder + .decode(pageData.array(), offset, pageMetadata.data_page_length, isLocalDictEncodedPage); +} } protected DimensionColumnPage decodeDimension(DimensionRawColumnChunk rawColumnPage, - ByteBuffer pageData, DataChunk2 pageMetadata, int offset) + ByteBuffer pageData, DataChunk2 pageMetadata, int offset, ColumnVectorInfo vectorInfo) throws IOException, MemoryException { List encodings = pageMetadata.getEncoders(); if (CarbonUtil.isEncodedWithMeta(encodings)) { - ColumnPage decodedPage = decodeDimensionByMeta(pageMetadata, pageData, offset, - null != rawColumnPage.getLocalDictionary()); - decodedPage.setNullBits(QueryUtil.getNullBitSet(pageMetadata.presence, this.compressor)); int[] invertedIndexes = new int[0]; int[] invertedIndexesReverse = new int[0]; // in case of no dictionary measure data types, if it is included in sort columns // then inverted index to be uncompressed + boolean isExplicitSorted = + CarbonUtil.hasEncoding(pageMetadata.encoders, Encoding.INVERTED_INDEX); + int dataOffset = offset; if (encodings.contains(Encoding.INVERTED_INDEX)) { --- End diff -- ok ---
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2819#discussion_r226863654 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/chunk/store/impl/safe/AbstractNonDictionaryVectorFiller.java --- @@ -0,0 +1,274 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.carbondata.core.datastore.chunk.store.impl.safe; + +import java.nio.ByteBuffer; + +import org.apache.carbondata.core.constants.CarbonCommonConstants; +import org.apache.carbondata.core.metadata.datatype.DataType; +import org.apache.carbondata.core.metadata.datatype.DataTypes; +import org.apache.carbondata.core.scan.result.vector.CarbonColumnVector; +import org.apache.carbondata.core.util.ByteUtil; +import org.apache.carbondata.core.util.DataTypeUtil; + +public abstract class AbstractNonDictionaryVectorFiller { + + protected int lengthSize; + protected int numberOfRows; + + public AbstractNonDictionaryVectorFiller(int lengthSize, int numberOfRows) { +this.lengthSize = lengthSize; +this.numberOfRows = numberOfRows; + } + + public abstract void fillVector(byte[] data, CarbonColumnVector vector, ByteBuffer buffer); + + public int getLengthFromBuffer(ByteBuffer buffer) { +return buffer.getShort(); + } +} + +class NonDictionaryVectorFillerFactory { + + public static AbstractNonDictionaryVectorFiller getVectorFiller(DataType type, int lengthSize, + int numberOfRows) { +if (type == DataTypes.STRING || type == DataTypes.VARCHAR) { + if (lengthSize == 2) { +return new StringVectorFiller(lengthSize, numberOfRows); + } else { +return new LongStringVectorFiller(lengthSize, numberOfRows); + } +} else if (type == DataTypes.TIMESTAMP) { + return new TimeStampVectorFiller(lengthSize, numberOfRows); +} else if (type == DataTypes.BOOLEAN) { + return new BooleanVectorFiller(lengthSize, numberOfRows); +} else if (type == DataTypes.SHORT) { + return new ShortVectorFiller(lengthSize, numberOfRows); +} else if (type == DataTypes.INT) { + return new IntVectorFiller(lengthSize, numberOfRows); +} else if (type == DataTypes.LONG) { + return new LongStringVectorFiller(lengthSize, numberOfRows); --- End diff -- ok ---
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2819#discussion_r226863651 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/chunk/store/impl/safe/SafeVariableLengthDimensionDataChunkStore.java --- @@ -91,6 +93,25 @@ public void putArray(final int[] invertedIndex, final int[] invertedIndexReverse } } + @Override + public void fillVector(int[] invertedIndex, int[] invertedIndexReverse, byte[] data, + ColumnVectorInfo vectorInfo) { +this.invertedIndexReverse = invertedIndex; + +// data is stored as [length][bytes] per row, so each value begins right after its +// length prefix; the first lengthSize bytes hold the length of the first row and +// must be skipped +int lengthSize = getLengthSize(); +// create a byte buffer wrapping the page data to read each row's length prefix +CarbonColumnVector vector = vectorInfo.vector; +DataType dt = vector.getType(); +ByteBuffer buffer = ByteBuffer.wrap(data); +BitSet deletedRows = vectorInfo.deletedRows; --- End diff -- removed ---
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2819#discussion_r226863640 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/chunk/impl/FixedLengthDimensionColumnPage.java --- @@ -46,10 +46,37 @@ public FixedLengthDimensionColumnPage(byte[] dataChunk, int[] invertedIndex, dataChunk.length; dataChunkStore = DimensionChunkStoreFactory.INSTANCE .getDimensionChunkStore(columnValueSize, isExplicitSorted, numberOfRows, totalSize, -DimensionStoreType.FIXED_LENGTH, null); +DimensionStoreType.FIXED_LENGTH, null, false); dataChunkStore.putArray(invertedIndex, invertedIndexReverse, dataChunk); } + /** + * Constructor + * + * @param dataChunkdata chunk + * @param invertedIndexinverted index + * @param invertedIndexReverse reverse inverted index + * @param numberOfRows number of rows + * @param columnValueSize size of each column value + * @param vectorInfo vector to be filled with decoded column page. + */ + public FixedLengthDimensionColumnPage(byte[] dataChunk, int[] invertedIndex, --- End diff -- ok ---
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2819#discussion_r226863624 --- Diff: integration/spark-datasource/src/main/scala/org/apache/carbondata/spark/vectorreader/VectorizedCarbonRecordReader.java --- @@ -290,12 +296,24 @@ public void initBatch(MemoryMode memMode, StructType partitionColumns, } } CarbonColumnVector[] vectors = new CarbonColumnVector[fields.length]; -boolean[] filteredRows = new boolean[vectorProxy.numRows()]; -for (int i = 0; i < fields.length; i++) { - vectors[i] = new ColumnarVectorWrapper(vectorProxy, filteredRows, i); - if (isNoDictStringField[i]) { -if (vectors[i] instanceof ColumnarVectorWrapper) { - ((ColumnarVectorWrapper) vectors[i]).reserveDictionaryIds(); +boolean[] filteredRows = null; +if (queryModel.isDirectVectorFill()) { + for (int i = 0; i < fields.length; i++) { +vectors[i] = new ColumnarVectorWrapperDirect(vectorProxy, i); +if (isNoDictStringField[i]) { + if (vectors[i] instanceof ColumnarVectorWrapperDirect) { --- End diff -- removed ---
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2819#discussion_r226863633 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/compress/DirectCompressCodec.java --- @@ -178,6 +196,143 @@ public double decodeDouble(float value) { public double decodeDouble(double value) { return value; } + +@Override public void decodeAndFillVector(ColumnPage columnPage, ColumnVectorInfo vectorInfo) { + CarbonColumnVector vector = vectorInfo.vector; + BitSet nullBits = columnPage.getNullBits(); + DataType dataType = vector.getType(); + DataType type = columnPage.getDataType(); + int pageSize = columnPage.getPageSize(); + BitSet deletedRows = vectorInfo.deletedRows; + fillVector(columnPage, vector, dataType, type, pageSize, vectorInfo); + if (deletedRows == null || deletedRows.isEmpty()) { +for (int i = nullBits.nextSetBit(0); i >= 0; i = nullBits.nextSetBit(i + 1)) { + vector.putNull(i); +} + } +} + +private void fillVector(ColumnPage columnPage, CarbonColumnVector vector, DataType dataType, +DataType type, int pageSize, ColumnVectorInfo vectorInfo) { + if (type == DataTypes.BOOLEAN || type == DataTypes.BYTE) { +byte[] byteData = columnPage.getByteData(); +if (dataType == DataTypes.SHORT) { + for (int i = 0; i < pageSize; i++) { +vector.putShort(i, (short) byteData[i]); + } +} else if (dataType == DataTypes.INT) { + for (int i = 0; i < pageSize; i++) { +vector.putInt(i, (int) byteData[i]); + } +} else if (dataType == DataTypes.LONG) { + for (int i = 0; i < pageSize; i++) { +vector.putLong(i, byteData[i]); + } +} else if (dataType == DataTypes.TIMESTAMP) { + for (int i = 0; i < pageSize; i++) { +vector.putLong(i, byteData[i] * 1000); + } +} else if (dataType == DataTypes.BOOLEAN || dataType == DataTypes.BYTE) { + vector.putBytes(0, pageSize, byteData, 0); +} else if (DataTypes.isDecimal(dataType)) { + DecimalConverterFactory.DecimalConverter decimalConverter = vectorInfo.decimalConverter; + 
decimalConverter.fillVector(byteData, pageSize, vectorInfo, columnPage.getNullBits()); +} else { + for (int i = 0; i < pageSize; i++) { +vector.putDouble(i, byteData[i]); + } +} + } else if (type == DataTypes.SHORT) { +short[] shortData = columnPage.getShortData(); +if (dataType == DataTypes.SHORT) { + vector.putShorts(0, pageSize, shortData, 0); +} else if (dataType == DataTypes.INT) { + for (int i = 0; i < pageSize; i++) { +vector.putInt(i, (int) shortData[i]); + } +} else if (dataType == DataTypes.LONG) { + for (int i = 0; i < pageSize; i++) { +vector.putLong(i, shortData[i]); + } +} else if (dataType == DataTypes.TIMESTAMP) { + for (int i = 0; i < pageSize; i++) { +vector.putLong(i, shortData[i] * 1000); + } +} else if (DataTypes.isDecimal(dataType)) { + DecimalConverterFactory.DecimalConverter decimalConverter = vectorInfo.decimalConverter; + decimalConverter.fillVector(shortData, pageSize, vectorInfo, columnPage.getNullBits()); +} else { + for (int i = 0; i < pageSize; i++) { +vector.putDouble(i, shortData[i]); + } +} + + } else if (type == DataTypes.SHORT_INT) { +int[] shortIntData = columnPage.getShortIntData(); +if (dataType == DataTypes.INT) { + vector.putInts(0, pageSize, shortIntData, 0); +} else if (dataType == DataTypes.LONG) { + for (int i = 0; i < pageSize; i++) { +vector.putLong(i, shortIntData[i]); + } +} else if (dataType == DataTypes.TIMESTAMP) { + for (int i = 0; i < pageSize; i++) { +vector.putLong(i, shortIntData[i] * 1000); + } +} else if (DataTypes.isDecimal(dataType)) { + DecimalConverterFactory.DecimalConverter decimalConverter = vectorInfo.decimalConverter; + decimalConverter.fillVector(shortIntData, pageSize, vectorInfo, columnPage.getNullBits()); +} else { + for (int i = 0; i < pageSize; i++) { +vector.putDouble(i, shortIntData[i]); + } +} + } else if (type == DataTypes.INT) {
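The `fillVector` method above is one long type dispatch: each branch widens the stored page type to the requested vector type. Two of those branches in a self-contained form (plain arrays stand in for ColumnPage and CarbonColumnVector; names are illustrative):

```java
public class WideningFillSketch {

  // LONG vector from a SHORT-stored page: implicit, sign-preserving widening.
  static long[] fillLongFromShort(short[] page) {
    long[] vector = new long[page.length];
    for (int i = 0; i < page.length; i++) {
      vector[i] = page[i];
    }
    return vector;
  }

  // TIMESTAMP vector from a SHORT-stored page: scale seconds to millis in long
  // arithmetic. A short times 1000 cannot overflow an int, but the long literal
  // keeps the pattern safe when copied to wider source types.
  static long[] fillTimestampFromShort(short[] page) {
    long[] vector = new long[page.length];
    for (int i = 0; i < page.length; i++) {
      vector[i] = page[i] * 1000L;
    }
    return vector;
  }
}
```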
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2819#discussion_r226863645 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/ColumnPageDecoder.java --- @@ -29,6 +31,12 @@ */ ColumnPage decode(byte[] input, int offset, int length) throws MemoryException, IOException; + /** + * Apply decoding algorithm on input byte array and fill the vector here. + */ + ColumnPage decodeAndFillVector(byte[] input, int offset, int length, ColumnVectorInfo vectorInfo, --- End diff -- ok ---
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2819#discussion_r226863637 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/compress/DirectCompressCodec.java --- @@ -178,6 +196,143 @@ public double decodeDouble(float value) { public double decodeDouble(double value) { return value; } + +@Override public void decodeAndFillVector(ColumnPage columnPage, ColumnVectorInfo vectorInfo) { + CarbonColumnVector vector = vectorInfo.vector; + BitSet nullBits = columnPage.getNullBits(); + DataType dataType = vector.getType(); + DataType type = columnPage.getDataType(); --- End diff -- ok ---
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2819#discussion_r226863628 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/compress/DirectCompressCodec.java --- @@ -95,10 +99,24 @@ public ColumnPage decode(byte[] input, int offset, int length) throws MemoryExce return LazyColumnPage.newPage(decodedPage, converter); } + @Override + public ColumnPage decodeAndFillVector(byte[] input, int offset, int length, --- End diff -- removed ---
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2819#discussion_r226863615 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/chunk/store/impl/LocalDictDimensionDataChunkStore.java --- @@ -49,6 +51,29 @@ public void putArray(int[] invertedIndex, int[] invertedIndexReverse, byte[] dat this.dimensionDataChunkStore.putArray(invertedIndex, invertedIndexReverse, data); } + @Override + public void fillVector(int[] invertedIndex, int[] invertedIndexReverse, byte[] data, + ColumnVectorInfo vectorInfo) { +int columnValueSize = dimensionDataChunkStore.getColumnValueSize(); +int rowsNum = data.length / columnValueSize; +CarbonColumnVector vector = vectorInfo.vector; +if (!dictionary.isDictionaryUsed()) { + vector.setDictionary(dictionary); + dictionary.setDictionaryUsed(); +} +for (int i = 0; i < rowsNum; i++) { + int surrogate = CarbonUtil.getSurrogateInternal(data, i * columnValueSize, columnValueSize); + if (surrogate == CarbonCommonConstants.MEMBER_DEFAULT_VAL_SURROGATE_KEY) { +vector.putNull(i); +vector.getDictionaryVector().putNull(i); + } else { +vector.putNotNull(i); +vector.getDictionaryVector().putInt(i, surrogate); --- End diff -- It is as per the old code; will check the feasibility. ---
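The loop above pulls each surrogate key out of a fixed-width page with `CarbonUtil.getSurrogateInternal`. A simplified stand-in that reads a big-endian integer of `columnValueSize` bytes (illustrative only, not the actual CarbonUtil implementation):

```java
public class SurrogateSketch {

  // Read a big-endian surrogate key of columnValueSize bytes starting at offset.
  static int getSurrogate(byte[] data, int offset, int columnValueSize) {
    int value = 0;
    for (int i = 0; i < columnValueSize; i++) {
      value = (value << 8) | (data[offset + i] & 0xFF);
    }
    return value;
  }
}
```

With `columnValueSize = 3`, row i starts at `i * 3`, matching the `i * columnValueSize` stride in the quoted loop.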
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2819#discussion_r226863618 --- Diff: integration/spark-datasource/src/main/scala/org/apache/carbondata/spark/vectorreader/ColumnarVectorWrapperDirect.java --- @@ -0,0 +1,223 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.spark.vectorreader; + +import java.math.BigDecimal; + +import org.apache.carbondata.core.metadata.datatype.DataType; +import org.apache.carbondata.core.scan.result.vector.CarbonColumnVector; +import org.apache.carbondata.core.scan.result.vector.CarbonDictionary; + +import org.apache.spark.sql.CarbonVectorProxy; +import org.apache.spark.sql.carbondata.execution.datasources.CarbonSparkDataSourceUtil; +import org.apache.spark.sql.types.Decimal; + +/** + * Fills the vector directly with out considering any deleted rows. + */ +class ColumnarVectorWrapperDirect implements CarbonColumnVector { + + protected CarbonVectorProxy.ColumnVectorProxy sparkColumnVectorProxy; + + protected CarbonVectorProxy carbonVectorProxy; --- End diff -- ok ---
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2819#discussion_r226863599 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/chunk/store/impl/safe/AbstractNonDictionaryVectorFiller.java --- @@ -0,0 +1,274 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.core.datastore.chunk.store.impl.safe; + +import java.nio.ByteBuffer; + +import org.apache.carbondata.core.constants.CarbonCommonConstants; +import org.apache.carbondata.core.metadata.datatype.DataType; +import org.apache.carbondata.core.metadata.datatype.DataTypes; +import org.apache.carbondata.core.scan.result.vector.CarbonColumnVector; +import org.apache.carbondata.core.util.ByteUtil; +import org.apache.carbondata.core.util.DataTypeUtil; + +public abstract class AbstractNonDictionaryVectorFiller { + + protected int lengthSize; + protected int numberOfRows; + + public AbstractNonDictionaryVectorFiller(int lengthSize, int numberOfRows) { +this.lengthSize = lengthSize; +this.numberOfRows = numberOfRows; + } + + public abstract void fillVector(byte[] data, CarbonColumnVector vector, ByteBuffer buffer); --- End diff -- will add in another pr ---
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2819#discussion_r226863591 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/chunk/store/impl/safe/AbstractNonDictionaryVectorFiller.java --- @@ -0,0 +1,274 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.carbondata.core.datastore.chunk.store.impl.safe; + +import java.nio.ByteBuffer; + +import org.apache.carbondata.core.constants.CarbonCommonConstants; +import org.apache.carbondata.core.metadata.datatype.DataType; +import org.apache.carbondata.core.metadata.datatype.DataTypes; +import org.apache.carbondata.core.scan.result.vector.CarbonColumnVector; +import org.apache.carbondata.core.util.ByteUtil; +import org.apache.carbondata.core.util.DataTypeUtil; + +public abstract class AbstractNonDictionaryVectorFiller { + + protected int lengthSize; + protected int numberOfRows; + + public AbstractNonDictionaryVectorFiller(int lengthSize, int numberOfRows) { +this.lengthSize = lengthSize; +this.numberOfRows = numberOfRows; + } + + public abstract void fillVector(byte[] data, CarbonColumnVector vector, ByteBuffer buffer); + + public int getLengthFromBuffer(ByteBuffer buffer) { +return buffer.getShort(); + } +} + +class NonDictionaryVectorFillerFactory { + + public static AbstractNonDictionaryVectorFiller getVectorFiller(DataType type, int lengthSize, + int numberOfRows) { +if (type == DataTypes.STRING || type == DataTypes.VARCHAR) { + if (lengthSize == 2) { +return new StringVectorFiller(lengthSize, numberOfRows); + } else { +return new LongStringVectorFiller(lengthSize, numberOfRows); + } +} else if (type == DataTypes.TIMESTAMP) { + return new TimeStampVectorFiller(lengthSize, numberOfRows); +} else if (type == DataTypes.BOOLEAN) { + return new BooleanVectorFiller(lengthSize, numberOfRows); +} else if (type == DataTypes.SHORT) { + return new ShortVectorFiller(lengthSize, numberOfRows); +} else if (type == DataTypes.INT) { + return new IntVectorFiller(lengthSize, numberOfRows); +} else if (type == DataTypes.LONG) { + return new LongStringVectorFiller(lengthSize, numberOfRows); +} +return new StringVectorFiller(lengthSize, numberOfRows); + } + +} + +class StringVectorFiller extends AbstractNonDictionaryVectorFiller { + + public StringVectorFiller(int 
lengthSize, int numberOfRows) { +super(lengthSize, numberOfRows); + } + + @Override + public void fillVector(byte[] data, CarbonColumnVector vector, ByteBuffer buffer) { +// start position will be used to store the current data position +int startOffset = 0; +int currentOffset = lengthSize; +ByteUtil.UnsafeComparer comparer = ByteUtil.UnsafeComparer.INSTANCE; +for (int i = 0; i < numberOfRows - 1; i++) { + buffer.position(startOffset); --- End diff -- will add in another pr ---
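The filler above walks a length-prefixed byte layout: for each row, a 2-byte length followed by that many value bytes. As a minimal, self-contained sketch of that layout outside the CarbonData API (class and method names here are illustrative, as is the UTF-8 assumption):

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class LvDecodeSketch {

  // Encode values as: a short length prefix, then the value bytes.
  static byte[] encode(List<String> values) {
    int size = 0;
    for (String v : values) {
      size += 2 + v.getBytes(StandardCharsets.UTF_8).length;
    }
    ByteBuffer buffer = ByteBuffer.allocate(size);
    for (String v : values) {
      byte[] bytes = v.getBytes(StandardCharsets.UTF_8);
      buffer.putShort((short) bytes.length);
      buffer.put(bytes);
    }
    return buffer.array();
  }

  // Decode numberOfRows values by reading each length prefix and then
  // that many value bytes, mirroring the buffer walk in the filler above.
  static List<String> decode(byte[] data, int numberOfRows) {
    ByteBuffer buffer = ByteBuffer.wrap(data);
    List<String> out = new ArrayList<>(numberOfRows);
    for (int i = 0; i < numberOfRows; i++) {
      int length = buffer.getShort();
      byte[] value = new byte[length];
      buffer.get(value);
      out.add(new String(value, StandardCharsets.UTF_8));
    }
    return out;
  }
}
```

A round trip through `encode` and `decode` recovers the original values, which is the invariant the vector filler relies on when it advances its offsets.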
[GitHub] carbondata pull request #2819: [CARBONDATA-3012] Added support for full scan...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2819#discussion_r226863569 --- Diff: core/src/main/java/org/apache/carbondata/core/scan/collector/impl/DirectPageWiseVectorFillResultCollector.java --- @@ -0,0 +1,181 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.carbondata.core.scan.collector.impl; + +import java.util.ArrayList; +import java.util.Arrays; +import java.util.BitSet; +import java.util.List; + +import org.apache.carbondata.core.keygenerator.directdictionary.DirectDictionaryKeyGeneratorFactory; +import org.apache.carbondata.core.metadata.encoder.Encoding; +import org.apache.carbondata.core.mutate.DeleteDeltaVo; +import org.apache.carbondata.core.scan.executor.infos.BlockExecutionInfo; +import org.apache.carbondata.core.scan.model.ProjectionDimension; +import org.apache.carbondata.core.scan.model.ProjectionMeasure; +import org.apache.carbondata.core.scan.result.BlockletScannedResult; +import org.apache.carbondata.core.scan.result.vector.CarbonColumnarBatch; +import org.apache.carbondata.core.scan.result.vector.ColumnVectorInfo; +import org.apache.carbondata.core.scan.result.vector.MeasureDataVectorProcessor; + +/** + * It delegates the vector to fill the data directly from decoded pages. + */ +public class DirectPageWiseVectorFillResultCollector extends AbstractScannedResultCollector { + + protected ProjectionDimension[] queryDimensions; + + protected ProjectionMeasure[] queryMeasures; + + private ColumnVectorInfo[] dictionaryInfo; + + private ColumnVectorInfo[] noDictionaryInfo; + + private ColumnVectorInfo[] complexInfo; + + private ColumnVectorInfo[] measureColumnInfo; + + ColumnVectorInfo[] allColumnInfo; + + public DirectPageWiseVectorFillResultCollector(BlockExecutionInfo blockExecutionInfos) { +super(blockExecutionInfos); +// initialize only if the current block is not a restructured block else the initialization +// will be taken care by RestructureBasedVectorResultCollector +if (!blockExecutionInfos.isRestructuredBlock()) { --- End diff -- ok ---
[GitHub] carbondata pull request #2819: [CARBONDATA-3012] Added support for full scan...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2819#discussion_r226863577 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/page/LazyColumnPage.java --- @@ -91,6 +108,8 @@ public double getDouble(int rowId) { } } + + --- End diff -- ok ---
[GitHub] carbondata pull request #2819: [CARBONDATA-3012] Added support for full scan...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2819#discussion_r226863584 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/chunk/store/impl/safe/SafeFixedLengthDimensionDataChunkStore.java --- @@ -30,9 +35,52 @@ */ private int columnValueSize; - public SafeFixedLengthDimensionDataChunkStore(boolean isInvertedIndex, int columnValueSize) { + private int numOfRows; + + public SafeFixedLengthDimensionDataChunkStore(boolean isInvertedIndex, int columnValueSize, + int numOfRows) { super(isInvertedIndex); this.columnValueSize = columnValueSize; +this.numOfRows = numOfRows; + } + + @Override + public void fillVector(int[] invertedIndex, int[] invertedIndexReverse, byte[] data, + ColumnVectorInfo vectorInfo) { +CarbonColumnVector vector = vectorInfo.vector; +fillVector(data, vectorInfo, vector); + } + + private void fillVector(byte[] data, ColumnVectorInfo vectorInfo, CarbonColumnVector vector) { +DataType dataType = vectorInfo.vector.getBlockDataType(); +if (dataType == DataTypes.DATE) { + for (int i = 0; i < numOfRows; i++) { +int surrogateInternal = +CarbonUtil.getSurrogateInternal(data, i * columnValueSize, columnValueSize); +if (surrogateInternal == 1) { + vector.putNull(i); +} else { + vector.putInt(i, surrogateInternal - DateDirectDictionaryGenerator.cutOffDate); +} + } +} else if (dataType == DataTypes.TIMESTAMP) { + for (int i = 0; i < numOfRows; i++) { +int surrogateInternal = +CarbonUtil.getSurrogateInternal(data, i * columnValueSize, columnValueSize); +if (surrogateInternal == 1) { --- End diff -- ok ---
[GitHub] carbondata pull request #2819: [CARBONDATA-3012] Added support for full scan...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2819#discussion_r226863559 --- Diff: core/src/main/java/org/apache/carbondata/core/scan/collector/impl/DirectPageWiseVectorFillResultCollector.java --- @@ -0,0 +1,181 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.carbondata.core.scan.collector.impl; + +import java.util.ArrayList; +import java.util.Arrays; +import java.util.BitSet; +import java.util.List; + +import org.apache.carbondata.core.keygenerator.directdictionary.DirectDictionaryKeyGeneratorFactory; +import org.apache.carbondata.core.metadata.encoder.Encoding; +import org.apache.carbondata.core.mutate.DeleteDeltaVo; +import org.apache.carbondata.core.scan.executor.infos.BlockExecutionInfo; +import org.apache.carbondata.core.scan.model.ProjectionDimension; +import org.apache.carbondata.core.scan.model.ProjectionMeasure; +import org.apache.carbondata.core.scan.result.BlockletScannedResult; +import org.apache.carbondata.core.scan.result.vector.CarbonColumnarBatch; +import org.apache.carbondata.core.scan.result.vector.ColumnVectorInfo; +import org.apache.carbondata.core.scan.result.vector.MeasureDataVectorProcessor; + +/** + * It delegates the vector to fill the data directly from decoded pages. + */ +public class DirectPageWiseVectorFillResultCollector extends AbstractScannedResultCollector { --- End diff -- ok ---
[GitHub] carbondata pull request #2819: [CARBONDATA-3012] Added support for full scan...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2819#discussion_r226863575 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/adaptive/AdaptiveDeltaIntegralCodec.java --- @@ -272,5 +293,164 @@ public double decodeDouble(double value) { // this codec is for integer type only throw new RuntimeException("internal error"); } + +@Override +public void decodeAndFillVector(ColumnPage columnPage, ColumnVectorInfo vectorInfo) { + CarbonColumnVector vector = vectorInfo.vector; + BitSet nullBits = columnPage.getNullBits(); + DataType dataType = vector.getType(); + DataType type = columnPage.getDataType(); + int pageSize = columnPage.getPageSize(); + BitSet deletedRows = vectorInfo.deletedRows; + fillVector(columnPage, vector, dataType, type, pageSize, vectorInfo); + if (deletedRows == null || deletedRows.isEmpty()) { +for (int i = nullBits.nextSetBit(0); i >= 0; i = nullBits.nextSetBit(i + 1)) { + vector.putNull(i); +} + } +} + +private void fillVector(ColumnPage columnPage, CarbonColumnVector vector, DataType dataType, +DataType type, int pageSize, ColumnVectorInfo vectorInfo) { + if (type == DataTypes.BOOLEAN || type == DataTypes.BYTE) { +byte[] byteData = columnPage.getByteData(); +if (dataType == DataTypes.SHORT) { + for (int i = 0; i < pageSize; i++) { +vector.putShort(i, (short) (max - byteData[i])); + } +} else if (dataType == DataTypes.INT) { + for (int i = 0; i < pageSize; i++) { +vector.putInt(i, (int) (max - byteData[i])); + } +} else if (dataType == DataTypes.LONG) { + for (int i = 0; i < pageSize; i++) { +vector.putLong(i, (max - byteData[i])); + } +} else if (dataType == DataTypes.TIMESTAMP) { + for (int i = 0; i < pageSize; i++) { +vector.putLong(i, (max - byteData[i]) * 1000); + } +} else if (dataType == DataTypes.BOOLEAN) { + for (int i = 0; i < pageSize; i++) { +vector.putByte(i, (byte) (max - byteData[i])); + } +} else if (DataTypes.isDecimal(dataType)) { + 
DecimalConverterFactory.DecimalConverter decimalConverter = vectorInfo.decimalConverter; + int precision = vectorInfo.measure.getMeasure().getPrecision(); + for (int i = 0; i < pageSize; i++) { +BigDecimal decimal = decimalConverter.getDecimal(max - byteData[i]); +vector.putDecimal(i, decimal, precision); + } +} else { + for (int i = 0; i < pageSize; i++) { +vector.putDouble(i, (max - byteData[i])); + } +} + } else if (type == DataTypes.SHORT) { +short[] shortData = columnPage.getShortData(); +if (dataType == DataTypes.SHORT) { + for (int i = 0; i < pageSize; i++) { +vector.putShort(i, (short) (max - shortData[i])); + } +} else if (dataType == DataTypes.INT) { + for (int i = 0; i < pageSize; i++) { +vector.putInt(i, (int) (max - shortData[i])); + } +} else if (dataType == DataTypes.LONG) { + for (int i = 0; i < pageSize; i++) { +vector.putLong(i, (max - shortData[i])); + } +} else if (dataType == DataTypes.TIMESTAMP) { + for (int i = 0; i < pageSize; i++) { +vector.putLong(i, (max - shortData[i]) * 1000); + } +} else if (DataTypes.isDecimal(dataType)) { + DecimalConverterFactory.DecimalConverter decimalConverter = vectorInfo.decimalConverter; + int precision = vectorInfo.measure.getMeasure().getPrecision(); + for (int i = 0; i < pageSize; i++) { +BigDecimal decimal = decimalConverter.getDecimal(max - shortData[i]); +vector.putDecimal(i, decimal, precision); + } +} else { + for (int i = 0; i < pageSize; i++) { +vector.putDouble(i, (max - shortData[i])); + } +} + + } else if (type == DataTypes.SHORT_INT) { +int[] shortIntData = columnPage.getShortIntData(); +if (dataType == DataTypes.INT) { + for (int i = 0; i < pageSize; i++) { +vector.putInt(i, (int) (max - shortIntData[i])); + } +} else if (dataType == DataTypes.LONG) { + for (int i = 0; i < pageSize; i++) { +vector.putLong(i, (max
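Every loop in the codec above follows one pattern: adaptive delta encoding stored each value as `max - value` in a narrower type, so decoding restores `value = max - stored`. A minimal sketch of that idea, covering only the byte-to-long case (names are illustrative, not the CarbonData API):

```java
public class DeltaDecodeSketch {

  // Values were stored as (max - value) in a narrow byte; decoding
  // restores the original by subtracting the stored delta from max.
  static long[] decode(byte[] stored, long max) {
    long[] out = new long[stored.length];
    for (int i = 0; i < stored.length; i++) {
      out[i] = max - stored[i];
    }
    return out;
  }
}
```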
[GitHub] carbondata pull request #2819: [CARBONDATA-3012] Added support for full scan...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2819#discussion_r226863585 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/chunk/store/impl/safe/SafeFixedLengthDimensionDataChunkStore.java --- @@ -30,9 +35,52 @@ */ private int columnValueSize; - public SafeFixedLengthDimensionDataChunkStore(boolean isInvertedIndex, int columnValueSize) { + private int numOfRows; + + public SafeFixedLengthDimensionDataChunkStore(boolean isInvertedIndex, int columnValueSize, + int numOfRows) { super(isInvertedIndex); this.columnValueSize = columnValueSize; +this.numOfRows = numOfRows; + } + + @Override + public void fillVector(int[] invertedIndex, int[] invertedIndexReverse, byte[] data, + ColumnVectorInfo vectorInfo) { +CarbonColumnVector vector = vectorInfo.vector; +fillVector(data, vectorInfo, vector); + } + + private void fillVector(byte[] data, ColumnVectorInfo vectorInfo, CarbonColumnVector vector) { +DataType dataType = vectorInfo.vector.getBlockDataType(); +if (dataType == DataTypes.DATE) { + for (int i = 0; i < numOfRows; i++) { +int surrogateInternal = +CarbonUtil.getSurrogateInternal(data, i * columnValueSize, columnValueSize); +if (surrogateInternal == 1) { --- End diff -- ok ---
[GitHub] carbondata pull request #2819: [CARBONDATA-3012] Added support for full scan...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2819#discussion_r226863580 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/page/LazyColumnPage.java --- @@ -42,10 +43,26 @@ private LazyColumnPage(ColumnPage columnPage, ColumnPageValueConverter converter this.converter = converter; } + private LazyColumnPage(ColumnPage columnPage, ColumnPageValueConverter converter, + ColumnVectorInfo vectorInfo) { +super(columnPage.getColumnPageEncoderMeta(), columnPage.getPageSize()); +this.columnPage = columnPage; +this.converter = converter; +if (columnPage instanceof DecimalColumnPage) { + vectorInfo.decimalConverter = ((DecimalColumnPage) columnPage).getDecimalConverter(); +} +converter.decodeAndFillVector(columnPage, vectorInfo); + } + public static ColumnPage newPage(ColumnPage columnPage, ColumnPageValueConverter codec) { return new LazyColumnPage(columnPage, codec); } + public static ColumnPage newPage(ColumnPage columnPage, ColumnPageValueConverter codec, + ColumnVectorInfo vectorInfo) { +return new LazyColumnPage(columnPage, codec, vectorInfo); + } --- End diff -- removed ---
[GitHub] carbondata pull request #2819: [CARBONDATA-3012] Added support for full scan...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2819#discussion_r226832821 --- Diff: core/src/main/java/org/apache/carbondata/core/metadata/datatype/DecimalConverterFactory.java --- @@ -173,6 +221,23 @@ public int getSizeInBytes() { return new BigDecimal(bigInteger, scale); } +@Override public void fillVector(Object valuesToBeConverted, int size, ColumnVectorInfo info, +BitSet nullBitset) { + CarbonColumnVector vector = info.vector; + int precision = info.measure.getMeasure().getPrecision(); + if (valuesToBeConverted instanceof byte[][]) { +byte[][] data = (byte[][]) valuesToBeConverted; +for (int i = 0; i < size; i++) { + if (nullBitset.get(i)) { +vector.putNull(i); + } else { +BigInteger bigInteger = new BigInteger(data[i]); +vector.putDecimal(i, new BigDecimal(bigInteger, scale), precision); --- End diff -- The method `DataTypeUtil.byteToBigDecimal(data[i])` is different, as it also calculates the scale from the binary representation; that is not needed here because the scale is already known. ---
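The distinction drawn in the reply above can be sketched as follows: when the scale is already known from column metadata, the unscaled bytes can be turned into a `BigDecimal` directly, while a scale-carrying encoding must first read the scale back out of the binary form. The one-byte scale prefix below is purely hypothetical, chosen only to illustrate the extra work a `byteToBigDecimal`-style helper does:

```java
import java.math.BigDecimal;
import java.math.BigInteger;

public class DecimalDecodeSketch {

  // Known-scale path, as in the fillVector code above: the bytes are just
  // the unscaled value, and the scale comes from column metadata.
  static BigDecimal decodeWithKnownScale(byte[] unscaled, int scale) {
    return new BigDecimal(new BigInteger(unscaled), scale);
  }

  // Scale-embedded path (hypothetical layout: first byte holds the scale),
  // which must recover the scale from the binary form before decoding.
  static BigDecimal decodeWithEmbeddedScale(byte[] data) {
    int scale = data[0];
    byte[] unscaled = new byte[data.length - 1];
    System.arraycopy(data, 1, unscaled, 0, unscaled.length);
    return new BigDecimal(new BigInteger(unscaled), scale);
  }
}
```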
[GitHub] carbondata pull request #2819: [CARBONDATA-3012] Added support for full scan...
Github user manishgupta88 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2819#discussion_r226223364 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/page/LazyColumnPage.java --- @@ -91,6 +108,8 @@ public double getDouble(int rowId) { } } + + --- End diff -- Remove the extra lines ---
[GitHub] carbondata pull request #2819: [CARBONDATA-3012] Added support for full scan...
Github user manishgupta88 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2819#discussion_r226188701 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/chunk/store/impl/safe/SafeFixedLengthDimensionDataChunkStore.java --- @@ -30,9 +35,52 @@ */ private int columnValueSize; - public SafeFixedLengthDimensionDataChunkStore(boolean isInvertedIndex, int columnValueSize) { + private int numOfRows; + + public SafeFixedLengthDimensionDataChunkStore(boolean isInvertedIndex, int columnValueSize, + int numOfRows) { super(isInvertedIndex); this.columnValueSize = columnValueSize; +this.numOfRows = numOfRows; + } + + @Override + public void fillVector(int[] invertedIndex, int[] invertedIndexReverse, byte[] data, + ColumnVectorInfo vectorInfo) { +CarbonColumnVector vector = vectorInfo.vector; +fillVector(data, vectorInfo, vector); + } + + private void fillVector(byte[] data, ColumnVectorInfo vectorInfo, CarbonColumnVector vector) { +DataType dataType = vectorInfo.vector.getBlockDataType(); +if (dataType == DataTypes.DATE) { + for (int i = 0; i < numOfRows; i++) { +int surrogateInternal = +CarbonUtil.getSurrogateInternal(data, i * columnValueSize, columnValueSize); +if (surrogateInternal == 1) { + vector.putNull(i); +} else { + vector.putInt(i, surrogateInternal - DateDirectDictionaryGenerator.cutOffDate); +} + } +} else if (dataType == DataTypes.TIMESTAMP) { + for (int i = 0; i < numOfRows; i++) { +int surrogateInternal = +CarbonUtil.getSurrogateInternal(data, i * columnValueSize, columnValueSize); +if (surrogateInternal == 1) { --- End diff -- Replace 1 with `CarbonCommonConstants.MEMBER_DEFAULT_VAL_SURROGATE_KEY` ---
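The replacement the review asks for can be shown in isolation. The literal 1 below stands in for the reserved default-member surrogate that `CarbonCommonConstants.MEMBER_DEFAULT_VAL_SURROGATE_KEY` names in CarbonData; all other names in this sketch are illustrative:

```java
public class SurrogateDecodeSketch {

  // Named stand-in for the reserved null/default-member surrogate,
  // replacing the magic literal 1 used in the diff above.
  static final int MEMBER_DEFAULT_VAL_SURROGATE_KEY = 1;

  // Returns null for the reserved surrogate, otherwise the decoded date
  // offset (surrogate minus the cut-off), as in the fillVector loop above.
  static Integer decodeDateSurrogate(int surrogate, int cutOffDate) {
    if (surrogate == MEMBER_DEFAULT_VAL_SURROGATE_KEY) {
      return null;
    }
    return surrogate - cutOffDate;
  }
}
```

With the constant named, the intent of the null check is visible at the call site instead of being hidden behind a bare literal.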
[GitHub] carbondata pull request #2819: [CARBONDATA-3012] Added support for full scan...
Github user manishgupta88 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2819#discussion_r226223232 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/page/LazyColumnPage.java --- @@ -42,10 +43,26 @@ private LazyColumnPage(ColumnPage columnPage, ColumnPageValueConverter converter this.converter = converter; } + private LazyColumnPage(ColumnPage columnPage, ColumnPageValueConverter converter, + ColumnVectorInfo vectorInfo) { +super(columnPage.getColumnPageEncoderMeta(), columnPage.getPageSize()); +this.columnPage = columnPage; +this.converter = converter; +if (columnPage instanceof DecimalColumnPage) { + vectorInfo.decimalConverter = ((DecimalColumnPage) columnPage).getDecimalConverter(); +} +converter.decodeAndFillVector(columnPage, vectorInfo); + } + public static ColumnPage newPage(ColumnPage columnPage, ColumnPageValueConverter codec) { return new LazyColumnPage(columnPage, codec); } + public static ColumnPage newPage(ColumnPage columnPage, ColumnPageValueConverter codec, + ColumnVectorInfo vectorInfo) { +return new LazyColumnPage(columnPage, codec, vectorInfo); + } --- End diff -- I am not sure what is the significance of making a static method and creating an object from it. As we are not doing anything extra in this method, we can make the constructor itself public and remove this static method ---
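The simplification suggested above is the general rule that a static factory which only forwards its arguments to a constructor adds nothing. A stripped-down, hypothetical sketch of the "after" shape:

```java
public class LazyPageSketch {

  private final int pageSize;

  // Before: a private constructor plus a static newPage(...) that only
  // forwarded its arguments. After: the constructor is public and the
  // forwarding factory is removed, as the review suggests.
  public LazyPageSketch(int pageSize) {
    this.pageSize = pageSize;
  }

  public int getPageSize() {
    return pageSize;
  }
}
```

A static factory earns its keep only when it caches, validates, or chooses among implementations; a pure pass-through is just indirection.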
[GitHub] carbondata pull request #2819: [CARBONDATA-3012] Added support for full scan...
Github user manishgupta88 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2819#discussion_r226188657 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/chunk/store/impl/safe/SafeFixedLengthDimensionDataChunkStore.java --- @@ -30,9 +35,52 @@ */ private int columnValueSize; - public SafeFixedLengthDimensionDataChunkStore(boolean isInvertedIndex, int columnValueSize) { + private int numOfRows; + + public SafeFixedLengthDimensionDataChunkStore(boolean isInvertedIndex, int columnValueSize, + int numOfRows) { super(isInvertedIndex); this.columnValueSize = columnValueSize; +this.numOfRows = numOfRows; + } + + @Override + public void fillVector(int[] invertedIndex, int[] invertedIndexReverse, byte[] data, + ColumnVectorInfo vectorInfo) { +CarbonColumnVector vector = vectorInfo.vector; +fillVector(data, vectorInfo, vector); + } + + private void fillVector(byte[] data, ColumnVectorInfo vectorInfo, CarbonColumnVector vector) { +DataType dataType = vectorInfo.vector.getBlockDataType(); +if (dataType == DataTypes.DATE) { + for (int i = 0; i < numOfRows; i++) { +int surrogateInternal = +CarbonUtil.getSurrogateInternal(data, i * columnValueSize, columnValueSize); +if (surrogateInternal == 1) { --- End diff -- Replace 1 with `CarbonCommonConstants.MEMBER_DEFAULT_VAL_SURROGATE_KEY` ---
[GitHub] carbondata pull request #2819: [CARBONDATA-3012] Added support for full scan...
Github user manishgupta88 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2819#discussion_r226299668 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/adaptive/AdaptiveDeltaIntegralCodec.java --- @@ -272,5 +293,164 @@ public double decodeDouble(double value) { // this codec is for integer type only throw new RuntimeException("internal error"); } + +@Override +public void decodeAndFillVector(ColumnPage columnPage, ColumnVectorInfo vectorInfo) { + CarbonColumnVector vector = vectorInfo.vector; + BitSet nullBits = columnPage.getNullBits(); + DataType dataType = vector.getType(); + DataType type = columnPage.getDataType(); + int pageSize = columnPage.getPageSize(); + BitSet deletedRows = vectorInfo.deletedRows; + fillVector(columnPage, vector, dataType, type, pageSize, vectorInfo); + if (deletedRows == null || deletedRows.isEmpty()) { +for (int i = nullBits.nextSetBit(0); i >= 0; i = nullBits.nextSetBit(i + 1)) { + vector.putNull(i); +} + } +} + +private void fillVector(ColumnPage columnPage, CarbonColumnVector vector, DataType dataType, +DataType type, int pageSize, ColumnVectorInfo vectorInfo) { + if (type == DataTypes.BOOLEAN || type == DataTypes.BYTE) { +byte[] byteData = columnPage.getByteData(); +if (dataType == DataTypes.SHORT) { + for (int i = 0; i < pageSize; i++) { +vector.putShort(i, (short) (max - byteData[i])); + } +} else if (dataType == DataTypes.INT) { + for (int i = 0; i < pageSize; i++) { +vector.putInt(i, (int) (max - byteData[i])); + } +} else if (dataType == DataTypes.LONG) { + for (int i = 0; i < pageSize; i++) { +vector.putLong(i, (max - byteData[i])); + } +} else if (dataType == DataTypes.TIMESTAMP) { + for (int i = 0; i < pageSize; i++) { +vector.putLong(i, (max - byteData[i]) * 1000); + } +} else if (dataType == DataTypes.BOOLEAN) { + for (int i = 0; i < pageSize; i++) { +vector.putByte(i, (byte) (max - byteData[i])); + } +} else if (DataTypes.isDecimal(dataType)) { + 
DecimalConverterFactory.DecimalConverter decimalConverter = vectorInfo.decimalConverter; + int precision = vectorInfo.measure.getMeasure().getPrecision(); + for (int i = 0; i < pageSize; i++) { +BigDecimal decimal = decimalConverter.getDecimal(max - byteData[i]); +vector.putDecimal(i, decimal, precision); + } +} else { + for (int i = 0; i < pageSize; i++) { +vector.putDouble(i, (max - byteData[i])); + } +} + } else if (type == DataTypes.SHORT) { +short[] shortData = columnPage.getShortData(); +if (dataType == DataTypes.SHORT) { + for (int i = 0; i < pageSize; i++) { +vector.putShort(i, (short) (max - shortData[i])); + } +} else if (dataType == DataTypes.INT) { + for (int i = 0; i < pageSize; i++) { +vector.putInt(i, (int) (max - shortData[i])); + } +} else if (dataType == DataTypes.LONG) { + for (int i = 0; i < pageSize; i++) { +vector.putLong(i, (max - shortData[i])); + } +} else if (dataType == DataTypes.TIMESTAMP) { + for (int i = 0; i < pageSize; i++) { +vector.putLong(i, (max - shortData[i]) * 1000); + } +} else if (DataTypes.isDecimal(dataType)) { + DecimalConverterFactory.DecimalConverter decimalConverter = vectorInfo.decimalConverter; + int precision = vectorInfo.measure.getMeasure().getPrecision(); + for (int i = 0; i < pageSize; i++) { +BigDecimal decimal = decimalConverter.getDecimal(max - shortData[i]); +vector.putDecimal(i, decimal, precision); + } +} else { + for (int i = 0; i < pageSize; i++) { +vector.putDouble(i, (max - shortData[i])); + } +} + + } else if (type == DataTypes.SHORT_INT) { +int[] shortIntData = columnPage.getShortIntData(); +if (dataType == DataTypes.INT) { + for (int i = 0; i < pageSize; i++) { +vector.putInt(i, (int) (max - shortIntData[i])); + } +} else if (dataType == DataTypes.LONG) { + for (int i = 0; i < pageSize; i++) { +vector.putLong(i,
[GitHub] carbondata pull request #2819: [CARBONDATA-3012] Added support for full scan...
Github user manishgupta88 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2819#discussion_r226317816 --- Diff: core/src/main/java/org/apache/carbondata/core/metadata/datatype/DecimalConverterFactory.java --- @@ -173,6 +221,23 @@ public int getSizeInBytes() { return new BigDecimal(bigInteger, scale); } +@Override public void fillVector(Object valuesToBeConverted, int size, ColumnVectorInfo info, +BitSet nullBitset) { + CarbonColumnVector vector = info.vector; + int precision = info.measure.getMeasure().getPrecision(); + if (valuesToBeConverted instanceof byte[][]) { +byte[][] data = (byte[][]) valuesToBeConverted; +for (int i = 0; i < size; i++) { + if (nullBitset.get(i)) { +vector.putNull(i); + } else { +BigInteger bigInteger = new BigInteger(data[i]); +vector.putDecimal(i, new BigDecimal(bigInteger, scale), precision); --- End diff -- here can we use the code `vector.putDecimal(i, DataTypeUtil.byteToBigDecimal(data[i]), precision)`? This is the same as used in the fillVector method below. If it can be used, then we can refactor the code and create only one method ---
[GitHub] carbondata pull request #2819: [CARBONDATA-3012] Added support for full scan...
Github user manishgupta88 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2819#discussion_r226321251 --- Diff: core/src/main/java/org/apache/carbondata/core/scan/collector/impl/DirectPageWiseVectorFillResultCollector.java --- @@ -0,0 +1,181 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.carbondata.core.scan.collector.impl; + +import java.util.ArrayList; +import java.util.Arrays; +import java.util.BitSet; +import java.util.List; + +import org.apache.carbondata.core.keygenerator.directdictionary.DirectDictionaryKeyGeneratorFactory; +import org.apache.carbondata.core.metadata.encoder.Encoding; +import org.apache.carbondata.core.mutate.DeleteDeltaVo; +import org.apache.carbondata.core.scan.executor.infos.BlockExecutionInfo; +import org.apache.carbondata.core.scan.model.ProjectionDimension; +import org.apache.carbondata.core.scan.model.ProjectionMeasure; +import org.apache.carbondata.core.scan.result.BlockletScannedResult; +import org.apache.carbondata.core.scan.result.vector.CarbonColumnarBatch; +import org.apache.carbondata.core.scan.result.vector.ColumnVectorInfo; +import org.apache.carbondata.core.scan.result.vector.MeasureDataVectorProcessor; + +/** + * It delegates the vector to fill the data directly from decoded pages. + */ +public class DirectPageWiseVectorFillResultCollector extends AbstractScannedResultCollector { + + protected ProjectionDimension[] queryDimensions; + + protected ProjectionMeasure[] queryMeasures; + + private ColumnVectorInfo[] dictionaryInfo; + + private ColumnVectorInfo[] noDictionaryInfo; + + private ColumnVectorInfo[] complexInfo; + + private ColumnVectorInfo[] measureColumnInfo; + + ColumnVectorInfo[] allColumnInfo; + + public DirectPageWiseVectorFillResultCollector(BlockExecutionInfo blockExecutionInfos) { +super(blockExecutionInfos); +// initialize only if the current block is not a restructured block else the initialization +// will be taken care by RestructureBasedVectorResultCollector +if (!blockExecutionInfos.isRestructuredBlock()) { --- End diff -- `RestructureBasedVectorResultCollector` is extending from `DictionaryBasedVectorResultCollector`. So this check will not work here. Please check the scenario of restructured block in case of `DirectPageWiseVectorFillResultCollector`. 
For the restructure case I think the flow will not come here, but you can decide whether to use the directVectorFill flow for the restructure case as well ---
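The hierarchy concern raised above reduces to a small point: a guard that defers initialization to a restructure-aware subclass only works when that subclass actually extends the class carrying the guard. A simplified sketch with hypothetical names (the real classes are far richer):

```java
class DictionaryCollectorSketch {
  boolean vectorsInitialized;

  DictionaryCollectorSketch(boolean restructuredBlock) {
    // Normal blocks initialize here; restructured blocks rely on the
    // restructure-aware subclass to do it instead.
    if (!restructuredBlock) {
      vectorsInitialized = true;
    }
  }
}

class RestructureCollectorSketch extends DictionaryCollectorSketch {
  RestructureCollectorSketch() {
    super(true);
    vectorsInitialized = true; // restructure-aware initialization
  }
}

class DirectFillCollectorSketch { // not in the Dictionary hierarchy
  boolean vectorsInitialized;

  DirectFillCollectorSketch(boolean restructuredBlock) {
    // Copying the same guard here leaves restructured blocks with no
    // subclass that would ever complete the deferred initialization.
    if (!restructuredBlock) {
      vectorsInitialized = true;
    }
  }
}
```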
[GitHub] carbondata pull request #2819: [CARBONDATA-3012] Added support for full scan...
Github user manishgupta88 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2819#discussion_r226329882 --- Diff: core/src/main/java/org/apache/carbondata/core/scan/collector/impl/DirectPageWiseVectorFillResultCollector.java --- @@ -0,0 +1,181 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.carbondata.core.scan.collector.impl; + +import java.util.ArrayList; +import java.util.Arrays; +import java.util.BitSet; +import java.util.List; + +import org.apache.carbondata.core.keygenerator.directdictionary.DirectDictionaryKeyGeneratorFactory; +import org.apache.carbondata.core.metadata.encoder.Encoding; +import org.apache.carbondata.core.mutate.DeleteDeltaVo; +import org.apache.carbondata.core.scan.executor.infos.BlockExecutionInfo; +import org.apache.carbondata.core.scan.model.ProjectionDimension; +import org.apache.carbondata.core.scan.model.ProjectionMeasure; +import org.apache.carbondata.core.scan.result.BlockletScannedResult; +import org.apache.carbondata.core.scan.result.vector.CarbonColumnarBatch; +import org.apache.carbondata.core.scan.result.vector.ColumnVectorInfo; +import org.apache.carbondata.core.scan.result.vector.MeasureDataVectorProcessor; + +/** + * It delegates the vector to fill the data directly from decoded pages. + */ +public class DirectPageWiseVectorFillResultCollector extends AbstractScannedResultCollector { --- End diff -- If feasible, can this new class be avoided and one more method introduced in `DictionaryBasedVectorResultCollector`, and from there call for direct converting and filling of the vector? This will also help in handling other cases like restructure scenarios, which otherwise cannot be achieved directly using the new class. ---
[GitHub] carbondata pull request #2819: [CARBONDATA-3012] Added support for full scan...
Github user kumarvishal09 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2819#discussion_r226334271 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/chunk/store/impl/safe/AbstractNonDictionaryVectorFiller.java --- @@ -0,0 +1,274 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.carbondata.core.datastore.chunk.store.impl.safe; + +import java.nio.ByteBuffer; + +import org.apache.carbondata.core.constants.CarbonCommonConstants; +import org.apache.carbondata.core.metadata.datatype.DataType; +import org.apache.carbondata.core.metadata.datatype.DataTypes; +import org.apache.carbondata.core.scan.result.vector.CarbonColumnVector; +import org.apache.carbondata.core.util.ByteUtil; +import org.apache.carbondata.core.util.DataTypeUtil; + +public abstract class AbstractNonDictionaryVectorFiller { + + protected int lengthSize; + protected int numberOfRows; + + public AbstractNonDictionaryVectorFiller(int lengthSize, int numberOfRows) { +this.lengthSize = lengthSize; +this.numberOfRows = numberOfRows; + } + + public abstract void fillVector(byte[] data, CarbonColumnVector vector, ByteBuffer buffer); + + public int getLengthFromBuffer(ByteBuffer buffer) { +return buffer.getShort(); + } +} + +class NonDictionaryVectorFillerFactory { + + public static AbstractNonDictionaryVectorFiller getVectorFiller(DataType type, int lengthSize, + int numberOfRows) { +if (type == DataTypes.STRING || type == DataTypes.VARCHAR) { + if (lengthSize == 2) { +return new StringVectorFiller(lengthSize, numberOfRows); + } else { +return new LongStringVectorFiller(lengthSize, numberOfRows); + } +} else if (type == DataTypes.TIMESTAMP) { + return new TimeStampVectorFiller(lengthSize, numberOfRows); +} else if (type == DataTypes.BOOLEAN) { + return new BooleanVectorFiller(lengthSize, numberOfRows); +} else if (type == DataTypes.SHORT) { + return new ShortVectorFiller(lengthSize, numberOfRows); +} else if (type == DataTypes.INT) { + return new IntVectorFiller(lengthSize, numberOfRows); +} else if (type == DataTypes.LONG) { + return new LongStringVectorFiller(lengthSize, numberOfRows); +} +return new StringVectorFiller(lengthSize, numberOfRows); + } + +} + +class StringVectorFiller extends AbstractNonDictionaryVectorFiller { + + public StringVectorFiller(int 
lengthSize, int numberOfRows) { +super(lengthSize, numberOfRows); + } + + @Override + public void fillVector(byte[] data, CarbonColumnVector vector, ByteBuffer buffer) { +// start position will be used to store the current data position +int startOffset = 0; +int currentOffset = lengthSize; +ByteUtil.UnsafeComparer comparer = ByteUtil.UnsafeComparer.INSTANCE; +for (int i = 0; i < numberOfRows - 1; i++) { + buffer.position(startOffset); --- End diff -- ```suggestion (((data[offset] & 0xFF) << 8) | (data[offset + 1] & 0xFF)); ``` based on above comment we can update this logic for all inner classes ---
Github user kumarvishal09 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2819#discussion_r226333633 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/chunk/store/impl/safe/AbstractNonDictionaryVectorFiller.java --- @@ -0,0 +1,274 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.carbondata.core.datastore.chunk.store.impl.safe; + +import java.nio.ByteBuffer; + +import org.apache.carbondata.core.constants.CarbonCommonConstants; +import org.apache.carbondata.core.metadata.datatype.DataType; +import org.apache.carbondata.core.metadata.datatype.DataTypes; +import org.apache.carbondata.core.scan.result.vector.CarbonColumnVector; +import org.apache.carbondata.core.util.ByteUtil; +import org.apache.carbondata.core.util.DataTypeUtil; + +public abstract class AbstractNonDictionaryVectorFiller { + + protected int lengthSize; + protected int numberOfRows; + + public AbstractNonDictionaryVectorFiller(int lengthSize, int numberOfRows) { +this.lengthSize = lengthSize; +this.numberOfRows = numberOfRows; + } + + public abstract void fillVector(byte[] data, CarbonColumnVector vector, ByteBuffer buffer); --- End diff -- Instead of byte buffer we can directly pass byte[] and can get length from byte array based on length data type like for varchar 4 bytes and for others 2 bytes ```suggestion public abstract void fillVector(byte[] data, CarbonColumnVector vector, byte[] buffer); ``` (((data[offset] & 0xFF) << 8) | (data[offset + 1] & 0xFF)); ---
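As a rough illustration of the suggestion above, reading the length prefix straight out of the `byte[]` with bit arithmetic instead of going through a `ByteBuffer`, here is a minimal self-contained sketch. The helper names and the big-endian layout (2-byte lengths for most types, 4-byte lengths for VARCHAR) are assumptions drawn from the comment, not the actual CarbonData code:

```java
public class LengthDecoder {
  // 2-byte big-endian length prefix (assumed layout for STRING and friends)
  static int readShortLength(byte[] data, int offset) {
    return ((data[offset] & 0xFF) << 8) | (data[offset + 1] & 0xFF);
  }

  // 4-byte big-endian length prefix (assumed layout for VARCHAR)
  static int readIntLength(byte[] data, int offset) {
    return ((data[offset] & 0xFF) << 24) | ((data[offset + 1] & 0xFF) << 16)
        | ((data[offset + 2] & 0xFF) << 8) | (data[offset + 3] & 0xFF);
  }

  public static void main(String[] args) {
    // a tiny fake page: a 2-byte length (5) followed by a 4-byte length (256)
    byte[] page = {0x00, 0x05, 0x00, 0x00, 0x01, 0x00};
    System.out.println(readShortLength(page, 0)); // 5
    System.out.println(readIntLength(page, 2));   // 256
  }
}
```

This avoids repositioning a `ByteBuffer` per row, which is the point of the review comment.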
Github user manishgupta88 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2819#discussion_r225948814 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/chunk/store/impl/LocalDictDimensionDataChunkStore.java ---
@@ -49,6 +51,29 @@ public void putArray(int[] invertedIndex, int[] invertedIndexReverse, byte[] dat this.dimensionDataChunkStore.putArray(invertedIndex, invertedIndexReverse, data); }
+ @Override
+ public void fillVector(int[] invertedIndex, int[] invertedIndexReverse, byte[] data,
+ ColumnVectorInfo vectorInfo) {
+int columnValueSize = dimensionDataChunkStore.getColumnValueSize();
+int rowsNum = data.length / columnValueSize;
+CarbonColumnVector vector = vectorInfo.vector;
+if (!dictionary.isDictionaryUsed()) {
+ vector.setDictionary(dictionary);
+ dictionary.setDictionaryUsed();
+}
+for (int i = 0; i < rowsNum; i++) {
+ int surrogate = CarbonUtil.getSurrogateInternal(data, i * columnValueSize, columnValueSize);
+ if (surrogate == CarbonCommonConstants.MEMBER_DEFAULT_VAL_SURROGATE_KEY) {
+vector.putNull(i);
+vector.getDictionaryVector().putNull(i);
+ } else {
+vector.putNotNull(i);
+vector.getDictionaryVector().putInt(i, surrogate);
--- End diff --
Can we put the surrogate directly and avoid two vectors here? Null values can be handled at the time of conversion from surrogate key to actual value in `CarbonDictionaryWrapper`. If that is feasible, can we try it? ---
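A minimal sketch of the single-vector alternative being proposed here, assuming the null surrogate is resolved at lookup time inside the dictionary wrapper. The names (`decodeToString`, `NULL_SURROGATE`) and the dictionary layout are illustrative, not the actual `CarbonDictionaryWrapper` API:

```java
public class SurrogateLookupSketch {
  // stand-in for CarbonCommonConstants.MEMBER_DEFAULT_VAL_SURROGATE_KEY
  static final int NULL_SURROGATE = 1;
  // hypothetical dictionary: surrogate key -> value (slots 0/1 unused)
  static final String[] DICT = {null, null, "apple", "banana"};

  // null handling moved to lookup time: the fill loop stores every
  // surrogate as-is, so no second vector and no per-row null branch
  static String decodeToString(int surrogate) {
    return surrogate == NULL_SURROGATE ? null : DICT[surrogate];
  }

  public static void main(String[] args) {
    int[] dictionaryVector = {2, 1, 3}; // filled without any null checks
    for (int s : dictionaryVector) {
      System.out.println(decodeToString(s));
    }
  }
}
```

The trade-off discussed in the comment: the fill loop becomes branch-free, at the cost of checking for the null surrogate once per lookup during conversion.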
Github user manishgupta88 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2819#discussion_r225836328 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/compress/DirectCompressCodec.java --- @@ -178,6 +196,143 @@ public double decodeDouble(float value) { public double decodeDouble(double value) { return value; } + +@Override public void decodeAndFillVector(ColumnPage columnPage, ColumnVectorInfo vectorInfo) { + CarbonColumnVector vector = vectorInfo.vector; + BitSet nullBits = columnPage.getNullBits(); + DataType dataType = vector.getType(); + DataType type = columnPage.getDataType(); + int pageSize = columnPage.getPageSize(); + BitSet deletedRows = vectorInfo.deletedRows; + fillVector(columnPage, vector, dataType, type, pageSize, vectorInfo); + if (deletedRows == null || deletedRows.isEmpty()) { +for (int i = nullBits.nextSetBit(0); i >= 0; i = nullBits.nextSetBit(i + 1)) { + vector.putNull(i); +} + } +} + +private void fillVector(ColumnPage columnPage, CarbonColumnVector vector, DataType dataType, +DataType type, int pageSize, ColumnVectorInfo vectorInfo) { + if (type == DataTypes.BOOLEAN || type == DataTypes.BYTE) { +byte[] byteData = columnPage.getByteData(); +if (dataType == DataTypes.SHORT) { + for (int i = 0; i < pageSize; i++) { +vector.putShort(i, (short) byteData[i]); + } +} else if (dataType == DataTypes.INT) { + for (int i = 0; i < pageSize; i++) { +vector.putInt(i, (int) byteData[i]); + } +} else if (dataType == DataTypes.LONG) { + for (int i = 0; i < pageSize; i++) { +vector.putLong(i, byteData[i]); + } +} else if (dataType == DataTypes.TIMESTAMP) { + for (int i = 0; i < pageSize; i++) { +vector.putLong(i, byteData[i] * 1000); + } +} else if (dataType == DataTypes.BOOLEAN || dataType == DataTypes.BYTE) { + vector.putBytes(0, pageSize, byteData, 0); +} else if (DataTypes.isDecimal(dataType)) { + DecimalConverterFactory.DecimalConverter decimalConverter = vectorInfo.decimalConverter; + 
decimalConverter.fillVector(byteData, pageSize, vectorInfo, columnPage.getNullBits()); +} else { + for (int i = 0; i < pageSize; i++) { +vector.putDouble(i, byteData[i]); + } +} + } else if (type == DataTypes.SHORT) { +short[] shortData = columnPage.getShortData(); +if (dataType == DataTypes.SHORT) { + vector.putShorts(0, pageSize, shortData, 0); +} else if (dataType == DataTypes.INT) { + for (int i = 0; i < pageSize; i++) { +vector.putInt(i, (int) shortData[i]); + } +} else if (dataType == DataTypes.LONG) { + for (int i = 0; i < pageSize; i++) { +vector.putLong(i, shortData[i]); + } +} else if (dataType == DataTypes.TIMESTAMP) { + for (int i = 0; i < pageSize; i++) { +vector.putLong(i, shortData[i] * 1000); + } +} else if (DataTypes.isDecimal(dataType)) { + DecimalConverterFactory.DecimalConverter decimalConverter = vectorInfo.decimalConverter; + decimalConverter.fillVector(shortData, pageSize, vectorInfo, columnPage.getNullBits()); +} else { + for (int i = 0; i < pageSize; i++) { +vector.putDouble(i, shortData[i]); + } +} + + } else if (type == DataTypes.SHORT_INT) { +int[] shortIntData = columnPage.getShortIntData(); +if (dataType == DataTypes.INT) { + vector.putInts(0, pageSize, shortIntData, 0); +} else if (dataType == DataTypes.LONG) { + for (int i = 0; i < pageSize; i++) { +vector.putLong(i, shortIntData[i]); + } +} else if (dataType == DataTypes.TIMESTAMP) { + for (int i = 0; i < pageSize; i++) { +vector.putLong(i, shortIntData[i] * 1000); + } +} else if (DataTypes.isDecimal(dataType)) { + DecimalConverterFactory.DecimalConverter decimalConverter = vectorInfo.decimalConverter; + decimalConverter.fillVector(shortIntData, pageSize, vectorInfo, columnPage.getNullBits()); +} else { + for (int i = 0; i < pageSize; i++) { +vector.putDouble(i, shortIntData[i]); + } +} + } else if (type == DataTypes.INT)
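The (truncated) diff above repeats one pattern for every physical page type: read the page data at its stored width, then widen each value into the vector's logical data type, with TIMESTAMP values additionally scaled by 1000. A minimal sketch of that pattern, using a plain `long[]` as a stand-in for `CarbonColumnVector` (names here are illustrative, not the CarbonData API):

```java
import java.util.Arrays;

public class WideningFill {
  // widen a BYTE-width decoded page into a long vector; the real code has
  // one such loop per (page type, vector type) pair with no checks inside
  static long[] fillLongFromBytes(byte[] pageData, boolean isTimestamp) {
    long[] vector = new long[pageData.length];
    for (int i = 0; i < pageData.length; i++) {
      // TIMESTAMP pages are scaled back by 1000, mirroring the diff above
      vector[i] = isTimestamp ? pageData[i] * 1000L : pageData[i];
    }
    return vector;
  }

  public static void main(String[] args) {
    byte[] page = {1, 2, 3};
    System.out.println(Arrays.toString(fillLongFromBytes(page, false))); // [1, 2, 3]
    System.out.println(Arrays.toString(fillLongFromBytes(page, true)));  // [1000, 2000, 3000]
  }
}
```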
Github user manishgupta88 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2819#discussion_r225853518 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/compress/DirectCompressCodec.java ---
@@ -95,10 +99,24 @@ public ColumnPage decode(byte[] input, int offset, int length) throws MemoryExce return LazyColumnPage.newPage(decodedPage, converter); }
+ @Override
+ public ColumnPage decodeAndFillVector(byte[] input, int offset, int length,
--- End diff --
As per this flow, the codec object is created first, and then the codec's `createDecoder` method creates a `LazyColumnPage`. The `LazyColumnPage` constructor fills the vector using the converter created by the codec. So we can remove `LazyColumnPage` and make the flow simpler. ---
Github user manishgupta88 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2819#discussion_r225798939 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/compress/DirectCompressCodec.java ---
@@ -178,6 +196,143 @@ public double decodeDouble(float value) { public double decodeDouble(double value) { return value; }
+
+@Override public void decodeAndFillVector(ColumnPage columnPage, ColumnVectorInfo vectorInfo) {
+ CarbonColumnVector vector = vectorInfo.vector;
+ BitSet nullBits = columnPage.getNullBits();
+ DataType dataType = vector.getType();
+ DataType type = columnPage.getDataType();
--- End diff --
Rename the above variables to `vectorDataType` and `pageDataType` and use them everywhere in the calling methods; it will make the code easier to understand. Make this change in all the classes where this code is newly added. ---
Github user manishgupta88 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2819#discussion_r225866022 --- Diff: integration/spark-datasource/src/main/scala/org/apache/carbondata/spark/vectorreader/VectorizedCarbonRecordReader.java ---
@@ -290,12 +296,24 @@ public void initBatch(MemoryMode memMode, StructType partitionColumns, } } CarbonColumnVector[] vectors = new CarbonColumnVector[fields.length];
-boolean[] filteredRows = new boolean[vectorProxy.numRows()];
-for (int i = 0; i < fields.length; i++) {
- vectors[i] = new ColumnarVectorWrapper(vectorProxy, filteredRows, i);
- if (isNoDictStringField[i]) {
-if (vectors[i] instanceof ColumnarVectorWrapper) {
- ((ColumnarVectorWrapper) vectors[i]).reserveDictionaryIds();
+boolean[] filteredRows = null;
+if (queryModel.isDirectVectorFill()) {
+ for (int i = 0; i < fields.length; i++) {
+vectors[i] = new ColumnarVectorWrapperDirect(vectorProxy, i);
+if (isNoDictStringField[i]) {
+ if (vectors[i] instanceof ColumnarVectorWrapperDirect) {
--- End diff --
The `if` check is not required here, as the vector is already an instance of `ColumnarVectorWrapperDirect`. ---
Github user manishgupta88 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2819#discussion_r225868472 --- Diff: integration/spark-datasource/src/main/scala/org/apache/carbondata/spark/vectorreader/ColumnarVectorWrapperDirect.java --- @@ -0,0 +1,223 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.spark.vectorreader; + +import java.math.BigDecimal; + +import org.apache.carbondata.core.metadata.datatype.DataType; +import org.apache.carbondata.core.scan.result.vector.CarbonColumnVector; +import org.apache.carbondata.core.scan.result.vector.CarbonDictionary; + +import org.apache.spark.sql.CarbonVectorProxy; +import org.apache.spark.sql.carbondata.execution.datasources.CarbonSparkDataSourceUtil; +import org.apache.spark.sql.types.Decimal; + +/** + * Fills the vector directly with out considering any deleted rows. + */ +class ColumnarVectorWrapperDirect implements CarbonColumnVector { + + protected CarbonVectorProxy.ColumnVectorProxy sparkColumnVectorProxy; + + protected CarbonVectorProxy carbonVectorProxy; --- End diff -- please add a comment to explain the role of `sparkColumnVectorProxy` and `carbonVectorProxy` ---
Github user kumarvishal09 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2819#discussion_r225828205 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/chunk/impl/FixedLengthDimensionColumnPage.java --- @@ -46,10 +46,37 @@ public FixedLengthDimensionColumnPage(byte[] dataChunk, int[] invertedIndex, dataChunk.length; dataChunkStore = DimensionChunkStoreFactory.INSTANCE .getDimensionChunkStore(columnValueSize, isExplicitSorted, numberOfRows, totalSize, -DimensionStoreType.FIXED_LENGTH, null); +DimensionStoreType.FIXED_LENGTH, null, false); dataChunkStore.putArray(invertedIndex, invertedIndexReverse, dataChunk); } + /** + * Constructor + * + * @param dataChunkdata chunk + * @param invertedIndexinverted index + * @param invertedIndexReverse reverse inverted index + * @param numberOfRows number of rows + * @param columnValueSize size of each column value + * @param vectorInfo vector to be filled with decoded column page. + */ + public FixedLengthDimensionColumnPage(byte[] dataChunk, int[] invertedIndex, --- End diff -- For direct vector filler this instance creation is not required as it will not be used. ---
Github user kumarvishal09 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2819#discussion_r225826457 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/ColumnPageDecoder.java ---
@@ -29,6 +31,12 @@ */ ColumnPage decode(byte[] input, int offset, int length) throws MemoryException, IOException;
+ /**
+ * Apply decoding algorithm on input byte array and fill the vector here.
+ */
+ ColumnPage decodeAndFillVector(byte[] input, int offset, int length, ColumnVectorInfo vectorInfo,
--- End diff --
The return type of `decodeAndFillVector` can be `void`, as the `ColumnPage` it returns is never used. ---
Github user kumarvishal09 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2819#discussion_r225806802 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/chunk/store/impl/safe/SafeVariableLengthDimensionDataChunkStore.java ---
@@ -91,6 +93,25 @@ public void putArray(final int[] invertedIndex, final int[] invertedIndexReverse } }
+ @Override
+ public void fillVector(int[] invertedIndex, int[] invertedIndexReverse, byte[] data,
+ ColumnVectorInfo vectorInfo) {
+this.invertedIndexReverse = invertedIndex;
+
+// as first position will be start from 2 byte as data is stored first in the memory block
+// we need to skip first two bytes this is because first two bytes will be length of the data
+// which we have to skip
+int lengthSize = getLengthSize();
+// creating a byte buffer which will wrap the length of the row
+CarbonColumnVector vector = vectorInfo.vector;
+DataType dt = vector.getType();
+ByteBuffer buffer = ByteBuffer.wrap(data);
+BitSet deletedRows = vectorInfo.deletedRows;
--- End diff --
`deletedRows` is unused, please check. ---
Github user kumarvishal09 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2819#discussion_r225806081 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/chunk/store/impl/safe/AbstractNonDictionaryVectorFiller.java --- @@ -0,0 +1,274 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.carbondata.core.datastore.chunk.store.impl.safe; + +import java.nio.ByteBuffer; + +import org.apache.carbondata.core.constants.CarbonCommonConstants; +import org.apache.carbondata.core.metadata.datatype.DataType; +import org.apache.carbondata.core.metadata.datatype.DataTypes; +import org.apache.carbondata.core.scan.result.vector.CarbonColumnVector; +import org.apache.carbondata.core.util.ByteUtil; +import org.apache.carbondata.core.util.DataTypeUtil; + +public abstract class AbstractNonDictionaryVectorFiller { + + protected int lengthSize; + protected int numberOfRows; + + public AbstractNonDictionaryVectorFiller(int lengthSize, int numberOfRows) { +this.lengthSize = lengthSize; +this.numberOfRows = numberOfRows; + } + + public abstract void fillVector(byte[] data, CarbonColumnVector vector, ByteBuffer buffer); + + public int getLengthFromBuffer(ByteBuffer buffer) { +return buffer.getShort(); + } +} + +class NonDictionaryVectorFillerFactory { + + public static AbstractNonDictionaryVectorFiller getVectorFiller(DataType type, int lengthSize, + int numberOfRows) { +if (type == DataTypes.STRING || type == DataTypes.VARCHAR) { + if (lengthSize == 2) { +return new StringVectorFiller(lengthSize, numberOfRows); + } else { +return new LongStringVectorFiller(lengthSize, numberOfRows); + } +} else if (type == DataTypes.TIMESTAMP) { + return new TimeStampVectorFiller(lengthSize, numberOfRows); +} else if (type == DataTypes.BOOLEAN) { + return new BooleanVectorFiller(lengthSize, numberOfRows); +} else if (type == DataTypes.SHORT) { + return new ShortVectorFiller(lengthSize, numberOfRows); +} else if (type == DataTypes.INT) { + return new IntVectorFiller(lengthSize, numberOfRows); +} else if (type == DataTypes.LONG) { + return new LongStringVectorFiller(lengthSize, numberOfRows); --- End diff -- Need to call LongVectorFiller(lengthSize, numberOfRows) for long data type ---
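The fix the reviewer asks for is a one-line change in the factory: `DataTypes.LONG` should map to its own `LongVectorFiller` rather than reusing `LongStringVectorFiller`. A sketch of the corrected mapping, with each filler represented only by its name (the real constructors take `lengthSize` and `numberOfRows`):

```java
public class FillerFactorySketch {
  // returns the filler class name the factory should pick for each type;
  // a stand-in for NonDictionaryVectorFillerFactory.getVectorFiller
  static String fillerFor(String type, int lengthSize) {
    switch (type) {
      case "STRING":
      case "VARCHAR":
        // VARCHAR uses 4-byte length prefixes, hence the long-string filler
        return lengthSize == 2 ? "StringVectorFiller" : "LongStringVectorFiller";
      case "TIMESTAMP": return "TimeStampVectorFiller";
      case "BOOLEAN":   return "BooleanVectorFiller";
      case "SHORT":     return "ShortVectorFiller";
      case "INT":       return "IntVectorFiller";
      case "LONG":      return "LongVectorFiller"; // fixed: was LongStringVectorFiller
      default:          return "StringVectorFiller";
    }
  }

  public static void main(String[] args) {
    System.out.println(fillerFor("LONG", 2));    // LongVectorFiller
    System.out.println(fillerFor("VARCHAR", 4)); // LongStringVectorFiller
  }
}
```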
Github user kumarvishal09 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2819#discussion_r225794798 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/chunk/reader/dimension/v3/CompressedDimensionChunkFileBasedReaderV3.java --- @@ -221,49 +229,66 @@ protected DimensionRawColumnChunk getDimensionRawColumnChunk(FileReader fileRead int offset = (int) rawColumnPage.getOffSet() + dimensionChunksLength .get(rawColumnPage.getColumnIndex()) + dataChunk3.getPage_offset().get(pageNumber); // first read the data and uncompressed it -return decodeDimension(rawColumnPage, rawData, pageMetadata, offset); +return decodeDimension(rawColumnPage, rawData, pageMetadata, offset, vectorInfo); + } + + @Override + public void decodeColumnPageAndFillVector(DimensionRawColumnChunk dimensionRawColumnChunk, + int pageNumber, ColumnVectorInfo vectorInfo) throws IOException, MemoryException { +DimensionColumnPage columnPage = +decodeColumnPage(dimensionRawColumnChunk, pageNumber, vectorInfo); +columnPage.freeMemory(); } - private ColumnPage decodeDimensionByMeta(DataChunk2 pageMetadata, - ByteBuffer pageData, int offset, boolean isLocalDictEncodedPage) + private ColumnPage decodeDimensionByMeta(DataChunk2 pageMetadata, ByteBuffer pageData, int offset, + boolean isLocalDictEncodedPage, ColumnVectorInfo vectorInfo, BitSet nullBitSet) throws IOException, MemoryException { List encodings = pageMetadata.getEncoders(); List encoderMetas = pageMetadata.getEncoder_meta(); String compressorName = CarbonMetadataUtil.getCompressorNameFromChunkMeta( pageMetadata.getChunk_meta()); ColumnPageDecoder decoder = encodingFactory.createDecoder(encodings, encoderMetas, -compressorName); -return decoder -.decode(pageData.array(), offset, pageMetadata.data_page_length, isLocalDictEncodedPage); +compressorName, vectorInfo != null); +if (vectorInfo != null) { + return decoder + .decodeAndFillVector(pageData.array(), offset, pageMetadata.data_page_length, vectorInfo, 
+ nullBitSet, isLocalDictEncodedPage); +} else { + return decoder + .decode(pageData.array(), offset, pageMetadata.data_page_length, isLocalDictEncodedPage); +} } protected DimensionColumnPage decodeDimension(DimensionRawColumnChunk rawColumnPage, - ByteBuffer pageData, DataChunk2 pageMetadata, int offset) + ByteBuffer pageData, DataChunk2 pageMetadata, int offset, ColumnVectorInfo vectorInfo) throws IOException, MemoryException { List encodings = pageMetadata.getEncoders(); if (CarbonUtil.isEncodedWithMeta(encodings)) { - ColumnPage decodedPage = decodeDimensionByMeta(pageMetadata, pageData, offset, - null != rawColumnPage.getLocalDictionary()); - decodedPage.setNullBits(QueryUtil.getNullBitSet(pageMetadata.presence, this.compressor)); int[] invertedIndexes = new int[0]; int[] invertedIndexesReverse = new int[0]; // in case of no dictionary measure data types, if it is included in sort columns // then inverted index to be uncompressed + boolean isExplicitSorted = + CarbonUtil.hasEncoding(pageMetadata.encoders, Encoding.INVERTED_INDEX); + int dataOffset = offset; if (encodings.contains(Encoding.INVERTED_INDEX)) { --- End diff -- use isExplicitSorted as inverted index is present in encoding is already taken in this variable ---
Github user kumarvishal09 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2819#discussion_r225794608 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/chunk/reader/dimension/v3/CompressedDimensionChunkFileBasedReaderV3.java --- @@ -221,49 +229,66 @@ protected DimensionRawColumnChunk getDimensionRawColumnChunk(FileReader fileRead int offset = (int) rawColumnPage.getOffSet() + dimensionChunksLength .get(rawColumnPage.getColumnIndex()) + dataChunk3.getPage_offset().get(pageNumber); // first read the data and uncompressed it -return decodeDimension(rawColumnPage, rawData, pageMetadata, offset); +return decodeDimension(rawColumnPage, rawData, pageMetadata, offset, vectorInfo); + } + + @Override + public void decodeColumnPageAndFillVector(DimensionRawColumnChunk dimensionRawColumnChunk, + int pageNumber, ColumnVectorInfo vectorInfo) throws IOException, MemoryException { +DimensionColumnPage columnPage = +decodeColumnPage(dimensionRawColumnChunk, pageNumber, vectorInfo); +columnPage.freeMemory(); } - private ColumnPage decodeDimensionByMeta(DataChunk2 pageMetadata, - ByteBuffer pageData, int offset, boolean isLocalDictEncodedPage) + private ColumnPage decodeDimensionByMeta(DataChunk2 pageMetadata, ByteBuffer pageData, int offset, + boolean isLocalDictEncodedPage, ColumnVectorInfo vectorInfo, BitSet nullBitSet) throws IOException, MemoryException { List encodings = pageMetadata.getEncoders(); List encoderMetas = pageMetadata.getEncoder_meta(); String compressorName = CarbonMetadataUtil.getCompressorNameFromChunkMeta( pageMetadata.getChunk_meta()); ColumnPageDecoder decoder = encodingFactory.createDecoder(encodings, encoderMetas, -compressorName); -return decoder -.decode(pageData.array(), offset, pageMetadata.data_page_length, isLocalDictEncodedPage); +compressorName, vectorInfo != null); +if (vectorInfo != null) { + return decoder + .decodeAndFillVector(pageData.array(), offset, pageMetadata.data_page_length, vectorInfo, 
+ nullBitSet, isLocalDictEncodedPage); +} else { + return decoder + .decode(pageData.array(), offset, pageMetadata.data_page_length, isLocalDictEncodedPage); +} } protected DimensionColumnPage decodeDimension(DimensionRawColumnChunk rawColumnPage, - ByteBuffer pageData, DataChunk2 pageMetadata, int offset) + ByteBuffer pageData, DataChunk2 pageMetadata, int offset, ColumnVectorInfo vectorInfo) throws IOException, MemoryException { List encodings = pageMetadata.getEncoders(); if (CarbonUtil.isEncodedWithMeta(encodings)) { - ColumnPage decodedPage = decodeDimensionByMeta(pageMetadata, pageData, offset, - null != rawColumnPage.getLocalDictionary()); - decodedPage.setNullBits(QueryUtil.getNullBitSet(pageMetadata.presence, this.compressor)); int[] invertedIndexes = new int[0]; int[] invertedIndexesReverse = new int[0]; // in case of no dictionary measure data types, if it is included in sort columns // then inverted index to be uncompressed + boolean isExplicitSorted = + CarbonUtil.hasEncoding(pageMetadata.encoders, Encoding.INVERTED_INDEX); + int dataOffset = offset; if (encodings.contains(Encoding.INVERTED_INDEX)) { offset += pageMetadata.data_page_length; -if (CarbonUtil.hasEncoding(pageMetadata.encoders, Encoding.INVERTED_INDEX)) { +if (isExplicitSorted) { --- End diff -- This If check is not required as above If check is already checking whether it is explicit sorted or not ---
Github user kumarvishal09 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2819#discussion_r225792779 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/page/ColumnPage.java ---
@@ -633,6 +622,56 @@ public boolean getBoolean(int rowId) { */ public abstract double getDouble(int rowId);
+
+ /**
+ * Get byte value at rowId
+ */
+ public abstract byte[] getByteData();
--- End diff --
Instead of an abstract method, better to add a method with a default implementation in this class; the classes where we want to provide a proper implementation can then override it. ---
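The suggestion above can be sketched as a default implementation that fails fast, overridden only where byte data actually exists, so that every `ColumnPage` subclass is not forced to stub out accessors it cannot support. Class names here are illustrative, not the CarbonData classes:

```java
abstract class PageSketch {
  // default implementation: subclasses without byte data inherit this
  public byte[] getByteData() {
    throw new UnsupportedOperationException(
        "getByteData not supported by " + getClass().getSimpleName());
  }
}

class BytePageSketch extends PageSketch {
  private final byte[] data = {1, 2, 3};
  // only the page type that really holds bytes overrides the default
  @Override public byte[] getByteData() { return data; }
}

class DecimalPageSketch extends PageSketch { /* no override needed */ }

public class DefaultImplDemo {
  public static void main(String[] args) {
    System.out.println(new BytePageSketch().getByteData().length); // 3
    try {
      new DecimalPageSketch().getByteData();
    } catch (UnsupportedOperationException e) {
      System.out.println("unsupported");
    }
  }
}
```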
GitHub user ravipesala opened a pull request: https://github.com/apache/carbondata/pull/2819

[CARBONDATA-3012] Added support for full scan queries for vector direct fill.

After decompressing a page in our V3 reader, we can immediately fill the data into a vector without any condition checks inside the loops. So here the complete column page data is set into the column vector in a single batch and handed back to Spark/Presto. For this purpose, a new method is added in `ColumnPageDecoder`:

``` ColumnPage decodeAndFillVector(byte[] input, int offset, int length, ColumnVectorInfo vectorInfo, BitSet nullBits, boolean isLVEncoded) ```

The above method takes the vector and fills it in a single loop, without any checks inside the loop. A new method is also added inside `DimensionDataChunkStore`:

``` void fillVector(int[] invertedIndex, int[] invertedIndexReverse, byte[] data, ColumnVectorInfo vectorInfo); ```

The above method likewise fills the vector in a single loop without any checks inside the loop.

Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily:
- [ ] Any interfaces changed?
- [ ] Any backward compatibility impacted?
- [ ] Document update required?
- [ ] Testing done. Please provide details on:
  - Whether new unit test cases have been added or why no new tests are required?
  - How it is tested? Please attach test report.
  - Is it a performance related change? Please attach the performance test report.
  - Any additional information to help reviewers in testing this change.
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
You can merge this pull request into a Git repository by running: $ git pull https://github.com/ravipesala/incubator-carbondata perf-full-scan Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2819.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2819 commit 358299f90df98272723f22f43ab025bd1e7fa3e8 Author: ravipesala Date: 2018-10-16T05:02:18Z Add carbon property to configure vector based row pruning push down commit 658d8cb02b657e9b5887c0348971b9d92087fab2 Author: ravipesala Date: 2018-10-16T06:00:43Z Added support for full scan queries for vector direct fill. ---
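The core idea of the PR, filling a whole decoded page into the vector in one tight loop and patching nulls in a second pass (as `DirectCompressCodec.decodeAndFillVector` does above), can be sketched minimally. A boxed `Integer[]` stands in for `CarbonColumnVector` so nulls are representable; this is an illustration of the technique, not the PR's code:

```java
import java.util.Arrays;
import java.util.BitSet;

public class DirectFillSketch {
  // fill the whole page in one branch-free loop, then apply the page's
  // null bitset afterwards -- the order used by decodeAndFillVector
  static Integer[] fillThenApplyNulls(int[] decodedPage, BitSet nullBits) {
    Integer[] vector = new Integer[decodedPage.length];
    for (int i = 0; i < decodedPage.length; i++) {
      vector[i] = decodedPage[i];   // no per-row null check here
    }
    for (int i = nullBits.nextSetBit(0); i >= 0; i = nullBits.nextSetBit(i + 1)) {
      vector[i] = null;             // nulls handled in a second pass
    }
    return vector;
  }

  public static void main(String[] args) {
    BitSet nulls = new BitSet();
    nulls.set(1); // row 1 is null in this page
    System.out.println(Arrays.toString(fillThenApplyNulls(new int[]{10, 20, 30}, nulls)));
  }
}
```

Keeping the hot loop free of conditionals is what makes the batch fill faster than the old row-by-row staging path.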