[jira] [Commented] (ARROW-1710) [Java] Decide what to do with non-nullable vectors in new vector class hierarchy
[ https://issues.apache.org/jira/browse/ARROW-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16262971#comment-16262971 ] ASF GitHub Bot commented on ARROW-1710: --- BryanCutler commented on issue #1341: [WIP] ARROW-1710: [Java] Remove Non-Nullable Vectors URL: https://github.com/apache/arrow/pull/1341#issuecomment-346422375 > Do we want to port the missing function (getNullCount() and setRangeToOne) to the new bit vector? The new vector has `getNullCount()` just that it only counts nulls, maybe the equivalent would be `getZeroCount()` but I'm not sure how useful that really is. I think I saw there was also a `setRangeToOne` and I'm ok with adding those. > Can we remove NonNullableMapVector altogether? (git rid of the MapVector extends NonNullableMapVector inherence and roll them into a single class ) I was thinking this is what we were going to do, maybe as a followup though, so +1 for me. One thing I'm not sure of is ullability useful for this vector? For example, Spark StructType doesn't have a nullable param, it's up to the child type definitions. But maybe it's different in the Arrow sense, I'll have to think about that.. Happy Thanksgiving to you all too! You guys work too hard, I don't want to see any PRs coming through for a couple days :) This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [Java] Decide what to do with non-nullable vectors in new vector class > hierarchy > - > > Key: ARROW-1710 > URL: https://issues.apache.org/jira/browse/ARROW-1710 > Project: Apache Arrow > Issue Type: Sub-task > Components: Java - Vectors >Reporter: Li Jin >Assignee: Bryan Cutler > Labels: pull-request-available > Fix For: 0.8.0 > > > So far the consensus seems to be remove all non-nullable vectors. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (ARROW-1710) [Java] Decide what to do with non-nullable vectors in new vector class hierarchy
[ https://issues.apache.org/jira/browse/ARROW-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16262948#comment-16262948 ] ASF GitHub Bot commented on ARROW-1710: --- BryanCutler commented on a change in pull request #1341: [WIP] ARROW-1710: [Java] Remove Non-Nullable Vectors URL: https://github.com/apache/arrow/pull/1341#discussion_r152630603 ## File path: java/vector/src/main/java/org/apache/arrow/vector/BitVector.java ## @@ -18,342 +18,469 @@ package org.apache.arrow.vector; +import io.netty.buffer.ArrowBuf; import org.apache.arrow.memory.BufferAllocator; -import org.apache.arrow.memory.BaseAllocator; -import org.apache.arrow.memory.OutOfMemoryException; +import org.apache.arrow.vector.complex.impl.BitReaderImpl; import org.apache.arrow.vector.complex.reader.FieldReader; import org.apache.arrow.vector.holders.BitHolder; import org.apache.arrow.vector.holders.NullableBitHolder; -import org.apache.arrow.vector.schema.ArrowFieldNode; -import org.apache.arrow.vector.types.Types.MinorType; -import org.apache.arrow.vector.types.pojo.Field; +import org.apache.arrow.vector.types.Types; +import org.apache.arrow.vector.types.pojo.FieldType; import org.apache.arrow.vector.util.OversizedAllocationException; import org.apache.arrow.vector.util.TransferPair; -import io.netty.buffer.ArrowBuf; - /** - * Bit implements a vector of bit-width values. Elements in the vector are accessed by position from the logical start - * of the vector. The width of each element is 1 bit. The equivalent Java primitive is an int containing the value '0' - * or '1'. + * BitVector implements a fixed width (1 bit) vector of + * boolean values which could be null. Each value in the vector corresponds + * to a single bit in the underlying data stream backing the vector. */ -public final class BitVector extends BaseDataValueVector implements FixedWidthVector { - static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(BitVector.class); - - private final Accessor accessor = new Accessor(); - private final Mutator mutator = new Mutator(); - - int valueCount; - private int allocationSizeInBytes = getSizeFromCount(INITIAL_VALUE_ALLOCATION); - private int allocationMonitor = 0; +public class BitVector extends BaseFixedWidthVector { + private final FieldReader reader; + /** + * Instantiate a BitVector. This doesn't allocate any memory for + * the data in vector. + * + * @param name name of the vector + * @param allocator allocator for memory management. + */ public BitVector(String name, BufferAllocator allocator) { -super(name, allocator); +this(name, FieldType.nullable(Types.MinorType.BIT.getType()), +allocator); } - @Override - public void load(ArrowFieldNode fieldNode, ArrowBuf data) { -// When the vector is all nulls or all defined, the content of the buffer can be omitted -if (data.readableBytes() == 0 && fieldNode.getLength() != 0) { - int count = fieldNode.getLength(); - allocateNew(count); - int n = getSizeFromCount(count); - if (fieldNode.getNullCount() == 0) { -// all defined -// create an all 1s buffer -// set full bytes -int fullBytesCount = count / 8; -for (int i = 0; i < fullBytesCount; ++i) { - this.data.setByte(i, 0xFF); -} -int remainder = count % 8; -// set remaining bits -if (remainder > 0) { - byte bitMask = (byte) (0xFFL >>> ((8 - remainder) & 7)); - this.data.setByte(fullBytesCount, bitMask); -} - } else if (fieldNode.getNullCount() == fieldNode.getLength()) { -// all null -// create an all 0s buffer -zeroVector(); - } else { -throw new IllegalArgumentException("The buffer can be empty only if there's no data or it's all null or all defined"); - } - this.data.writerIndex(n); -} else { - super.load(fieldNode, data); -} -this.valueCount = fieldNode.getLength(); + /** + * Instantiate a BitVector. This doesn't allocate any memory for + * the data in vector. + * + * @param name name of the vector + * @param fieldType type of Field materialized by this vector + * @param allocator allocator for memory management. + */ + public BitVector(String name, FieldType fieldType, BufferAllocator allocator) { +super(name, allocator, fieldType, (byte) 0); +reader = new BitReaderImpl(BitVector.this); } + /** + * Get a reader that supports reading values from this vector + * + * @return Field Reader for this vector + */ @Override - public Field getField() { -throw new UnsupportedOperationException("internal vector"); + public FieldReader getReader() { +return reader; } + /** + * Get minor type for this vector. The vector holds values belonging + * to a particular type. + * + * @return
[jira] [Commented] (ARROW-1710) [Java] Decide what to do with non-nullable vectors in new vector class hierarchy
[ https://issues.apache.org/jira/browse/ARROW-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16262803#comment-16262803 ] ASF GitHub Bot commented on ARROW-1710: --- icexelloss commented on issue #1341: [WIP] ARROW-1710: [Java] Remove Non-Nullable Vectors URL: https://github.com/apache/arrow/pull/1341#issuecomment-346386518 @BryanCutler High level looks good to me. Two major comments: * Do we want to port the missing function (`getNullCount()` and `setRangeToOne`) to the new bit vector? * Can we remove `NonNullableMapVector` altogether? (git rid of the `MapVector extends NonNullableMapVector` inherence and roll them into a single class ) Happy thanksgiving! This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [Java] Decide what to do with non-nullable vectors in new vector class > hierarchy > - > > Key: ARROW-1710 > URL: https://issues.apache.org/jira/browse/ARROW-1710 > Project: Apache Arrow > Issue Type: Sub-task > Components: Java - Vectors >Reporter: Li Jin >Assignee: Bryan Cutler > Labels: pull-request-available > Fix For: 0.8.0 > > > So far the consensus seems to be remove all non-nullable vectors. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (ARROW-1710) [Java] Decide what to do with non-nullable vectors in new vector class hierarchy
[ https://issues.apache.org/jira/browse/ARROW-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16262798#comment-16262798 ] ASF GitHub Bot commented on ARROW-1710: --- icexelloss commented on a change in pull request #1341: [WIP] ARROW-1710: [Java] Remove Non-Nullable Vectors URL: https://github.com/apache/arrow/pull/1341#discussion_r152598542 ## File path: java/vector/src/test/java/org/apache/arrow/vector/TestOversizedAllocationForValueVector.java ## @@ -112,10 +112,10 @@ public void testVariableVectorReallocation() { try { vector.allocateNew(expectedAllocationInBytes, 10); assertTrue(expectedOffsetSize <= vector.getValueCapacity()); - assertTrue(expectedAllocationInBytes <= vector.getBuffer().capacity()); + assertTrue(expectedAllocationInBytes <= vector.getDataBuffer().capacity()); Review comment: Why this is checking DataBuffer and the check after reAlloc is checking OffsetBuffer? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [Java] Decide what to do with non-nullable vectors in new vector class > hierarchy > - > > Key: ARROW-1710 > URL: https://issues.apache.org/jira/browse/ARROW-1710 > Project: Apache Arrow > Issue Type: Sub-task > Components: Java - Vectors >Reporter: Li Jin >Assignee: Bryan Cutler > Labels: pull-request-available > Fix For: 0.8.0 > > > So far the consensus seems to be remove all non-nullable vectors. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (ARROW-1710) [Java] Decide what to do with non-nullable vectors in new vector class hierarchy
[ https://issues.apache.org/jira/browse/ARROW-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16262792#comment-16262792 ] ASF GitHub Bot commented on ARROW-1710: --- icexelloss commented on a change in pull request #1341: [WIP] ARROW-1710: [Java] Remove Non-Nullable Vectors URL: https://github.com/apache/arrow/pull/1341#discussion_r152597645 ## File path: java/vector/src/test/java/org/apache/arrow/vector/TestBitVector.java ## @@ -526,15 +505,17 @@ private void validateRange(int length, int start, int count) { try (BitVector bitVector = new BitVector("bits", allocator)) { bitVector.reset(); bitVector.allocateNew(length); - bitVector.getMutator().setRangeToOne(start, count); + for (int i = start; i < start + count; i++) { +bitVector.set(i, 1); + } Review comment: Should we consider add `setRangeToOne` back in `BitVector`? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [Java] Decide what to do with non-nullable vectors in new vector class > hierarchy > - > > Key: ARROW-1710 > URL: https://issues.apache.org/jira/browse/ARROW-1710 > Project: Apache Arrow > Issue Type: Sub-task > Components: Java - Vectors >Reporter: Li Jin >Assignee: Bryan Cutler > Labels: pull-request-available > Fix For: 0.8.0 > > > So far the consensus seems to be remove all non-nullable vectors. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (ARROW-1710) [Java] Decide what to do with non-nullable vectors in new vector class hierarchy
[ https://issues.apache.org/jira/browse/ARROW-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16262788#comment-16262788 ] ASF GitHub Bot commented on ARROW-1710: --- icexelloss commented on a change in pull request #1341: [WIP] ARROW-1710: [Java] Remove Non-Nullable Vectors URL: https://github.com/apache/arrow/pull/1341#discussion_r152596476 ## File path: java/vector/src/test/java/org/apache/arrow/vector/TestBitVector.java ## @@ -426,81 +405,81 @@ public void testReallocAfterVectorTransfer2() { public void testBitVector() { // Create a new value vector for 1024 integers try (final BitVector vector = new BitVector(EMPTY_SCHEMA_PATH, allocator)) { - final BitVector.Mutator m = vector.getMutator(); vector.allocateNew(1024); - m.setValueCount(1024); + vector.setValueCount(1024); // Put and set a few values - m.set(0, 1); - m.set(1, 0); - m.set(100, 0); - m.set(1022, 1); + vector.set(0, 1); + vector.set(1, 0); + vector.set(100, 0); + vector.set(1022, 1); - m.setValueCount(1024); + vector.setValueCount(1024); - final BitVector.Accessor accessor = vector.getAccessor(); - assertEquals(1, accessor.get(0)); - assertEquals(0, accessor.get(1)); - assertEquals(0, accessor.get(100)); - assertEquals(1, accessor.get(1022)); + assertEquals(1, vector.get(0)); + assertEquals(0, vector.get(1)); + assertEquals(0, vector.get(100)); + assertEquals(1, vector.get(1022)); - assertEquals(1022, accessor.getNullCount()); + assertEquals(1020, vector.getNullCount()); Review comment: For migration purpose, should me provide an alternative method to provide the old behavior? Something like `getNullAndUnsetCount()` This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [Java] Decide what to do with non-nullable vectors in new vector class > hierarchy > - > > Key: ARROW-1710 > URL: https://issues.apache.org/jira/browse/ARROW-1710 > Project: Apache Arrow > Issue Type: Sub-task > Components: Java - Vectors >Reporter: Li Jin >Assignee: Bryan Cutler > Labels: pull-request-available > Fix For: 0.8.0 > > > So far the consensus seems to be remove all non-nullable vectors. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (ARROW-1710) [Java] Decide what to do with non-nullable vectors in new vector class hierarchy
[ https://issues.apache.org/jira/browse/ARROW-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16262789#comment-16262789 ] ASF GitHub Bot commented on ARROW-1710: --- icexelloss commented on a change in pull request #1341: [WIP] ARROW-1710: [Java] Remove Non-Nullable Vectors URL: https://github.com/apache/arrow/pull/1341#discussion_r152596476 ## File path: java/vector/src/test/java/org/apache/arrow/vector/TestBitVector.java ## @@ -426,81 +405,81 @@ public void testReallocAfterVectorTransfer2() { public void testBitVector() { // Create a new value vector for 1024 integers try (final BitVector vector = new BitVector(EMPTY_SCHEMA_PATH, allocator)) { - final BitVector.Mutator m = vector.getMutator(); vector.allocateNew(1024); - m.setValueCount(1024); + vector.setValueCount(1024); // Put and set a few values - m.set(0, 1); - m.set(1, 0); - m.set(100, 0); - m.set(1022, 1); + vector.set(0, 1); + vector.set(1, 0); + vector.set(100, 0); + vector.set(1022, 1); - m.setValueCount(1024); + vector.setValueCount(1024); - final BitVector.Accessor accessor = vector.getAccessor(); - assertEquals(1, accessor.get(0)); - assertEquals(0, accessor.get(1)); - assertEquals(0, accessor.get(100)); - assertEquals(1, accessor.get(1022)); + assertEquals(1, vector.get(0)); + assertEquals(0, vector.get(1)); + assertEquals(0, vector.get(100)); + assertEquals(1, vector.get(1022)); - assertEquals(1022, accessor.getNullCount()); + assertEquals(1020, vector.getNullCount()); Review comment: For migration purpose, should me provide an alternative method to provide the old behavior? Something like `getNullAndUnsetCount()`. Seems like a weird API though. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [Java] Decide what to do with non-nullable vectors in new vector class > hierarchy > - > > Key: ARROW-1710 > URL: https://issues.apache.org/jira/browse/ARROW-1710 > Project: Apache Arrow > Issue Type: Sub-task > Components: Java - Vectors >Reporter: Li Jin >Assignee: Bryan Cutler > Labels: pull-request-available > Fix For: 0.8.0 > > > So far the consensus seems to be remove all non-nullable vectors. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (ARROW-1710) [Java] Decide what to do with non-nullable vectors in new vector class hierarchy
[ https://issues.apache.org/jira/browse/ARROW-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16262773#comment-16262773 ] ASF GitHub Bot commented on ARROW-1710: --- icexelloss commented on a change in pull request #1341: [WIP] ARROW-1710: [Java] Remove Non-Nullable Vectors URL: https://github.com/apache/arrow/pull/1341#discussion_r152593713 ## File path: java/vector/src/main/java/org/apache/arrow/vector/BitVector.java ## @@ -18,342 +18,469 @@ package org.apache.arrow.vector; +import io.netty.buffer.ArrowBuf; import org.apache.arrow.memory.BufferAllocator; -import org.apache.arrow.memory.BaseAllocator; -import org.apache.arrow.memory.OutOfMemoryException; +import org.apache.arrow.vector.complex.impl.BitReaderImpl; import org.apache.arrow.vector.complex.reader.FieldReader; import org.apache.arrow.vector.holders.BitHolder; import org.apache.arrow.vector.holders.NullableBitHolder; -import org.apache.arrow.vector.schema.ArrowFieldNode; -import org.apache.arrow.vector.types.Types.MinorType; -import org.apache.arrow.vector.types.pojo.Field; +import org.apache.arrow.vector.types.Types; +import org.apache.arrow.vector.types.pojo.FieldType; import org.apache.arrow.vector.util.OversizedAllocationException; import org.apache.arrow.vector.util.TransferPair; -import io.netty.buffer.ArrowBuf; - /** - * Bit implements a vector of bit-width values. Elements in the vector are accessed by position from the logical start - * of the vector. The width of each element is 1 bit. The equivalent Java primitive is an int containing the value '0' - * or '1'. + * BitVector implements a fixed width (1 bit) vector of + * boolean values which could be null. Each value in the vector corresponds + * to a single bit in the underlying data stream backing the vector. */ -public final class BitVector extends BaseDataValueVector implements FixedWidthVector { - static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(BitVector.class); - - private final Accessor accessor = new Accessor(); - private final Mutator mutator = new Mutator(); - - int valueCount; - private int allocationSizeInBytes = getSizeFromCount(INITIAL_VALUE_ALLOCATION); - private int allocationMonitor = 0; +public class BitVector extends BaseFixedWidthVector { + private final FieldReader reader; + /** + * Instantiate a BitVector. This doesn't allocate any memory for + * the data in vector. + * + * @param name name of the vector + * @param allocator allocator for memory management. + */ public BitVector(String name, BufferAllocator allocator) { -super(name, allocator); +this(name, FieldType.nullable(Types.MinorType.BIT.getType()), +allocator); } - @Override - public void load(ArrowFieldNode fieldNode, ArrowBuf data) { -// When the vector is all nulls or all defined, the content of the buffer can be omitted -if (data.readableBytes() == 0 && fieldNode.getLength() != 0) { - int count = fieldNode.getLength(); - allocateNew(count); - int n = getSizeFromCount(count); - if (fieldNode.getNullCount() == 0) { -// all defined -// create an all 1s buffer -// set full bytes -int fullBytesCount = count / 8; -for (int i = 0; i < fullBytesCount; ++i) { - this.data.setByte(i, 0xFF); -} -int remainder = count % 8; -// set remaining bits -if (remainder > 0) { - byte bitMask = (byte) (0xFFL >>> ((8 - remainder) & 7)); - this.data.setByte(fullBytesCount, bitMask); -} - } else if (fieldNode.getNullCount() == fieldNode.getLength()) { -// all null -// create an all 0s buffer -zeroVector(); - } else { -throw new IllegalArgumentException("The buffer can be empty only if there's no data or it's all null or all defined"); - } - this.data.writerIndex(n); -} else { - super.load(fieldNode, data); -} -this.valueCount = fieldNode.getLength(); + /** + * Instantiate a BitVector. This doesn't allocate any memory for + * the data in vector. + * + * @param name name of the vector + * @param fieldType type of Field materialized by this vector + * @param allocator allocator for memory management. + */ + public BitVector(String name, FieldType fieldType, BufferAllocator allocator) { +super(name, allocator, fieldType, (byte) 0); +reader = new BitReaderImpl(BitVector.this); } + /** + * Get a reader that supports reading values from this vector + * + * @return Field Reader for this vector + */ @Override - public Field getField() { -throw new UnsupportedOperationException("internal vector"); + public FieldReader getReader() { +return reader; } + /** + * Get minor type for this vector. The vector holds values belonging + * to a particular type. + * + * @return
[jira] [Commented] (ARROW-1710) [Java] Decide what to do with non-nullable vectors in new vector class hierarchy
[ https://issues.apache.org/jira/browse/ARROW-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16261761#comment-16261761 ] ASF GitHub Bot commented on ARROW-1710: --- BryanCutler commented on a change in pull request #1341: [WIP] ARROW-1710: [Java] Remove Non-Nullable Vectors URL: https://github.com/apache/arrow/pull/1341#discussion_r152441194 ## File path: java/vector/src/main/java/org/apache/arrow/vector/complex/MapVector.java ## @@ -21,332 +21,492 @@ import static com.google.common.base.Preconditions.checkNotNull; import java.util.ArrayList; -import java.util.Collection; -import java.util.Iterator; +import java.util.Arrays; +import java.util.Collections; import java.util.List; -import java.util.Map; -import javax.annotation.Nullable; - -import com.google.common.base.Preconditions; -import com.google.common.collect.Ordering; -import com.google.common.primitives.Ints; +import com.google.common.collect.ObjectArrays; import io.netty.buffer.ArrowBuf; - +import org.apache.arrow.memory.BaseAllocator; import org.apache.arrow.memory.BufferAllocator; import org.apache.arrow.vector.*; -import org.apache.arrow.vector.complex.impl.SingleMapReaderImpl; -import org.apache.arrow.vector.complex.reader.FieldReader; +import org.apache.arrow.vector.complex.impl.NullableMapReaderImpl; +import org.apache.arrow.vector.complex.impl.NullableMapWriter; import org.apache.arrow.vector.holders.ComplexHolder; -import org.apache.arrow.vector.types.Types.MinorType; +import org.apache.arrow.vector.schema.ArrowFieldNode; import org.apache.arrow.vector.types.pojo.ArrowType; -import org.apache.arrow.vector.types.pojo.Field; +import org.apache.arrow.vector.types.pojo.ArrowType.Struct; +import org.apache.arrow.vector.types.pojo.DictionaryEncoding; import org.apache.arrow.vector.types.pojo.FieldType; +import org.apache.arrow.vector.types.pojo.Field; import org.apache.arrow.vector.util.CallBack; -import org.apache.arrow.vector.util.JsonStringHashMap; +import org.apache.arrow.vector.util.OversizedAllocationException; import org.apache.arrow.vector.util.TransferPair; -public class MapVector extends AbstractMapVector { - //private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(MapVector.class); +public class MapVector extends NonNullableMapVector implements FieldVector { Review comment: Yes, the above is correct and what I was referencing in (1) from https://github.com/apache/arrow/pull/1341#issuecomment-345871911. Maybe I missed the discussion, so I thought we would be removing the `NonNullableMapVector` and would combine the 2 classes. Why do we need to keep both? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [Java] Decide what to do with non-nullable vectors in new vector class > hierarchy > - > > Key: ARROW-1710 > URL: https://issues.apache.org/jira/browse/ARROW-1710 > Project: Apache Arrow > Issue Type: Sub-task > Components: Java - Vectors >Reporter: Li Jin >Assignee: Bryan Cutler > Labels: pull-request-available > Fix For: 0.8.0 > > > So far the consensus seems to be remove all non-nullable vectors. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (ARROW-1710) [Java] Decide what to do with non-nullable vectors in new vector class hierarchy
[ https://issues.apache.org/jira/browse/ARROW-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16261754#comment-16261754 ] ASF GitHub Bot commented on ARROW-1710: --- siddharthteotia commented on a change in pull request #1341: [WIP] ARROW-1710: [Java] Remove Non-Nullable Vectors URL: https://github.com/apache/arrow/pull/1341#discussion_r152440144 ## File path: java/vector/src/main/java/org/apache/arrow/vector/complex/MapVector.java ## @@ -21,332 +21,492 @@ import static com.google.common.base.Preconditions.checkNotNull; import java.util.ArrayList; -import java.util.Collection; -import java.util.Iterator; +import java.util.Arrays; +import java.util.Collections; import java.util.List; -import java.util.Map; -import javax.annotation.Nullable; - -import com.google.common.base.Preconditions; -import com.google.common.collect.Ordering; -import com.google.common.primitives.Ints; +import com.google.common.collect.ObjectArrays; import io.netty.buffer.ArrowBuf; - +import org.apache.arrow.memory.BaseAllocator; import org.apache.arrow.memory.BufferAllocator; import org.apache.arrow.vector.*; -import org.apache.arrow.vector.complex.impl.SingleMapReaderImpl; -import org.apache.arrow.vector.complex.reader.FieldReader; +import org.apache.arrow.vector.complex.impl.NullableMapReaderImpl; +import org.apache.arrow.vector.complex.impl.NullableMapWriter; import org.apache.arrow.vector.holders.ComplexHolder; -import org.apache.arrow.vector.types.Types.MinorType; +import org.apache.arrow.vector.schema.ArrowFieldNode; import org.apache.arrow.vector.types.pojo.ArrowType; -import org.apache.arrow.vector.types.pojo.Field; +import org.apache.arrow.vector.types.pojo.ArrowType.Struct; +import org.apache.arrow.vector.types.pojo.DictionaryEncoding; import org.apache.arrow.vector.types.pojo.FieldType; +import org.apache.arrow.vector.types.pojo.Field; import org.apache.arrow.vector.util.CallBack; -import org.apache.arrow.vector.util.JsonStringHashMap; +import org.apache.arrow.vector.util.OversizedAllocationException; import org.apache.arrow.vector.util.TransferPair; -public class MapVector extends AbstractMapVector { - //private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(MapVector.class); +public class MapVector extends NonNullableMapVector implements FieldVector { Review comment: This is slightly different from changing the name of NullableInt to Int since we are going to remove the latter one. For the MapVector case, I don't think so we are removing anything and instead keeping around both -- base class MapVector and subclass NullableMapVector. So probably no need to change name here. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [Java] Decide what to do with non-nullable vectors in new vector class > hierarchy > - > > Key: ARROW-1710 > URL: https://issues.apache.org/jira/browse/ARROW-1710 > Project: Apache Arrow > Issue Type: Sub-task > Components: Java - Vectors >Reporter: Li Jin >Assignee: Bryan Cutler > Labels: pull-request-available > Fix For: 0.8.0 > > > So far the consensus seems to be remove all non-nullable vectors. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (ARROW-1710) [Java] Decide what to do with non-nullable vectors in new vector class hierarchy
[ https://issues.apache.org/jira/browse/ARROW-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16261744#comment-16261744 ] ASF GitHub Bot commented on ARROW-1710: --- siddharthteotia commented on a change in pull request #1341: [WIP] ARROW-1710: [Java] Remove Non-Nullable Vectors URL: https://github.com/apache/arrow/pull/1341#discussion_r152439861 ## File path: java/vector/src/main/java/org/apache/arrow/vector/complex/MapVector.java ## @@ -21,332 +21,492 @@ import static com.google.common.base.Preconditions.checkNotNull; import java.util.ArrayList; -import java.util.Collection; -import java.util.Iterator; +import java.util.Arrays; +import java.util.Collections; import java.util.List; -import java.util.Map; -import javax.annotation.Nullable; - -import com.google.common.base.Preconditions; -import com.google.common.collect.Ordering; -import com.google.common.primitives.Ints; +import com.google.common.collect.ObjectArrays; import io.netty.buffer.ArrowBuf; - +import org.apache.arrow.memory.BaseAllocator; import org.apache.arrow.memory.BufferAllocator; import org.apache.arrow.vector.*; -import org.apache.arrow.vector.complex.impl.SingleMapReaderImpl; -import org.apache.arrow.vector.complex.reader.FieldReader; +import org.apache.arrow.vector.complex.impl.NullableMapReaderImpl; +import org.apache.arrow.vector.complex.impl.NullableMapWriter; import org.apache.arrow.vector.holders.ComplexHolder; -import org.apache.arrow.vector.types.Types.MinorType; +import org.apache.arrow.vector.schema.ArrowFieldNode; import org.apache.arrow.vector.types.pojo.ArrowType; -import org.apache.arrow.vector.types.pojo.Field; +import org.apache.arrow.vector.types.pojo.ArrowType.Struct; +import org.apache.arrow.vector.types.pojo.DictionaryEncoding; import org.apache.arrow.vector.types.pojo.FieldType; +import org.apache.arrow.vector.types.pojo.Field; import org.apache.arrow.vector.util.CallBack; -import org.apache.arrow.vector.util.JsonStringHashMap; +import org.apache.arrow.vector.util.OversizedAllocationException; import org.apache.arrow.vector.util.TransferPair; -public class MapVector extends AbstractMapVector { - //private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(MapVector.class); +public class MapVector extends NonNullableMapVector implements FieldVector { Review comment: If I understand it correctly, the original MapVector has now become NonNullableMapVector and NullableMapVector (subclass of of MapVector) has now become MapVector. @BryanCutler can confirm. I would prefer to not change the naming here since there are essentially two classes -- NullableMapVector and MapVector where the former one is a MapVector with a validity buffer. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [Java] Decide what to do with non-nullable vectors in new vector class > hierarchy > - > > Key: ARROW-1710 > URL: https://issues.apache.org/jira/browse/ARROW-1710 > Project: Apache Arrow > Issue Type: Sub-task > Components: Java - Vectors >Reporter: Li Jin >Assignee: Bryan Cutler > Labels: pull-request-available > Fix For: 0.8.0 > > > So far the consensus seems to be remove all non-nullable vectors. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (ARROW-1710) [Java] Decide what to do with non-nullable vectors in new vector class hierarchy
[ https://issues.apache.org/jira/browse/ARROW-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16261736#comment-16261736 ] ASF GitHub Bot commented on ARROW-1710: --- icexelloss commented on a change in pull request #1341: [WIP] ARROW-1710: [Java] Remove Non-Nullable Vectors URL: https://github.com/apache/arrow/pull/1341#discussion_r152439296 ## File path: java/vector/src/main/java/org/apache/arrow/vector/complex/MapVector.java ## @@ -21,332 +21,492 @@ import static com.google.common.base.Preconditions.checkNotNull; import java.util.ArrayList; -import java.util.Collection; -import java.util.Iterator; +import java.util.Arrays; +import java.util.Collections; import java.util.List; -import java.util.Map; -import javax.annotation.Nullable; - -import com.google.common.base.Preconditions; -import com.google.common.collect.Ordering; -import com.google.common.primitives.Ints; +import com.google.common.collect.ObjectArrays; import io.netty.buffer.ArrowBuf; - +import org.apache.arrow.memory.BaseAllocator; import org.apache.arrow.memory.BufferAllocator; import org.apache.arrow.vector.*; -import org.apache.arrow.vector.complex.impl.SingleMapReaderImpl; -import org.apache.arrow.vector.complex.reader.FieldReader; +import org.apache.arrow.vector.complex.impl.NullableMapReaderImpl; +import org.apache.arrow.vector.complex.impl.NullableMapWriter; import org.apache.arrow.vector.holders.ComplexHolder; -import org.apache.arrow.vector.types.Types.MinorType; +import org.apache.arrow.vector.schema.ArrowFieldNode; import org.apache.arrow.vector.types.pojo.ArrowType; -import org.apache.arrow.vector.types.pojo.Field; +import org.apache.arrow.vector.types.pojo.ArrowType.Struct; +import org.apache.arrow.vector.types.pojo.DictionaryEncoding; import org.apache.arrow.vector.types.pojo.FieldType; +import org.apache.arrow.vector.types.pojo.Field; import org.apache.arrow.vector.util.CallBack; -import org.apache.arrow.vector.util.JsonStringHashMap; +import org.apache.arrow.vector.util.OversizedAllocationException; import org.apache.arrow.vector.util.TransferPair; -public class MapVector extends AbstractMapVector { - //private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(MapVector.class); +public class MapVector extends NonNullableMapVector implements FieldVector { Review comment: Is `NonNullableMapVector` something new? Can you explain the change of class hierarchy here? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [Java] Decide what to do with non-nullable vectors in new vector class > hierarchy > - > > Key: ARROW-1710 > URL: https://issues.apache.org/jira/browse/ARROW-1710 > Project: Apache Arrow > Issue Type: Sub-task > Components: Java - Vectors >Reporter: Li Jin >Assignee: Bryan Cutler > Labels: pull-request-available > Fix For: 0.8.0 > > > So far the consensus seems to be remove all non-nullable vectors. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (ARROW-1710) [Java] Decide what to do with non-nullable vectors in new vector class hierarchy
[ https://issues.apache.org/jira/browse/ARROW-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16261733#comment-16261733 ] ASF GitHub Bot commented on ARROW-1710: --- icexelloss commented on issue #1341: [WIP] ARROW-1710: [Java] Remove Non-Nullable Vectors URL: https://github.com/apache/arrow/pull/1341#issuecomment-346203057 @BryanCutler Good work! High level looks good. I will try to look more closely tomorrow. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [Java] Decide what to do with non-nullable vectors in new vector class > hierarchy > - > > Key: ARROW-1710 > URL: https://issues.apache.org/jira/browse/ARROW-1710 > Project: Apache Arrow > Issue Type: Sub-task > Components: Java - Vectors >Reporter: Li Jin >Assignee: Bryan Cutler > Labels: pull-request-available > Fix For: 0.8.0 > > > So far the consensus seems to be remove all non-nullable vectors. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (ARROW-1710) [Java] Decide what to do with non-nullable vectors in new vector class hierarchy
[ https://issues.apache.org/jira/browse/ARROW-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16261330#comment-16261330 ] ASF GitHub Bot commented on ARROW-1710: --- icexelloss commented on issue #1341: [WIP] ARROW-1710: [Java] Remove Non-Nullable Vectors URL: https://github.com/apache/arrow/pull/1341#issuecomment-346132977 If this is merged first. I would just delete different files, shouldn't be hard. If #1330 get merged first, you need to rename NullableTimestampVector -> TimestampVector. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [Java] Decide what to do with non-nullable vectors in new vector class > hierarchy > - > > Key: ARROW-1710 > URL: https://issues.apache.org/jira/browse/ARROW-1710 > Project: Apache Arrow > Issue Type: Sub-task > Components: Java - Vectors >Reporter: Li Jin >Assignee: Bryan Cutler > Labels: pull-request-available > Fix For: 0.8.0 > > > So far the consensus seems to be remove all non-nullable vectors. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (ARROW-1710) [Java] Decide what to do with non-nullable vectors in new vector class hierarchy
[ https://issues.apache.org/jira/browse/ARROW-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16261324#comment-16261324 ] ASF GitHub Bot commented on ARROW-1710: --- icexelloss commented on issue #1341: [WIP] ARROW-1710: [Java] Remove Non-Nullable Vectors URL: https://github.com/apache/arrow/pull/1341#issuecomment-346132669 @BryanCutler There might be some conflicts but I think we should be fine. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [Java] Decide what to do with non-nullable vectors in new vector class > hierarchy > - > > Key: ARROW-1710 > URL: https://issues.apache.org/jira/browse/ARROW-1710 > Project: Apache Arrow > Issue Type: Sub-task > Components: Java - Vectors >Reporter: Li Jin >Assignee: Bryan Cutler > Labels: pull-request-available > Fix For: 0.8.0 > > > So far the consensus seems to be remove all non-nullable vectors. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (ARROW-1710) [Java] Decide what to do with non-nullable vectors in new vector class hierarchy
[ https://issues.apache.org/jira/browse/ARROW-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16261308#comment-16261308 ] ASF GitHub Bot commented on ARROW-1710: --- BryanCutler commented on issue #1341: [WIP] ARROW-1710: [Java] Remove Non-Nullable Vectors URL: https://github.com/apache/arrow/pull/1341#issuecomment-346128441 @icexelloss I didn't see you were already working on #1330, which is also a sizable change, before I started on this. I'm not sure which one would be easier to merge first, any thoughts? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [Java] Decide what to do with non-nullable vectors in new vector class > hierarchy > - > > Key: ARROW-1710 > URL: https://issues.apache.org/jira/browse/ARROW-1710 > Project: Apache Arrow > Issue Type: Sub-task > Components: Java - Vectors >Reporter: Li Jin >Assignee: Bryan Cutler > Labels: pull-request-available > Fix For: 0.8.0 > > > So far the consensus seems to be remove all non-nullable vectors. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (ARROW-1710) [Java] Decide what to do with non-nullable vectors in new vector class hierarchy
[ https://issues.apache.org/jira/browse/ARROW-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16261273#comment-16261273 ] ASF GitHub Bot commented on ARROW-1710: --- icexelloss commented on a change in pull request #1341: [WIP] ARROW-1710: [Java] Remove Non-Nullable Vectors URL: https://github.com/apache/arrow/pull/1341#discussion_r152368299 ## File path: java/vector/src/test/java/org/apache/arrow/vector/TestBitVector.java ## @@ -426,81 +405,81 @@ public void testReallocAfterVectorTransfer2() { public void testBitVector() { // Create a new value vector for 1024 integers try (final BitVector vector = new BitVector(EMPTY_SCHEMA_PATH, allocator)) { - final BitVector.Mutator m = vector.getMutator(); vector.allocateNew(1024); - m.setValueCount(1024); + vector.setValueCount(1024); // Put and set a few values - m.set(0, 1); - m.set(1, 0); - m.set(100, 0); - m.set(1022, 1); + vector.set(0, 1); + vector.set(1, 0); + vector.set(100, 0); + vector.set(1022, 1); - m.setValueCount(1024); + vector.setValueCount(1024); - final BitVector.Accessor accessor = vector.getAccessor(); - assertEquals(1, accessor.get(0)); - assertEquals(0, accessor.get(1)); - assertEquals(0, accessor.get(100)); - assertEquals(1, accessor.get(1022)); + assertEquals(1, vector.get(0)); + assertEquals(0, vector.get(1)); + assertEquals(0, vector.get(100)); + assertEquals(1, vector.get(1022)); - assertEquals(1022, accessor.getNullCount()); + assertEquals(1020, vector.getNullCount()); Review comment: I agree this is an misnomer. I think this is fine as long as it doesn't break Dremio. cc @siddharthteotia. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [Java] Decide what to do with non-nullable vectors in new vector class > hierarchy > - > > Key: ARROW-1710 > URL: https://issues.apache.org/jira/browse/ARROW-1710 > Project: Apache Arrow > Issue Type: Sub-task > Components: Java - Vectors >Reporter: Li Jin >Assignee: Bryan Cutler > Labels: pull-request-available > Fix For: 0.8.0 > > > So far the consensus seems to be remove all non-nullable vectors. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (ARROW-1710) [Java] Decide what to do with non-nullable vectors in new vector class hierarchy
[ https://issues.apache.org/jira/browse/ARROW-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16261250#comment-16261250 ] ASF GitHub Bot commented on ARROW-1710: --- BryanCutler commented on a change in pull request #1341: [WIP] ARROW-1710: [Java] Remove Non-Nullable Vectors URL: https://github.com/apache/arrow/pull/1341#discussion_r152364883 ## File path: java/vector/src/test/java/org/apache/arrow/vector/TestBitVector.java ## @@ -426,81 +405,81 @@ public void testReallocAfterVectorTransfer2() { public void testBitVector() { // Create a new value vector for 1024 integers try (final BitVector vector = new BitVector(EMPTY_SCHEMA_PATH, allocator)) { - final BitVector.Mutator m = vector.getMutator(); vector.allocateNew(1024); - m.setValueCount(1024); + vector.setValueCount(1024); // Put and set a few values - m.set(0, 1); - m.set(1, 0); - m.set(100, 0); - m.set(1022, 1); + vector.set(0, 1); + vector.set(1, 0); + vector.set(100, 0); + vector.set(1022, 1); - m.setValueCount(1024); + vector.setValueCount(1024); - final BitVector.Accessor accessor = vector.getAccessor(); - assertEquals(1, accessor.get(0)); - assertEquals(0, accessor.get(1)); - assertEquals(0, accessor.get(100)); - assertEquals(1, accessor.get(1022)); + assertEquals(1, vector.get(0)); + assertEquals(0, vector.get(1)); + assertEquals(0, vector.get(100)); + assertEquals(1, vector.get(1022)); - assertEquals(1022, accessor.getNullCount()); + assertEquals(1020, vector.getNullCount()); Review comment: Yeah that would make sense, but sort of a misnomer imo. Should we include this change in the release notes somewhere? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [Java] Decide what to do with non-nullable vectors in new vector class > hierarchy > - > > Key: ARROW-1710 > URL: https://issues.apache.org/jira/browse/ARROW-1710 > Project: Apache Arrow > Issue Type: Sub-task > Components: Java - Vectors >Reporter: Li Jin >Assignee: Bryan Cutler > Labels: pull-request-available > Fix For: 0.8.0 > > > So far the consensus seems to be remove all non-nullable vectors. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (ARROW-1710) [Java] Decide what to do with non-nullable vectors in new vector class hierarchy
[ https://issues.apache.org/jira/browse/ARROW-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16260049#comment-16260049 ] ASF GitHub Bot commented on ARROW-1710: --- BryanCutler commented on issue #1341: [WIP] ARROW-1710: [Java] Remove Non-Nullable Vectors URL: https://github.com/apache/arrow/pull/1341#issuecomment-345872196 Given the scope of this, I would be nice to get it and resolve the above issues later as long as this doesn't break anything - looks like I already need to do a rebase.. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [Java] Decide what to do with non-nullable vectors in new vector class > hierarchy > - > > Key: ARROW-1710 > URL: https://issues.apache.org/jira/browse/ARROW-1710 > Project: Apache Arrow > Issue Type: Sub-task > Components: Java - Vectors >Reporter: Li Jin >Assignee: Bryan Cutler > Labels: pull-request-available > Fix For: 0.8.0 > > > So far the consensus seems to be remove all non-nullable vectors. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (ARROW-1710) [Java] Decide what to do with non-nullable vectors in new vector class hierarchy
[ https://issues.apache.org/jira/browse/ARROW-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16260045#comment-16260045 ] ASF GitHub Bot commented on ARROW-1710: --- BryanCutler commented on issue #1341: [WIP] ARROW-1710: [Java] Remove Non-Nullable Vectors URL: https://github.com/apache/arrow/pull/1341#issuecomment-345871911 @siddharthteotia @icexelloss please take a look when you have a chance. There are still some things I was not sure of so I marked this as a WIP: 1) MapVector still has a non-nullable vector class in it's hierarchy. Maybe I missed some previous discussion from the refactoring, but I couldn't just remove the non-nullable class so I renamed them to be consistent with the other vectors 2) Holders have both nullable and non-nullable, not sure if there is still a use for that so I left it for now... maybe after ARROW-1833 we can fix this up 3) There are still references to NullableMapWriter/Reader, that got a little messy trying to remove but I can try again later cc @jacques-n @wesm This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [Java] Decide what to do with non-nullable vectors in new vector class > hierarchy > - > > Key: ARROW-1710 > URL: https://issues.apache.org/jira/browse/ARROW-1710 > Project: Apache Arrow > Issue Type: Sub-task > Components: Java - Vectors >Reporter: Li Jin >Assignee: Bryan Cutler > Labels: pull-request-available > Fix For: 0.8.0 > > > So far the consensus seems to be remove all non-nullable vectors. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (ARROW-1710) [Java] Decide what to do with non-nullable vectors in new vector class hierarchy
[ https://issues.apache.org/jira/browse/ARROW-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16260040#comment-16260040 ] ASF GitHub Bot commented on ARROW-1710: --- BryanCutler commented on a change in pull request #1341: [WIP] ARROW-1710: [Java] Remove Non-Nullable Vectors URL: https://github.com/apache/arrow/pull/1341#discussion_r152144518 ## File path: java/vector/src/test/java/org/apache/arrow/vector/TestVectorReAlloc.java ## @@ -108,7 +107,8 @@ public void testListType() { assertEquals(1023, vector.getValueCapacity()); try { -vector.getOffsetVector().getAccessor().get(2014); +// TODO: is this right? +vector.getInnerValueCountAt(2014); Review comment: Is this what the test is checking for? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [Java] Decide what to do with non-nullable vectors in new vector class > hierarchy > - > > Key: ARROW-1710 > URL: https://issues.apache.org/jira/browse/ARROW-1710 > Project: Apache Arrow > Issue Type: Sub-task > Components: Java - Vectors >Reporter: Li Jin >Assignee: Bryan Cutler > Labels: pull-request-available > Fix For: 0.8.0 > > > So far the consensus seems to be remove all non-nullable vectors. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (ARROW-1710) [Java] Decide what to do with non-nullable vectors in new vector class hierarchy
[ https://issues.apache.org/jira/browse/ARROW-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16260037#comment-16260037 ] ASF GitHub Bot commented on ARROW-1710: --- BryanCutler commented on a change in pull request #1341: [WIP] ARROW-1710: [Java] Remove Non-Nullable Vectors URL: https://github.com/apache/arrow/pull/1341#discussion_r152143688 ## File path: java/vector/src/test/java/org/apache/arrow/vector/TestOversizedAllocationForValueVector.java ## @@ -112,10 +112,10 @@ public void testVariableVectorReallocation() { try { vector.allocateNew(expectedAllocationInBytes, 10); assertTrue(expectedOffsetSize <= vector.getValueCapacity()); - assertTrue(expectedAllocationInBytes <= vector.getBuffer().capacity()); + assertTrue(expectedAllocationInBytes <= vector.getDataBuffer().capacity()); Review comment: Is this correct? it is checking only the data buffer capacity i believe This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [Java] Decide what to do with non-nullable vectors in new vector class > hierarchy > - > > Key: ARROW-1710 > URL: https://issues.apache.org/jira/browse/ARROW-1710 > Project: Apache Arrow > Issue Type: Sub-task > Components: Java - Vectors >Reporter: Li Jin >Assignee: Bryan Cutler > Labels: pull-request-available > Fix For: 0.8.0 > > > So far the consensus seems to be remove all non-nullable vectors. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (ARROW-1710) [Java] Decide what to do with non-nullable vectors in new vector class hierarchy
[ https://issues.apache.org/jira/browse/ARROW-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16260035#comment-16260035 ] ASF GitHub Bot commented on ARROW-1710: --- BryanCutler commented on a change in pull request #1341: [WIP] ARROW-1710: [Java] Remove Non-Nullable Vectors URL: https://github.com/apache/arrow/pull/1341#discussion_r152143477 ## File path: java/vector/src/test/java/org/apache/arrow/vector/TestBitVector.java ## @@ -526,15 +505,17 @@ private void validateRange(int length, int start, int count) { try (BitVector bitVector = new BitVector("bits", allocator)) { bitVector.reset(); bitVector.allocateNew(length); - bitVector.getMutator().setRangeToOne(start, count); + for (int i = start; i < start + count; i++) { +bitVector.set(i, 1); + } Review comment: again, slightly different API, need to check range manually This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [Java] Decide what to do with non-nullable vectors in new vector class > hierarchy > - > > Key: ARROW-1710 > URL: https://issues.apache.org/jira/browse/ARROW-1710 > Project: Apache Arrow > Issue Type: Sub-task > Components: Java - Vectors >Reporter: Li Jin >Assignee: Bryan Cutler > Labels: pull-request-available > Fix For: 0.8.0 > > > So far the consensus seems to be remove all non-nullable vectors. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (ARROW-1710) [Java] Decide what to do with non-nullable vectors in new vector class hierarchy
[ https://issues.apache.org/jira/browse/ARROW-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16260033#comment-16260033 ] ASF GitHub Bot commented on ARROW-1710: --- BryanCutler commented on a change in pull request #1341: [WIP] ARROW-1710: [Java] Remove Non-Nullable Vectors URL: https://github.com/apache/arrow/pull/1341#discussion_r152143196 ## File path: java/vector/src/test/java/org/apache/arrow/vector/TestBitVector.java ## @@ -426,81 +405,81 @@ public void testReallocAfterVectorTransfer2() { public void testBitVector() { // Create a new value vector for 1024 integers try (final BitVector vector = new BitVector(EMPTY_SCHEMA_PATH, allocator)) { - final BitVector.Mutator m = vector.getMutator(); vector.allocateNew(1024); - m.setValueCount(1024); + vector.setValueCount(1024); // Put and set a few values - m.set(0, 1); - m.set(1, 0); - m.set(100, 0); - m.set(1022, 1); + vector.set(0, 1); + vector.set(1, 0); + vector.set(100, 0); + vector.set(1022, 1); - m.setValueCount(1024); + vector.setValueCount(1024); - final BitVector.Accessor accessor = vector.getAccessor(); - assertEquals(1, accessor.get(0)); - assertEquals(0, accessor.get(1)); - assertEquals(0, accessor.get(100)); - assertEquals(1, accessor.get(1022)); + assertEquals(1, vector.get(0)); + assertEquals(0, vector.get(1)); + assertEquals(0, vector.get(100)); + assertEquals(1, vector.get(1022)); - assertEquals(1022, accessor.getNullCount()); + assertEquals(1020, vector.getNullCount()); // test setting the same value twice - m.set(0, 1); - m.set(0, 1); - m.set(1, 0); - m.set(1, 0); - assertEquals(1, accessor.get(0)); - assertEquals(0, accessor.get(1)); + vector.set(0, 1); + vector.set(0, 1); + vector.set(1, 0); + vector.set(1, 0); + assertEquals(1, vector.get(0)); + assertEquals(0, vector.get(1)); // test toggling the values - m.set(0, 0); - m.set(1, 1); - assertEquals(0, accessor.get(0)); - assertEquals(1, accessor.get(1)); + vector.set(0, 0); + vector.set(1, 1); + assertEquals(0, vector.get(0)); + assertEquals(1, vector.get(1)); // should not change - assertEquals(1022, accessor.getNullCount()); + assertEquals(1020, vector.getNullCount()); - // Ensure unallocated space returns 0 - assertEquals(0, accessor.get(3)); + // Ensure null value + assertTrue(vector.isNull(3)); Review comment: can't check that a null entry has any type of value This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [Java] Decide what to do with non-nullable vectors in new vector class > hierarchy > - > > Key: ARROW-1710 > URL: https://issues.apache.org/jira/browse/ARROW-1710 > Project: Apache Arrow > Issue Type: Sub-task > Components: Java - Vectors >Reporter: Li Jin >Assignee: Bryan Cutler > Labels: pull-request-available > Fix For: 0.8.0 > > > So far the consensus seems to be remove all non-nullable vectors. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (ARROW-1710) [Java] Decide what to do with non-nullable vectors in new vector class hierarchy
[ https://issues.apache.org/jira/browse/ARROW-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16260030#comment-16260030 ] ASF GitHub Bot commented on ARROW-1710: --- BryanCutler commented on a change in pull request #1341: [WIP] ARROW-1710: [Java] Remove Non-Nullable Vectors URL: https://github.com/apache/arrow/pull/1341#discussion_r152142911 ## File path: java/vector/src/test/java/org/apache/arrow/vector/TestBitVector.java ## @@ -426,81 +405,81 @@ public void testReallocAfterVectorTransfer2() { public void testBitVector() { // Create a new value vector for 1024 integers try (final BitVector vector = new BitVector(EMPTY_SCHEMA_PATH, allocator)) { - final BitVector.Mutator m = vector.getMutator(); vector.allocateNew(1024); - m.setValueCount(1024); + vector.setValueCount(1024); // Put and set a few values - m.set(0, 1); - m.set(1, 0); - m.set(100, 0); - m.set(1022, 1); + vector.set(0, 1); + vector.set(1, 0); + vector.set(100, 0); + vector.set(1022, 1); - m.setValueCount(1024); + vector.setValueCount(1024); - final BitVector.Accessor accessor = vector.getAccessor(); - assertEquals(1, accessor.get(0)); - assertEquals(0, accessor.get(1)); - assertEquals(0, accessor.get(100)); - assertEquals(1, accessor.get(1022)); + assertEquals(1, vector.get(0)); + assertEquals(0, vector.get(1)); + assertEquals(0, vector.get(100)); + assertEquals(1, vector.get(1022)); - assertEquals(1022, accessor.getNullCount()); + assertEquals(1020, vector.getNullCount()); Review comment: The value checked here seem wrong before, am I missing something? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [Java] Decide what to do with non-nullable vectors in new vector class > hierarchy > - > > Key: ARROW-1710 > URL: https://issues.apache.org/jira/browse/ARROW-1710 > Project: Apache Arrow > Issue Type: Sub-task > Components: Java - Vectors >Reporter: Li Jin >Assignee: Bryan Cutler > Labels: pull-request-available > Fix For: 0.8.0 > > > So far the consensus seems to be remove all non-nullable vectors. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (ARROW-1710) [Java] Decide what to do with non-nullable vectors in new vector class hierarchy
[ https://issues.apache.org/jira/browse/ARROW-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16260031#comment-16260031 ] ASF GitHub Bot commented on ARROW-1710: --- BryanCutler commented on a change in pull request #1341: [WIP] ARROW-1710: [Java] Remove Non-Nullable Vectors URL: https://github.com/apache/arrow/pull/1341#discussion_r152143001 ## File path: java/vector/src/test/java/org/apache/arrow/vector/TestBitVector.java ## @@ -426,81 +405,81 @@ public void testReallocAfterVectorTransfer2() { public void testBitVector() { // Create a new value vector for 1024 integers try (final BitVector vector = new BitVector(EMPTY_SCHEMA_PATH, allocator)) { - final BitVector.Mutator m = vector.getMutator(); vector.allocateNew(1024); - m.setValueCount(1024); + vector.setValueCount(1024); // Put and set a few values - m.set(0, 1); - m.set(1, 0); - m.set(100, 0); - m.set(1022, 1); + vector.set(0, 1); + vector.set(1, 0); + vector.set(100, 0); + vector.set(1022, 1); - m.setValueCount(1024); + vector.setValueCount(1024); - final BitVector.Accessor accessor = vector.getAccessor(); - assertEquals(1, accessor.get(0)); - assertEquals(0, accessor.get(1)); - assertEquals(0, accessor.get(100)); - assertEquals(1, accessor.get(1022)); + assertEquals(1, vector.get(0)); + assertEquals(0, vector.get(1)); + assertEquals(0, vector.get(100)); + assertEquals(1, vector.get(1022)); - assertEquals(1022, accessor.getNullCount()); + assertEquals(1020, vector.getNullCount()); // test setting the same value twice - m.set(0, 1); - m.set(0, 1); - m.set(1, 0); - m.set(1, 0); - assertEquals(1, accessor.get(0)); - assertEquals(0, accessor.get(1)); + vector.set(0, 1); + vector.set(0, 1); + vector.set(1, 0); + vector.set(1, 0); + assertEquals(1, vector.get(0)); + assertEquals(0, vector.get(1)); // test toggling the values - m.set(0, 0); - m.set(1, 1); - assertEquals(0, accessor.get(0)); - assertEquals(1, accessor.get(1)); + vector.set(0, 0); + vector.set(1, 1); + assertEquals(0, vector.get(0)); + assertEquals(1, vector.get(1)); // should not change - assertEquals(1022, accessor.getNullCount()); + assertEquals(1020, vector.getNullCount()); Review comment: again, this value seemed wrong before This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [Java] Decide what to do with non-nullable vectors in new vector class > hierarchy > - > > Key: ARROW-1710 > URL: https://issues.apache.org/jira/browse/ARROW-1710 > Project: Apache Arrow > Issue Type: Sub-task > Components: Java - Vectors >Reporter: Li Jin >Assignee: Bryan Cutler > Labels: pull-request-available > Fix For: 0.8.0 > > > So far the consensus seems to be remove all non-nullable vectors. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (ARROW-1710) [Java] Decide what to do with non-nullable vectors in new vector class hierarchy
[ https://issues.apache.org/jira/browse/ARROW-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16260028#comment-16260028 ] ASF GitHub Bot commented on ARROW-1710: --- BryanCutler commented on a change in pull request #1341: [WIP] ARROW-1710: [Java] Remove Non-Nullable Vectors URL: https://github.com/apache/arrow/pull/1341#discussion_r152142654 ## File path: java/vector/src/test/java/org/apache/arrow/vector/TestBitVector.java ## @@ -242,90 +230,81 @@ public void testReallocAfterVectorTransfer1() { int valueCapacity = vector.getValueCapacity(); assertEquals(4096, valueCapacity); - final BitVector.Mutator mutator = vector.getMutator(); - final BitVector.Accessor accessor = vector.getAccessor(); - for (int i = 0; i < valueCapacity; i++) { if ((i & 1) == 1) { - mutator.setToOne(i); + vector.set(i, 1); } } for (int i = 0; i < valueCapacity; i++) { -int val = accessor.get(i); if ((i & 1) == 1) { - assertEquals("unexpected cleared bit at index: " + i, 1, val); + assertEquals("unexpected cleared bit at index: " + i, 1, vector.get(i)); } else { - assertEquals("unexpected set bit at index: " + i, 0, val); + assertTrue("unexpected set bit at index: " + i, vector.isNull(i)); Review comment: since it's now a nullable `BitVector`, can't get values where a null is set This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [Java] Decide what to do with non-nullable vectors in new vector class > hierarchy > - > > Key: ARROW-1710 > URL: https://issues.apache.org/jira/browse/ARROW-1710 > Project: Apache Arrow > Issue Type: Sub-task > Components: Java - Vectors >Reporter: Li Jin >Assignee: Bryan Cutler > Labels: pull-request-available > Fix For: 0.8.0 > > > So far the consensus seems to be remove all non-nullable vectors. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (ARROW-1710) [Java] Decide what to do with non-nullable vectors in new vector class hierarchy
[ https://issues.apache.org/jira/browse/ARROW-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16260026#comment-16260026 ] ASF GitHub Bot commented on ARROW-1710: --- BryanCutler commented on a change in pull request #1341: [WIP] ARROW-1710: [Java] Remove Non-Nullable Vectors URL: https://github.com/apache/arrow/pull/1341#discussion_r152142514 ## File path: java/vector/src/test/java/org/apache/arrow/vector/TestBitVector.java ## @@ -242,90 +230,81 @@ public void testReallocAfterVectorTransfer1() { int valueCapacity = vector.getValueCapacity(); assertEquals(4096, valueCapacity); - final BitVector.Mutator mutator = vector.getMutator(); - final BitVector.Accessor accessor = vector.getAccessor(); - for (int i = 0; i < valueCapacity; i++) { if ((i & 1) == 1) { - mutator.setToOne(i); + vector.set(i, 1); Review comment: Slightly different API with the new BitVector This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [Java] Decide what to do with non-nullable vectors in new vector class > hierarchy > - > > Key: ARROW-1710 > URL: https://issues.apache.org/jira/browse/ARROW-1710 > Project: Apache Arrow > Issue Type: Sub-task > Components: Java - Vectors >Reporter: Li Jin >Assignee: Bryan Cutler > Labels: pull-request-available > Fix For: 0.8.0 > > > So far the consensus seems to be remove all non-nullable vectors. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (ARROW-1710) [Java] Decide what to do with non-nullable vectors in new vector class hierarchy
[ https://issues.apache.org/jira/browse/ARROW-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16260016#comment-16260016 ] ASF GitHub Bot commented on ARROW-1710: --- BryanCutler opened a new pull request #1341: [WIP] ARROW-1710: [Java] Remove Non-Nullable Vectors URL: https://github.com/apache/arrow/pull/1341 This removes non-nullable vectors that are no longer part of the vector class hierarchy and renames Nullable*Vector classes to remove the Nullable prefix. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [Java] Decide what to do with non-nullable vectors in new vector class > hierarchy > - > > Key: ARROW-1710 > URL: https://issues.apache.org/jira/browse/ARROW-1710 > Project: Apache Arrow > Issue Type: Sub-task > Components: Java - Vectors >Reporter: Li Jin >Assignee: Bryan Cutler > Labels: pull-request-available > Fix For: 0.8.0 > > > So far the consensus seems to be remove all non-nullable vectors. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (ARROW-1710) [Java] Decide what to do with non-nullable vectors in new vector class hierarchy
[ https://issues.apache.org/jira/browse/ARROW-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16258884#comment-16258884 ] Bryan Cutler commented on ARROW-1710: - Sounds good to me, I'll work on this and have a PR soon. > [Java] Decide what to do with non-nullable vectors in new vector class > hierarchy > - > > Key: ARROW-1710 > URL: https://issues.apache.org/jira/browse/ARROW-1710 > Project: Apache Arrow > Issue Type: Sub-task > Components: Java - Vectors >Reporter: Li Jin >Assignee: Bryan Cutler > Fix For: 0.8.0 > > > So far the consensus seems to be remove all non-nullable vectors. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (ARROW-1710) [Java] Decide what to do with non-nullable vectors in new vector class hierarchy
[ https://issues.apache.org/jira/browse/ARROW-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16258231#comment-16258231 ] Wes McKinney commented on ARROW-1710: - +1. See ARROW-1833 > [Java] Decide what to do with non-nullable vectors in new vector class > hierarchy > - > > Key: ARROW-1710 > URL: https://issues.apache.org/jira/browse/ARROW-1710 > Project: Apache Arrow > Issue Type: Sub-task > Components: Java - Vectors >Reporter: Li Jin >Assignee: Bryan Cutler > Fix For: 0.8.0 > > > So far the consensus seems to be remove all non-nullable vectors. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (ARROW-1710) [Java] Decide what to do with non-nullable vectors in new vector class hierarchy
[ https://issues.apache.org/jira/browse/ARROW-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16258123#comment-16258123 ] Jacques Nadeau commented on ARROW-1710: --- Agree to both nullable prefix removable and adding "dirty" accessor/mutator methods but i think the latter could come in 0.9.0 since it is enhancement to the api. > [Java] Decide what to do with non-nullable vectors in new vector class > hierarchy > - > > Key: ARROW-1710 > URL: https://issues.apache.org/jira/browse/ARROW-1710 > Project: Apache Arrow > Issue Type: Sub-task > Components: Java - Vectors >Reporter: Li Jin >Assignee: Bryan Cutler > Fix For: 0.8.0 > > > So far the consensus seems to be remove all non-nullable vectors. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (ARROW-1710) [Java] Decide what to do with non-nullable vectors in new vector class hierarchy
[ https://issues.apache.org/jira/browse/ARROW-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257872#comment-16257872 ] Wes McKinney commented on ARROW-1710: - I would also propose to remove the Nullable prefix and add "dirty" accessor methods for users who are working with data without nulls (or that can be used on the hot path when you see that the null count for a vector is 0) > [Java] Decide what to do with non-nullable vectors in new vector class > hierarchy > - > > Key: ARROW-1710 > URL: https://issues.apache.org/jira/browse/ARROW-1710 > Project: Apache Arrow > Issue Type: Sub-task > Components: Java - Vectors >Reporter: Li Jin >Assignee: Bryan Cutler > Fix For: 0.8.0 > > > So far the consensus seems to be remove all non-nullable vectors. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (ARROW-1710) [Java] Decide what to do with non-nullable vectors in new vector class hierarchy
[ https://issues.apache.org/jira/browse/ARROW-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16242687#comment-16242687 ] Li Jin commented on ARROW-1710: --- Seems we agree to remove non nullable vectors. The next question is, what do people feel about dropping the "Nullable" prefix in new vector classes? [~bryanc] brought this up initially. I am +1 for dropping the "Nullable" prefix. It makes the code more concise. > [Java] Decide what to do with non-nullable vectors in new vector class > hierarchy > - > > Key: ARROW-1710 > URL: https://issues.apache.org/jira/browse/ARROW-1710 > Project: Apache Arrow > Issue Type: Sub-task > Components: Java - Vectors >Reporter: Li Jin > Fix For: 0.8.0 > > > So far the consensus seems to be remove all non-nullable vectors. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (ARROW-1710) [Java] Decide what to do with non-nullable vectors in new vector class hierarchy
[ https://issues.apache.org/jira/browse/ARROW-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16220962#comment-16220962 ] Jacques Nadeau commented on ARROW-1710: --- The consolidation doesn't inline the values. They are simply prepended. All validity bits come before all data bytes. They are just in the same ArrowBuf rather than maintaining two ArrowBufs. Normally we use different buffers to support resizing only one. However, in the case of validity and data, both will get resized at the same time depending on the values held. > [Java] Decide what to do with non-nullable vectors in new vector class > hierarchy > - > > Key: ARROW-1710 > URL: https://issues.apache.org/jira/browse/ARROW-1710 > Project: Apache Arrow > Issue Type: Sub-task > Components: Java - Vectors >Reporter: Li Jin > Fix For: 0.8.0 > > > So far the consensus seems to be remove all non-nullable vectors. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (ARROW-1710) [Java] Decide what to do with non-nullable vectors in new vector class hierarchy
[ https://issues.apache.org/jira/browse/ARROW-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16220392#comment-16220392 ] Ethan Levine commented on ARROW-1710: - [~jnadeau]: I'd be interested to learn more about that consolidation. It sounds like the validity bits will be stored inline with the data? It seems like eliding that could be difficult. Our use case involves data that's mostly not nullable, and we take care to ensure that it's declared that way. If the costs of writing only non-null values to a nullable array (in terms of memory and computation) become insignificant, then it makes sense to only include nullable arrays. But if that's not the case then I think it makes sense to keep both. > [Java] Decide what to do with non-nullable vectors in new vector class > hierarchy > - > > Key: ARROW-1710 > URL: https://issues.apache.org/jira/browse/ARROW-1710 > Project: Apache Arrow > Issue Type: Sub-task > Components: Java - Vectors >Reporter: Li Jin > Fix For: 0.8.0 > > > So far the consensus seems to be remove all non-nullable vectors. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (ARROW-1710) [Java] Decide what to do with non-nullable vectors in new vector class hierarchy
[ https://issues.apache.org/jira/browse/ARROW-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16219836#comment-16219836 ] Jacques Nadeau commented on ARROW-1710: --- I'm one of the voices strongly arguing for dropping the additional class objects. (I also was the one who originally introduced the two separate sets when the code was first developed.) My experience has been the following: * Extra complexity of managing two different runtime classes is very expensive (maintenance, coercing between, managing runtime code generation, etc) * Most source data is actually declared as nullable but rarely has nulls As such, having an adaptive interaction where you look at cells 64 values at a time and adapt your behavior based on actual nullability (as opposed to declared nullability) provides a much better performance lift in real world use cases than having specialized code for declared non-nullable situations. FYI: [~e.levine], the updated approach with vectors is moving to a situation where we don't have a bit vector and ultimately also consolidates the buffer for the bits and the fixed bytes in the same buffer. In that case, there is no heap memory overhead and the direct memory overhead is 1 bit per value, far less than necessary. Also note that in reality, most people focused on super high performance Java implementations interact directly with the memory. You can see an example of how we do this here: https://github.com/dremio/dremio-oss/blob/master/sabot/kernel/src/main/java/com/dremio/sabot/op/common/ht2/Pivots.java#L89 If, in the future, if people need the vector classes to have an additional set of methods such as: allocateNewNoNull() setSafeIgnoreNull(int index, int value) let's just add those when someone's usecase requires it. No need to have an extra set of vectors for that purpose. > [Java] Decide what to do with non-nullable vectors in new vector class > hierarchy > - > > Key: ARROW-1710 > URL: https://issues.apache.org/jira/browse/ARROW-1710 > Project: Apache Arrow > Issue Type: Sub-task > Components: Java - Vectors >Reporter: Li Jin > Fix For: 0.8.0 > > > So far the consensus seems to be remove all non-nullable vectors. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (ARROW-1710) [Java] Decide what to do with non-nullable vectors in new vector class hierarchy
[ https://issues.apache.org/jira/browse/ARROW-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16218606#comment-16218606 ] Wes McKinney commented on ARROW-1710: - See https://github.com/apache/arrow/blob/master/format/Layout.md#null-bitmaps. "Arrays having a 0 null count may choose to not allocate the null bitmap.". So when there are no nulls, it is not necessary to create a BitVector. It is also not necessary to populate the bit vector, so as you say waiting until the first null to create the bitmap might be the way to go. > [Java] Decide what to do with non-nullable vectors in new vector class > hierarchy > - > > Key: ARROW-1710 > URL: https://issues.apache.org/jira/browse/ARROW-1710 > Project: Apache Arrow > Issue Type: Sub-task > Components: Java - Vectors >Reporter: Li Jin > Fix For: 0.8.0 > > > So far the consensus seems to be remove all non-nullable vectors. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (ARROW-1710) [Java] Decide what to do with non-nullable vectors in new vector class hierarchy
[ https://issues.apache.org/jira/browse/ARROW-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16217560#comment-16217560 ] Ethan Levine commented on ARROW-1710: - The BitVector is an extra object that has to be allocated (both in terms of the backing data and in terms of the Java objects involved). You'd also need to perform bit masking of the underlying data with every write, which could involve a cache miss if the data for the BitVector isn't neatly colocated with the actual data for the nullable vector. Perhaps a tracking flag could be added to the nullable vectors, though. It would start out "false", and get set to "true" if you ever write a null value. That way you could avoid the extra allocation and computation involved with tracking the validity of each value in the case where there are no null values. This seems like it would be more complicated than just keeping non-nullable vectors around, however. > [Java] Decide what to do with non-nullable vectors in new vector class > hierarchy > - > > Key: ARROW-1710 > URL: https://issues.apache.org/jira/browse/ARROW-1710 > Project: Apache Arrow > Issue Type: Sub-task > Components: Java - Vectors >Reporter: Li Jin > Fix For: 0.8.0 > > > So far the consensus seems to be remove all non-nullable vectors. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (ARROW-1710) [Java] Decide what to do with non-nullable vectors in new vector class hierarchy
[ https://issues.apache.org/jira/browse/ARROW-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16214748#comment-16214748 ] Gonzalo Ortiz commented on ARROW-1710: -- I'm a little bit concern about performance loss if non-nullable vectors are removed. Arrow on Java has quite bad performance so far (compared with plan ByteBuffers) and to remove non-nullable vectors can make it even worst. > [Java] Decide what to do with non-nullable vectors in new vector class > hierarchy > - > > Key: ARROW-1710 > URL: https://issues.apache.org/jira/browse/ARROW-1710 > Project: Apache Arrow > Issue Type: Sub-task > Components: Java - Vectors >Reporter: Li Jin > Fix For: 0.8.0 > > > So far the consensus seems to be remove all non-nullable vectors. -- This message was sent by Atlassian JIRA (v6.4.14#64029)