[ https://issues.apache.org/jira/browse/ARROW-8909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Saurabh updated ARROW-8909:
---------------------------
    Description:
I noticed that calling setSafe on a VarCharVector with indices that are not in increasing order resets the vector's lastIndex to the index passed in the most recent setSafe call.

Is this documented and expected behavior?

Sample code:
{code:java}
import java.util.Collections;
import lombok.extern.slf4j.Slf4j;
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.VarCharVector;
import org.apache.arrow.vector.VectorSchemaRoot;
import org.apache.arrow.vector.types.pojo.ArrowType;
import org.apache.arrow.vector.types.pojo.Field;
import org.apache.arrow.vector.types.pojo.Schema;
import org.apache.arrow.vector.util.Text;

@Slf4j
public class ATest {
  public static void main(String[] args) {
    Schema schema = new Schema(Collections.singletonList(Field.nullable("Data", new ArrowType.Utf8())));
    // Close the allocator as well as the root so no buffers leak.
    try (RootAllocator allocator = new RootAllocator();
         VectorSchemaRoot vroot = VectorSchemaRoot.create(schema, allocator)) {
      VarCharVector vec = (VarCharVector) vroot.getVector("Data");
      // Write indices 0..9 in increasing order.
      for (int i = 0; i < 10; i++) {
        vec.setSafe(i, new Text(Integer.toString(i) + "_mtest"));
      }
      // Overwrite an earlier index with a shorter value.
      vec.setSafe(7, new Text(Integer.toString(7) + "_new"));
      log.info("Data at index 8 Before {}", vec.getObject(8));
      vroot.setRowCount(10);
      log.info("Data at index 8 After {}", vec.getObject(8));
      log.info(vroot.contentToTSVString());
    }
  }
}
{code}

If I don't set index 7 after the loop, I get all the 0_mtest, 1_mtest, ..., 9_mtest entries.
If I set index 7 after the loop, I see 0_mtest, ..., 5_mtest, 6_mtest, 7_new, and:
Before setRowCount, the data at index 8 is *st8_mtest* and at index 9 is *9_mtest*.
After setRowCount, the data at index 8 is "" and at index 9 is "".
If the replacement suffix is longer than the four characters of "_new", the new value overwrites even more of the data belonging to the following indices.

was:
I noticed that calling setSafe on a VarCharVector with indices that are not in increasing order resets the vector's lastIndex to the index passed in the most recent setSafe call.

Is this documented and expected behavior?
Sample code:
{code:java}
import java.util.Collections;
import lombok.extern.slf4j.Slf4j;
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.VarCharVector;
import org.apache.arrow.vector.VectorSchemaRoot;
import org.apache.arrow.vector.types.pojo.ArrowType;
import org.apache.arrow.vector.types.pojo.Field;
import org.apache.arrow.vector.types.pojo.Schema;
import org.apache.arrow.vector.util.Text;

@Slf4j
public class ATest {
  public static void main(String[] args) {
    Schema schema = new Schema(Collections.singletonList(Field.nullable("Data", new ArrowType.Utf8())));
    try (RootAllocator allocator = new RootAllocator();
         VectorSchemaRoot vroot = VectorSchemaRoot.create(schema, allocator)) {
      VarCharVector vec = (VarCharVector) vroot.getVector("Data");
      for (int i = 0; i < 10; i++) {
        vec.setSafe(i, new Text(Integer.toString(i) + "_mtest"));
      }
      // vec.setSafe(0, new Text(Integer.toString(0) + "_new"));
      vec.setSafe(7, new Text(Integer.toString(7) + "_new"));
      vroot.setRowCount(10);
      log.info(vroot.contentToTSVString());
    }
  }
}
{code}

If I don't set index 0 or 7 after the loop, I get all the 0_mtest, 1_mtest, ..., 9_mtest entries.
If I set index 0 after the loop, I only see the 0_new entry; the other entries are "".
If I set index 7 after the loop, I see 0_mtest, ..., 5_mtest, 7_new; the other entries are "".

> [Java] Out of order writes using setSafe
> ----------------------------------------
>
>                 Key: ARROW-8909
>                 URL: https://issues.apache.org/jira/browse/ARROW-8909
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Java
>            Reporter: Saurabh
>            Priority: Major
>

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
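To follow the numbers in the report (why index 8 reads *st8_mtest* before setRowCount and "" afterwards), here is a minimal, hypothetical Java model of a variable-width vector. It is not Arrow code. It assumes that the vector keeps one flat data buffer, an offsets array and a single "last set" index; that each write goes to offsets[index] and moves only offsets[index + 1]; and that setting the value count writes empty values for every index after the last set one. The names VarWidthModel, set, setValueCount and reproduce, as well as the fixed capacities, are illustrative only. Under these assumptions the model reproduces both reported outcomes: the index 7 scenario and the index 0 scenario from the earlier description.

```java
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical, simplified model of a variable-width (VarChar-like) vector.
 * Not Arrow code: it only assumes a flat data buffer, an offsets array and a
 * single "lastSet" index. Capacities are fixed; a real setSafe would reallocate.
 */
final class VarWidthModel {
  private final byte[] data;
  private final int[] offsets;   // value i occupies data[offsets[i], offsets[i + 1])
  private int lastSet = -1;      // highest index the model treats as written

  VarWidthModel(int valueCapacity, int byteCapacity) {
    this.offsets = new int[valueCapacity + 1];
    this.data = new byte[byteCapacity];
  }

  /** Writes an empty value at every index strictly between lastSet and index. */
  private void fillHoles(int index) {
    for (int i = lastSet + 1; i < index; i++) {
      writeBytes(i, new byte[0]);
    }
    lastSet = index - 1;
  }

  /** Copies bytes to offsets[index] and moves only offsets[index + 1]. */
  private void writeBytes(int index, byte[] bytes) {
    int start = offsets[index];
    System.arraycopy(bytes, 0, data, start, bytes.length);
    offsets[index + 1] = start + bytes.length;
  }

  /** Analogue of setSafe: lastSet becomes index even if it was larger before. */
  void set(int index, String value) {
    fillHoles(index);
    writeBytes(index, value.getBytes(StandardCharsets.UTF_8));
    lastSet = index;
  }

  /** Analogue of setValueCount/setRowCount: every index after lastSet becomes empty. */
  void setValueCount(int valueCount) {
    fillHoles(valueCount);
    lastSet = valueCount - 1;
  }

  String get(int index) {
    int start = offsets[index];
    return new String(data, start, offsets[index + 1] - start, StandardCharsets.UTF_8);
  }

  /** Replays the sample code; if rewriteTail, indices 8 and 9 are re-set in order after 7. */
  static VarWidthModel reproduce(boolean rewriteTail) {
    VarWidthModel vec = new VarWidthModel(10, 128);
    for (int i = 0; i < 10; i++) {
      vec.set(i, i + "_mtest");       // 7 bytes each, so offsets[i] == 7 * i
    }
    vec.set(7, "7_new");              // 5 bytes: offsets[8] moves from 56 back to 54
    if (rewriteTail) {
      vec.set(8, "8_mtest");
      vec.set(9, "9_mtest");
    }
    vec.setValueCount(10);
    return vec;
  }

  public static void main(String[] args) {
    VarWidthModel asReported = reproduce(false);
    VarWidthModel rewritten = reproduce(true);
    for (int i = 6; i < 10; i++) {
      System.out.println(i + ": '" + asReported.get(i) + "' vs '" + rewritten.get(i) + "'");
    }
  }
}
```

In this model, "7_new" is two bytes shorter than "7_mtest", so index 8 starts two bytes earlier and picks up the trailing "st" of the old value; setValueCount then treats 8 and 9 as unwritten holes and empties them. The only way the model keeps 8_mtest and 9_mtest is to re-write every later index in increasing order after the overwrite (reproduce(true)). Whether Arrow intends setSafe to be used only with increasing indices is exactly what this issue asks, so treat that rewrite as a property of the model rather than a documented contract.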