[ https://issues.apache.org/jira/browse/ARROW-8909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17116350#comment-17116350 ]
Liya Fan commented on ARROW-8909:
---------------------------------

[~saurabhm] Thank you for reporting the problem. I think the behavior is by design: for variable-width vectors, we do not support setting values in random order, as doing so could incur a severe performance penalty.

> [Java] Out of order writes using setSafe
> ----------------------------------------
>
>                 Key: ARROW-8909
>                 URL: https://issues.apache.org/jira/browse/ARROW-8909
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Java
>            Reporter: Saurabh
>            Priority: Major
>
> I noticed that calling setSafe on a VarCharVector with indices that are not in
> increasing order causes the vector's lastSet index to be reset to the index
> passed in the last call to setSafe.
> Is this documented and expected behavior?
> Sample code:
> {code:java}
> import java.util.Collections;
> import lombok.extern.slf4j.Slf4j;
> import org.apache.arrow.memory.RootAllocator;
> import org.apache.arrow.vector.VarCharVector;
> import org.apache.arrow.vector.VectorSchemaRoot;
> import org.apache.arrow.vector.types.pojo.ArrowType;
> import org.apache.arrow.vector.types.pojo.Field;
> import org.apache.arrow.vector.types.pojo.Schema;
> import org.apache.arrow.vector.util.Text;
>
> @Slf4j
> public class ATest {
>   public static void main(String[] args) {
>     Schema schema = new Schema(
>         Collections.singletonList(Field.nullable("Data", new ArrowType.Utf8())));
>     // The allocator is closed too, after the root has released its buffers.
>     try (RootAllocator allocator = new RootAllocator();
>          VectorSchemaRoot vroot = VectorSchemaRoot.create(schema, allocator)) {
>       VarCharVector vec = (VarCharVector) vroot.getVector("Data");
>       // In-order writes: indices 0..9 hold "0_mtest" .. "9_mtest".
>       for (int i = 0; i < 10; i++) {
>         vec.setSafe(i, new Text(i + "_mtest"));
>       }
>       // Out-of-order write: index 7 is rewritten with a shorter value.
>       vec.setSafe(7, new Text("7_new"));
>       log.info("Data at index 8 before setRowCount: {}", vec.getObject(8));
>       vroot.setRowCount(10);
>       log.info("Data at index 8 after setRowCount: {}", vec.getObject(8));
>       log.info(vroot.contentToTSVString());
>     }
>   }
> }
> {code}
>
> If I don't set index 7 after the loop, I get all the entries 0_mtest, 1_mtest,
> ..., 9_mtest.
> If I set index 7 after the loop, I see 0_mtest, ..., 6_mtest, 7_new, followed
> by empty values.
> Before setRowCount, the data at index 8 is *st8_mtest* and at index 9 is
> *9_mtest*.
> After setRowCount, the data at indices 8 and 9 is "" (empty).
> With a longer suffix in place of the four-character _new, the write eats
> further into the data at the following indices.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
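To show why the reported values appear, here is a minimal, self-contained Java sketch. `OffsetModel` is a hypothetical class and is not part of Arrow. It models a variable-width vector as one value buffer plus an offsets array, where value i occupies bytes offsets[i] to offsets[i + 1], and it tracks a lastSet index. Its setSafe and setValueCount methods approximate the VarCharVector behavior described in the report; that approximation is an assumption, not Arrow's code. In the model, setSafe writes value i starting at offsets[i] and updates only offsets[i + 1]. setValueCount then fills every index after lastSet with an empty value.

Under those rules, rewriting index 7 with a shorter string moves the start of index 8 back into the tail of the old value, which produces st8_mtest. Because lastSet has also moved back to 7, the later setValueCount(10) overwrites indices 8 and 9 with empty strings. A replacement longer than the original value would instead overwrite the first bytes of index 8.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/**
 * Hypothetical, simplified model of a variable-width (VarChar-like) vector.
 * NOT Arrow code: it only approximates the offset and lastSet bookkeeping
 * discussed in ARROW-8909 so the effect can be reproduced without Arrow.
 */
public class OffsetModel {
  private byte[] data = new byte[16]; // value buffer: all values stored back to back
  private int[] offsets = new int[1]; // value i occupies data[offsets[i] .. offsets[i + 1])
  private int lastSet = -1;           // highest index considered "written"

  /** Approximates setSafe: fill holes after lastSet, write the value, then move lastSet. */
  public void setSafe(int index, String value) {
    ensureOffsetCapacity(index + 2);
    for (int i = lastSet + 1; i < index; i++) {
      writeAt(i, new byte[0]); // skipped indices become empty values
    }
    writeAt(index, value.getBytes(StandardCharsets.UTF_8));
    lastSet = index; // an out-of-order write moves lastSet backwards
  }

  /** Approximates setValueCount: every index after lastSet is filled with an empty value. */
  public void setValueCount(int count) {
    ensureOffsetCapacity(count + 1);
    for (int i = lastSet + 1; i < count; i++) {
      writeAt(i, new byte[0]);
    }
    lastSet = count - 1;
  }

  public String get(int index) {
    int start = offsets[index];
    return new String(data, start, offsets[index + 1] - start, StandardCharsets.UTF_8);
  }

  /** Writes at offsets[index] and updates only offsets[index + 1]; later offsets are untouched. */
  private void writeAt(int index, byte[] bytes) {
    int start = offsets[index];
    if (start + bytes.length > data.length) {
      data = Arrays.copyOf(data, Math.max(2 * data.length, start + bytes.length));
    }
    System.arraycopy(bytes, 0, data, start, bytes.length);
    offsets[index + 1] = start + bytes.length;
  }

  private void ensureOffsetCapacity(int size) {
    if (offsets.length < size) {
      offsets = Arrays.copyOf(offsets, Math.max(2 * offsets.length, size));
    }
  }

  /** Workaround sketch: build a fresh vector, writing indices strictly in increasing order. */
  public static OffsetModel fromValues(String[] values) {
    OffsetModel vec = new OffsetModel();
    for (int i = 0; i < values.length; i++) {
      vec.setSafe(i, values[i]);
    }
    vec.setValueCount(values.length);
    return vec;
  }

  public static void main(String[] args) {
    OffsetModel vec = new OffsetModel();
    for (int i = 0; i < 10; i++) {
      vec.setSafe(i, i + "_mtest");
    }
    vec.setSafe(7, "7_new"); // out-of-order rewrite, as in the report
    System.out.println("index 8 before: " + vec.get(8));
    vec.setValueCount(10);
    System.out.println("index 8 after: \"" + vec.get(8) + "\"");

    // Same update applied through a staging array, then written in index order.
    String[] staged = new String[10];
    for (int i = 0; i < staged.length; i++) {
      staged[i] = i + "_mtest";
    }
    staged[7] = "7_new";
    OffsetModel rebuilt = fromValues(staged);
    System.out.println("rebuilt index 8: " + rebuilt.get(8));
  }
}
```

Running main prints st8_mtest for index 8 before setValueCount and an empty string after it, which matches the report. The fromValues helper shows one possible workaround. It is an assumption, not something suggested in the thread: keep pending updates in a staging array, then write every index in increasing order into a fresh vector so that each value's start offset is always the previous value's end.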