[ 
https://issues.apache.org/jira/browse/ARROW-8909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saurabh updated ARROW-8909:
---------------------------
    Description: 
I noticed that calling setSafe on a VarCharVector with indices that are not in 
increasing order causes the lastIndex to be set to the index used in the last 
call to setSafe.

Is this documented and expected behavior?

Sample code:
{code:java}
import java.util.Collections;
import lombok.extern.slf4j.Slf4j;
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.VarCharVector;
import org.apache.arrow.vector.VectorSchemaRoot;
import org.apache.arrow.vector.types.pojo.ArrowType;
import org.apache.arrow.vector.types.pojo.Field;
import org.apache.arrow.vector.types.pojo.Schema;
import org.apache.arrow.vector.util.Text;

@Slf4j
public class ATest {

  public static void main(String[] args) {
    Schema schema = new Schema(
        Collections.singletonList(Field.nullable("Data", new ArrowType.Utf8())));
    try (VectorSchemaRoot vroot = VectorSchemaRoot.create(schema, new RootAllocator())) {
      VarCharVector vec = (VarCharVector) vroot.getVector("Data");

      for (int i = 0; i < 10; i++) {
        vec.setSafe(i, new Text(Integer.toString(i) + "_mtest"));
      }

      vec.setSafe(7, new Text(Integer.toString(7) + "_new"));

      log.info("Data at index 8 Before {}", vec.getObject(8));
      vroot.setRowCount(10);
      log.info("Data at index 8 After {}", vec.getObject(8));
      log.info(vroot.contentToTSVString());
    }
  }
}
{code}
 

If I don't set index 7 after the loop, I get all of the 0_mtest, 1_mtest, ..., 
9_mtest entries.

If I set index 7 after the loop, I see 0_mtest, ..., 5_mtest, 6_mtest, 7_new, and then:
    Before the setRowCount call, the data at index 8 is *st8_mtest* and index 9 is 
*9_mtest*.
    After the setRowCount call, the data at index 8 is "" and index 9 is "".

If the replacement text is longer than the 4 characters of _new, it eats further 
into the data at the following indices.
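
To make the symptoms above easier to reason about, here is a toy model in plain Java with no Arrow dependency. It assumes the vector keeps all values in one contiguous byte buffer, delimits them with an offsets array, and remembers only the index of the most recent write. That is my reading of the observed behavior, not a copy of Arrow's code, and every name in it (VarWidthModel, fillHoles, replay, and so on) is hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Toy model only: NOT Arrow's implementation. All names here are hypothetical.
final class VarWidthModel {
    private final byte[] data = new byte[1024]; // contiguous value bytes
    private final int[] offsets;                // value i spans offsets[i]..offsets[i+1]
    private int lastIndex = -1;                 // index used by the most recent set

    VarWidthModel(int capacity) {
        offsets = new int[capacity + 1];
    }

    // Mark every slot in (lastIndex, upTo) as empty by repeating the previous offset.
    private void fillHoles(int upTo) {
        for (int i = lastIndex + 1; i < upTo; i++) {
            offsets[i + 1] = offsets[i];
        }
        lastIndex = upTo - 1;
    }

    // Write a value: bytes land at offsets[index] and the next offset is recomputed.
    void set(int index, String value) {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        fillHoles(index);
        System.arraycopy(bytes, 0, data, offsets[index], bytes.length);
        offsets[index + 1] = offsets[index] + bytes.length;
        lastIndex = index;
    }

    // Finalize the count: slots after lastIndex are treated as never written.
    void setValueCount(int count) {
        fillHoles(count);
    }

    String get(int index) {
        int start = offsets[index];
        return new String(data, start, offsets[index + 1] - start, StandardCharsets.UTF_8);
    }

    // Replays the scenario from the report against the toy model.
    static VarWidthModel replay() {
        VarWidthModel vec = new VarWidthModel(10);
        for (int i = 0; i < 10; i++) {
            vec.set(i, i + "_mtest");
        }
        vec.set(7, "7_new"); // out-of-order write, shorter than the original value
        return vec;
    }

    public static void main(String[] args) {
        VarWidthModel vec = replay();
        System.out.println("index 8 before: " + vec.get(8)); // st8_mtest
        vec.setValueCount(10);
        System.out.println("index 8 after:  [" + vec.get(8) + "]"); // []
    }
}
```

In this model the rewrite of index 7 moves the start of index 8 two bytes earlier, which exposes the leftover "st" of the old 7_mtest value. The final count call then treats indices 8 and 9 as unwritten, so they read back as empty. Both results match what the sample code prints.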

 

  was:
I noticed that calling setSafe on a VarCharVector with indices that are not in 
increasing order causes the lastIndex to be set to the index used in the last 
call to setSafe.

Is this documented and expected behavior?

Sample code:
{code:java}
import java.util.Collections;
import lombok.extern.slf4j.Slf4j;
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.VarCharVector;
import org.apache.arrow.vector.VectorSchemaRoot;
import org.apache.arrow.vector.types.pojo.ArrowType;
import org.apache.arrow.vector.types.pojo.Field;
import org.apache.arrow.vector.types.pojo.Schema;
import org.apache.arrow.vector.util.Text;

@Slf4j
public class ATest {

  public static void main(String[] args) {
    Schema schema = new Schema(
        Collections.singletonList(Field.nullable("Data", new ArrowType.Utf8())));
    try (VectorSchemaRoot vroot = VectorSchemaRoot.create(schema, new RootAllocator())) {
      VarCharVector vec = (VarCharVector) vroot.getVector("Data");

      for (int i = 0; i < 10; i++) {
        vec.setSafe(i, new Text(Integer.toString(i) + "_mtest"));
      }
      // vec.setSafe(0, new Text(Integer.toString(0) + "_new"));
      vec.setSafe(7, new Text(Integer.toString(7) + "_new"));

      vroot.setRowCount(10);
      log.info(vroot.contentToTSVString());
    }
  }
}
{code}
 

If I don't set index 0 or 7 after the loop, I get all of the 0_mtest, 1_mtest, ..., 
9_mtest entries.

If I set index 0 after the loop, I only see the 0_new entry; the other entries are "".

If I set index 7 after the loop, I see 0_mtest, ..., 5_mtest, 7_new; the other 
entries are "".

 


> [Java] Out of order writes using setSafe
> ----------------------------------------
>
>                 Key: ARROW-8909
>                 URL: https://issues.apache.org/jira/browse/ARROW-8909
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Java
>            Reporter: Saurabh
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
