[Java] Constructing a list from ArrowBufs

Andrew Melo Thu, 27 Apr 2023 12:21:09 -0700

Hi all,

I am working on a Spark datasource plugin that reads a (custom) file
format and outputs arrow-backed columns. I'm having difficulty
figuring out how to construct a ListVector if I have an ArrowBuf with
the contents and know the width of each list. I've tried constructing
the vector with the code pasted below, but something appears to become
misaligned, and I get incorrect values when reading the vector
back.


The documentation and other sources online have examples where you
construct the ListVector element-by-element (e.g. with
UnionListWriter), but I'm having difficulty finding an example where
you start from ArrowBufs and use them to directly construct the
ListVector.

Does anyone have an example they could point me to?

Thanks!
Andrew

        // Number of bytes each element takes up (e.g. 4 for an int)
        int itemSize = new AsDtype(dtype).memory_itemsize();
        int countBufferSize = (entryStop - entryStart + 1) * INT_SIZE;

        ArrowBuf countsBuf = allocator.buffer(countBufferSize);
        // File format uses BE, so perform a byte swap to get to LE
        ArrowBuf contentBuf = swapEndianness(contentTemp);

        ArrowType outerType = new ArrowType.List();
        // Convert from our internal dtype to the Arrow equivalent
        ArrowType innerType = dtypeToArrow();

        FieldType outerField = new FieldType(false, outerType, null);
        FieldType innerField = new FieldType(false, innerType, null);

        int outerLen = (entryStop - entryStart) * contentTemp.multiplicity();
        int innerLen = contentTemp.numitems();
        ArrowFieldNode outerNode = new ArrowFieldNode(outerLen, 0);
        ArrowFieldNode innerNode = new ArrowFieldNode(innerLen, 0);

        ListVector arrowVec = ListVector.empty("testcol", allocator);
        arrowVec.loadFieldBuffers(outerNode, Arrays.asList(null, countsBuf));

        AddOrGetResult<ValueVector> children =
            arrowVec.addOrGetVector(innerField);

        FieldVector innerVec = (FieldVector) children.getVector();
        innerVec.loadFieldBuffers(innerNode, Arrays.asList(null, contentBuf));
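
For reference, the Arrow columnar format lays out a list column as an
offsets buffer of (number of lists + 1) little-endian 32-bit integers,
plus a child array holding the flattened items. The snippet above
allocates countsBuf at that size but does not show it being filled, and
the outer node length would presumably be the number of lists rather
than the number of items. The following is only a minimal sketch of what
the offsets would contain when every list has the same width. It uses
plain java.nio instead of ArrowBuf, and the class and method names
(ListOffsets, fixedWidthOffsets, listLength) are hypothetical, not part
of Arrow.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper, not an Arrow API: shows the offsets layout only.
final class ListOffsets {
    private ListOffsets() {}

    /** Builds the (listCount + 1) int32 offsets for lists that all hold `width` items. */
    static ByteBuffer fixedWidthOffsets(int listCount, int width) {
        ByteBuffer buf = ByteBuffer.allocate((listCount + 1) * Integer.BYTES)
                                   .order(ByteOrder.LITTLE_ENDIAN);
        for (int i = 0; i <= listCount; i++) {
            // Offset i is the index of the first child item of list i;
            // the final entry equals the total number of child items.
            buf.putInt(i * Integer.BYTES, i * width);
        }
        return buf;
    }

    /** Number of items in list `index`, read back from the offsets. */
    static int listLength(ByteBuffer offsets, int index) {
        int start = offsets.getInt(index * Integer.BYTES);
        int end = offsets.getInt((index + 1) * Integer.BYTES);
        return end - start;
    }
}
```

With an ArrowBuf, the same values could be written with
setInt(byteIndex, value) before the buffer is handed to
loadFieldBuffers.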
