Liya Fan created ARROW-6394:
-------------------------------

             Summary: [Java] Support conversions between delta vector and 
partial sum vector
                 Key: ARROW-6394
                 URL: https://issues.apache.org/jira/browse/ARROW-6394
             Project: Apache Arrow
          Issue Type: New Feature
          Components: Java
            Reporter: Liya Fan
            Assignee: Liya Fan


What is a delta vector/partial sum vector?

Given an integer vector a with length n, its partial sum vector is another 
integer vector b with length n + 1, with values defined as:

b(0) = initial sum
b(i) = b(0) + a(0) + a(1) + ... + a(i - 1), i = 1, 2, ..., n

Given an integer vector a with length n + 1, its delta vector is another integer 
vector b with length n, with values defined as:

b(i) = a(i + 1) - a(i), i = 0, 1, ..., n - 1

In this issue, we provide utilities to convert between the delta vector and the 
partial sum vector. It is interesting to note that the two operations (delta to 
partial sum, and partial sum to delta) correspond to discrete integration and 
differentiation, respectively.
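
The two definitions above can be sketched as follows. This is only an 
illustration under stated assumptions: plain int[] arrays stand in for Arrow 
integer vectors, and the class and method names (DeltaPartialSum, toPartialSum, 
toDelta) are hypothetical, not part of the Arrow API or of the proposed patch.

```java
import java.util.Arrays;

// Hypothetical helper class; plain int[] arrays stand in for Arrow integer vectors.
final class DeltaPartialSum {

  // Builds the partial sum vector (length n + 1) from a delta vector (length n).
  static int[] toPartialSum(int[] deltas, int initialSum) {
    int[] sums = new int[deltas.length + 1];
    sums[0] = initialSum;
    for (int i = 0; i < deltas.length; i++) {
      sums[i + 1] = sums[i] + deltas[i];
    }
    return sums;
  }

  // Builds the delta vector (length n) from a partial sum vector (length n + 1).
  static int[] toDelta(int[] sums) {
    if (sums.length == 0) {
      throw new IllegalArgumentException("partial sum vector needs at least one element");
    }
    int[] deltas = new int[sums.length - 1];
    for (int i = 0; i < deltas.length; i++) {
      deltas[i] = sums[i + 1] - sums[i];
    }
    return deltas;
  }

  public static void main(String[] args) {
    int[] deltas = {3, 1, 4};
    int[] sums = toPartialSum(deltas, 10);
    System.out.println(Arrays.toString(sums));          // [10, 13, 14, 18]
    System.out.println(Arrays.toString(toDelta(sums))); // [3, 1, 4]
  }
}
```

Note that toDelta drops the initial sum, so a round trip back to the partial 
sum vector needs that value to be kept separately.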

These conversions have wide applications. For example,

1. The run-length vector proposed by Micah is based on the partial sum vector, 
while the deduplication functionality is based on the delta vector. This issue 
provides conversions between them.

2. The current VarCharVector/VarBinaryVector implementations are based on the 
partial sum vector. We can transform them to delta vectors before IPC, to 
reduce network traffic.

3. Converting to a delta vector can be considered a form of data compression. 
The operation can be applied more than once to further reduce the data volume.
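
As a rough illustration of items 2 and 3, the sketch below turns a 
variable-width offset array into element lengths, and applies the delta 
operation twice to an evenly spaced sequence. The class and method names 
(RepeatedDelta, delta, deltaTimes) and the sample values are hypothetical, and 
plain int[] arrays stand in for Arrow buffers. Whether the smaller values 
actually save space depends on a later encoding step, which this issue does 
not specify.

```java
import java.util.Arrays;

// Hypothetical sketch; not the Arrow API.
final class RepeatedDelta {

  // One delta pass: out(i) = values(i + 1) - values(i).
  static int[] delta(int[] values) {
    int[] out = new int[Math.max(values.length - 1, 0)];
    for (int i = 0; i < out.length; i++) {
      out[i] = values[i + 1] - values[i];
    }
    return out;
  }

  // Applies the delta pass the given number of times; each pass shortens the vector by one.
  static int[] deltaTimes(int[] values, int times) {
    int[] current = values;
    for (int k = 0; k < times; k++) {
      current = delta(current);
    }
    return current;
  }

  public static void main(String[] args) {
    // Offsets in the style of a variable-width vector: partial sums of the element lengths.
    int[] offsets = {0, 5, 8, 8, 14};
    System.out.println(Arrays.toString(delta(offsets)));         // [5, 3, 0, 6]

    // An evenly spaced sequence collapses to zeros after two passes.
    int[] values = {100, 110, 120, 130, 140};
    System.out.println(Arrays.toString(deltaTimes(values, 1)));  // [10, 10, 10, 10]
    System.out.println(Arrays.toString(deltaTimes(values, 2)));  // [0, 0, 0]
  }
}
```

Each pass discards one leading value, so reversing k passes requires keeping 
k initial values alongside the final delta vector.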



--
This message was sent by Atlassian Jira
(v8.3.2#803003)
