Re: [Discuss][Java] 64-bit lengths for ValueVectors

Jacques Nadeau Fri, 23 Aug 2019 04:56:19 -0700

On Fri, Aug 23, 2019, 11:49 AM Micah Kornfield <[email protected]>
wrote:


> I don't think we should couple this discussion with the implementation of
>> large list, etc since I think those two concepts are independent.
>
> I'm still trying to balance in my mind which is a worse experience for
> consumers of the libraries for these types.  Claiming that Java supports
> these types but throwing an exception when the Vectors exceed 32-bits or
> just say they aren't supported until we have 64-bit support in Java.
>

The vector indexes being limited to 32 bits doesn't limit the addressing to
32 bit chunks of memory. For example, you're prime example before was image
data. Having 2 billion images of 1mb images would still be supported
without changing the index addressing.



>
>> I've asked some others on my team their opinions on the risk here. I
>> think we should probably review some our more complex vector interactions
>> and see how the jvm's assembly changes with this kind of change. Using
>> microbenchmarking is good but I think we also need to see whether we're
>> constantly inserting additional instructions or if in most cases, this
>> actually doesn't impact instruction count.
>
>
> Is this something that your team will take on?
>


Yeah, we need to look at this I think.

Do you need a rebased version of the PR or is the existing one sufficient?
>

Rebase would help if possible.



> Thanks,
> Micah
>
> On Thu, Aug 22, 2019 at 8:56 PM Jacques Nadeau <[email protected]> wrote:
>
>> I don't think we should couple this discussion with the implementation of
>> large list, etc since I think those two concepts are independent.
>>
>> I've asked some others on my team their opinions on the risk here. I
>> think we should probably review some our more complex vector interactions
>> and see how the jvm's assembly changes with this kind of change. Using
>> microbenchmarking is good but I think we also need to see whether we're
>> constantly inserting additional instructions or if in most cases, this
>> actually doesn't impact instruction count.
>>
>>
>>
>> On Wed, Aug 21, 2019 at 12:18 PM Micah Kornfield <[email protected]>
>> wrote:
>>
>>>
>>>> With regards to the reference implementation point. It is a good point.
>>>> I'm on vacation this week. Unless you're pushing hard on this, can we pick
>>>> this up and discuss more next week?
>>>
>>>
>>> Hi Jacques, I hope you had a good rest.  Any more thoughts on the
>>> reference implementation aspect of this?
>>>
>>>
>>>> To copy the sentiments from the 0.15.0 release thread, I think it
>>>> would be best to decouple this discussion from the release timeline
>>>> given how many people we have relying on regular releases coming out.
>>>> We can keep continue making major 0.x releases until we're ready to
>>>> release 1.0.0.
>>>
>>>
>>> I'm OK with it as long as other stakeholders are. Timed releases are the
>>> way to go.  As stated on the release thread [1] we need a better mechanism
>>> to avoid this type of issue arising again.  The release thread also had
>>> some more discussion on compatibility.
>>>
>>> Thanks,
>>> Micah
>>>
>>> [1]
>>> https://lists.apache.org/thread.html/d70feeceaf2570906ade117030b29887af7c77ca5c4a976e6d555920@%3Cdev.arrow.apache.org%3E
>>>
>>>
>>> On Wed, Aug 14, 2019 at 3:23 PM Wes McKinney <[email protected]>
>>> wrote:
>>>
>>>> On Sun, Aug 11, 2019 at 9:40 PM Micah Kornfield <[email protected]>
>>>> wrote:
>>>> >
>>>> > Hi Wes and Jacques,
>>>> > See responses below.
>>>> >
>>>> > With regards to the reference implementation point. It is a good
>>>> point. I'm
>>>> > > on vacation this week. Unless you're pushing hard on this, can we
>>>> pick this
>>>> > > up and discuss more next week?
>>>> >
>>>> >
>>>> > Sure thing, enjoy your vacation.  I think the only practical
>>>> implications
>>>> > are it delays choices around implementing LargeList, LargeBinary,
>>>> > LargeString in Java, which in turn might push out the 0.15.0 release.
>>>> >
>>>>
>>>> To copy the sentiments from the 0.15.0 release thread, I think it
>>>> would be best to decouple this discussion from the release timeline
>>>> given how many people we have relying on regular releases coming out.
>>>> We can keep continue making major 0.x releases until we're ready to
>>>> release 1.0.0.
>>>>
>>>> > My stance on this is that I don't know how important it is for Java to
>>>> > > support vectors over INT32_MAX elements. The use cases enabled by
>>>> > > having very large arrays seem to be concentrated in the native code
>>>> > > world (e.g. C/C++/Rust) -- that could just be
>>>> implementation-centrism
>>>> > > on my part, though.
>>>> >
>>>> >
>>>> > A data point against this view is Spark has done work to eliminate 2GB
>>>> > memory limits on its block sizes [1].  I don't claim to understand the
>>>> > implications of this. Bryan might you have any thoughts here?  I'm OK
>>>> with
>>>> > INT32_MAX, as well, I think we should think about what this means for
>>>> > adding Large types to Java and implications for reference
>>>> implementations.
>>>> >
>>>> > Thanks,
>>>> > Micah
>>>> >
>>>> > [1] https://issues.apache.org/jira/browse/SPARK-6235
>>>> >
>>>> > On Sun, Aug 11, 2019 at 6:31 PM Jacques Nadeau <[email protected]>
>>>> wrote:
>>>> >
>>>> > > Hey Micah,
>>>> > >
>>>> > > Appreciate the offer on the compiling. The reality is I'm more
>>>> concerned
>>>> > > about the unknowns than the compiling issue itself. Any time you've
>>>> been
>>>> > > tuning for a while, changing something like this could be totally
>>>> fine or
>>>> > > cause a couple of major issues. For example, we've done a very
>>>> large amount
>>>> > > of work reducing heap memory footprint of the vectors. Are target
>>>> is to
>>>> > > actually get it down to 24 bytes per ArrowBuf and 24 bytes heap per
>>>> vector
>>>> > > (not including arrow bufs).
>>>> > >
>>>> > > With regards to the reference implementation point. It is a good
>>>> point.
>>>> > > I'm on vacation this week. Unless you're pushing hard on this, can
>>>> we pick
>>>> > > this up and discuss more next week?
>>>> > >
>>>> > > thanks,
>>>> > > Jacques
>>>> > >
>>>> > > On Sat, Aug 10, 2019 at 7:39 PM Micah Kornfield <
>>>> [email protected]>
>>>> > > wrote:
>>>> > >
>>>> > >> Hi Jacques,
>>>> > >> I definitely understand these concerns and this change is risky
>>>> because it
>>>> > >> is so large.  Perhaps, creating a new hierarchy, might be the
>>>> cleanest way
>>>> > >> of dealing with this.  This could have other benefits like
>>>> cleaning up
>>>> > >> some
>>>> > >> cruft around dictionary encode and "orphaned" method.   Per past
>>>> e-mail
>>>> > >> threads I agree it is beneficial to have 2 separate reference
>>>> > >> implementations that can communicate fully, and my intent here was
>>>> to
>>>> > >> close
>>>> > >> that gap.
>>>> > >>
>>>> > >> Trying to
>>>> > >> > determine the ramifications of these changes would be
>>>> challenging and
>>>> > >> time
>>>> > >> > consuming against all the different ways we interact with the
>>>> Arrow Java
>>>> > >> > library.
>>>> > >>
>>>> > >>
>>>> > >> Understood.  I took a quick look at Dremio-OSS it seems like it
>>>> has a
>>>> > >> simple java build system?  If it is helpful, I can try to get a
>>>> fork
>>>> > >> running that at least compiles against this PR.  My plan would be
>>>> to cast
>>>> > >> any place that was changed to return a long back to an int, so in
>>>> essence
>>>> > >> the Dremio algorithms would reman 32-bit implementations.
>>>> > >>
>>>> > >> I don't  have the infrastructure to test this change properly from
>>>> a
>>>> > >> distributed systems perspective, so it would still take some time
>>>> from
>>>> > >> Dremio to validate for regressions.
>>>> > >>
>>>> > >> I'm not saying I'm against this but want to make sure we've
>>>> > >> > explored all less disruptive options before considering changing
>>>> > >> something
>>>> > >> > this fundamental (especially when I generally hold the view that
>>>> large
>>>> > >> cell
>>>> > >> > counts against massive contiguous memory is an anti pattern to
>>>> scalable
>>>> > >> > analytical processing--purely subjective of course).
>>>> > >>
>>>> > >>
>>>> > >> I'm open to other ideas here, as well. I don't think it is out of
>>>> the
>>>> > >> question to leave the Java implementation as 32-bit, but if we do,
>>>> then I
>>>> > >> think we should consider a different strategy for reference
>>>> > >> implementations.
>>>> > >>
>>>> > >> Thanks,
>>>> > >> Micah
>>>> > >>
>>>> > >> On Sat, Aug 10, 2019 at 5:09 PM Jacques Nadeau <[email protected]
>>>> >
>>>> > >> wrote:
>>>> > >>
>>>> > >> > Hey Micah, I didn't have a particular path in mind. Was thinking
>>>> more
>>>> > >> along
>>>> > >> > the lines of extra methods as opposed to separate classes.
>>>> > >> >
>>>> > >> > Arrow hasn't historically been a place where we're writing
>>>> algorithms in
>>>> > >> > Java so the fact that they aren't there doesn't mean they don't
>>>> exist.
>>>> > >> We
>>>> > >> > have a large amount of code that depends on the current behavior
>>>> that is
>>>> > >> > deployed in hundreds of customer clusters (you can peruse our
>>>> dremio
>>>> > >> repo
>>>> > >> > to see how extensively we leverage Arrow if interested). Trying
>>>> to
>>>> > >> > determine the ramifications of these changes would be
>>>> challenging and
>>>> > >> time
>>>> > >> > consuming against all the different ways we interact with the
>>>> Arrow Java
>>>> > >> > library. I'm not saying I'm against this but want to make sure
>>>> we've
>>>> > >> > explored all less disruptive options before considering changing
>>>> > >> something
>>>> > >> > this fundamental (especially when I generally hold the view that
>>>> large
>>>> > >> cell
>>>> > >> > counts against massive contiguous memory is an anti pattern to
>>>> scalable
>>>> > >> > analytical processing--purely subjective of course).
>>>> > >> >
>>>> > >> > On Sat, Aug 10, 2019, 4:17 PM Micah Kornfield <
>>>> [email protected]>
>>>> > >> > wrote:
>>>> > >> >
>>>> > >> > > Hi Jacques,
>>>> > >> > > What avenue were you thinking for supporting both paths?   I
>>>> didn't
>>>> > >> want
>>>> > >> > > to pursue a different class hierarchy, because I felt like
>>>> that would
>>>> > >> > > effectively fork the code base, but that is potentially an
>>>> option that
>>>> > >> > > would allow us to have a complete reference implementation in
>>>> Java
>>>> > >> that
>>>> > >> > can
>>>> > >> > > fully interact with C++, without major changes to this code.
>>>> > >> > >
>>>> > >> > > For supporting both APIs on the same classes/interfaces, I
>>>> think they
>>>> > >> > > roughly fall into three categories, changes to input
>>>> parameters,
>>>> > >> changes
>>>> > >> > to
>>>> > >> > > output parameters and algorithm changes.
>>>> > >> > >
>>>> > >> > > For inputs, changing from int to long is essentially a no-op
>>>> from the
>>>> > >> > > compiler perspective.  From the limited micro-benchmarking
>>>> this also
>>>> > >> > > doesn't seem to have a performance impact.  So we could keep
>>>> two
>>>> > >> versions
>>>> > >> > > of the methods that only differ on inputs, but it is not clear
>>>> what
>>>> > >> the
>>>> > >> > > value of that would be.
>>>> > >> > >
>>>> > >> > > For outputs, we can't support methods "long getLength()" and
>>>> "int
>>>> > >> > > getLength()" in the same class, so we would be forced into
>>>> something
>>>> > >> like
>>>> > >> > > "long getLength(boolean dummy)" which I think is a less
>>>> desirable.
>>>> > >> > >
>>>> > >> > > For algorithm changes, there did not appear to be too many
>>>> places
>>>> > >> where
>>>> > >> > we
>>>> > >> > > actually loop over all elements (it is quite possible I missed
>>>> > >> something
>>>> > >> > > here), the ones that I did find I was able to mitigate
>>>> performance
>>>> > >> > > penalties as noted above.  Some of the current implementation
>>>> will
>>>> > >> get a
>>>> > >> > > lot slower for "large arrays", but we can likely fix those
>>>> later or in
>>>> > >> > this
>>>> > >> > > PR with a nested while loop instead of 2 for loops.
>>>> > >> > >
>>>> > >> > > Thanks,
>>>> > >> > > Micah
>>>> > >> > >
>>>> > >> > >
>>>> > >> > > On Saturday, August 10, 2019, Jacques Nadeau <
>>>> [email protected]>
>>>> > >> wrote:
>>>> > >> > >
>>>> > >> > >> This is a pretty massive change to the apis. I wonder how
>>>> nasty it
>>>> > >> would
>>>> > >> > >> be to just support both paths. Have you evaluated how complex
>>>> that
>>>> > >> > would be?
>>>> > >> > >>
>>>> > >> > >> On Wed, Aug 7, 2019 at 11:08 PM Micah Kornfield <
>>>> > >> [email protected]>
>>>> > >> > >> wrote:
>>>> > >> > >>
>>>> > >> > >>> After more investigation, it looks like Float8Benchmarks at
>>>> least
>>>> > >> on my
>>>> > >> > >>> machine are within the range of noise.
>>>> > >> > >>>
>>>> > >> > >>> For BitVectorHelper I pushed a new commit [1], seems to
>>>> bring the
>>>> > >> > >>> BitVectorHelper benchmarks back inline (and even with some
>>>> > >> improvement
>>>> > >> > >>> for
>>>> > >> > >>> getNullCountBenchmark).
>>>> > >> > >>>
>>>> > >> > >>> Benchmark                                        Mode  Cnt
>>>>  Score
>>>> > >> > >>>  Error
>>>> > >> > >>>  Units
>>>> > >> > >>> BitVectorHelperBenchmarks.allBitsNullBenchmark   avgt    5
>>>>  3.821 ±
>>>> > >> > >>> 0.031
>>>> > >> > >>>  ns/op
>>>> > >> > >>> BitVectorHelperBenchmarks.getNullCountBenchmark  avgt    5
>>>> 14.884 ±
>>>> > >> > >>> 0.141
>>>> > >> > >>>  ns/op
>>>> > >> > >>>
>>>> > >> > >>> I applied the same pattern to other loops that I could find,
>>>> and for
>>>> > >> > any
>>>> > >> > >>> "for (long" loop on the critical path, I broke it up into
>>>> two loops.
>>>> > >> > the
>>>> > >> > >>> first loop does iteration by integer, the second finishes
>>>> off for
>>>> > >> any
>>>> > >> > >>> long
>>>> > >> > >>> values.  As a side note it seems like optimization for loops
>>>> using
>>>> > >> long
>>>> > >> > >>> counters at least have a semi-recent open bug for the JVM [2]
>>>> > >> > >>>
>>>> > >> > >>> Thanks,
>>>> > >> > >>> Micah
>>>> > >> > >>>
>>>> > >> > >>> [1]
>>>> > >> > >>>
>>>> > >> > >>>
>>>> > >> >
>>>> > >>
>>>> https://github.com/apache/arrow/pull/5020/commits/2ea2c1ae83e3baa7b9a99a6d06276d968df41797
>>>> > >> > >>> [2] https://bugs.openjdk.java.net/browse/JDK-8223051
>>>> > >> > >>>
>>>> > >> > >>> On Wed, Aug 7, 2019 at 8:11 PM Micah Kornfield <
>>>> > >> [email protected]>
>>>> > >> > >>> wrote:
>>>> > >> > >>>
>>>> > >> > >>> > Indeed, the BoundChecking and CheckNullForGet variables
>>>> can make a
>>>> > >> > big
>>>> > >> > >>> > difference.  I didn't initially run the benchmarks with
>>>> these
>>>> > >> turned
>>>> > >> > on
>>>> > >> > >>> > (you can see the result from above with
>>>> Float8Benchmarks).  Here
>>>> > >> are
>>>> > >> > >>> new
>>>> > >> > >>> > numbers including with the flags enabled.  It looks like
>>>> using
>>>> > >> longs
>>>> > >> > >>> might
>>>> > >> > >>> > be a little bit slower, I'll see what I can do to mitigate
>>>> this.
>>>> > >> > >>> >
>>>> > >> > >>> > Ravindra also volunteered to try to benchmark the changes
>>>> with
>>>> > >> > Dremio's
>>>> > >> > >>> > code on today's sync call.
>>>> > >> > >>> >
>>>> > >> > >>> > New
>>>> > >> > >>> >
>>>> > >> > >>> > Benchmark                                        Mode
>>>> Cnt   Score
>>>> > >> > >>>  Error
>>>> > >> > >>> > Units
>>>> > >> > >>> >
>>>> > >> > >>> > BitVectorHelperBenchmarks.allBitsNullBenchmark   avgt    5
>>>> > >>  4.176 ±
>>>> > >> > >>> 1.292
>>>> > >> > >>> > ns/op
>>>> > >> > >>> >
>>>> > >> > >>> > BitVectorHelperBenchmarks.getNullCountBenchmark  avgt    5
>>>> > >> 26.102 ±
>>>> > >> > >>> 0.700
>>>> > >> > >>> > ns/op
>>>> > >> > >>> >
>>>> > >> > >>> > Float8Benchmarks.copyFromBenchmark   avgt    5  7.398 ±
>>>> 0.084
>>>> > >> us/op
>>>> > >> > >>> >
>>>> > >> > >>> > Float8Benchmarks.readWriteBenchmark  avgt    5  2.711 ±
>>>> 0.057
>>>> > >> us/op
>>>> > >> > >>> >
>>>> > >> > >>> >
>>>> > >> > >>> >
>>>> > >> > >>> > old
>>>> > >> > >>> >
>>>> > >> > >>> > BitVectorHelperBenchmarks.allBitsNullBenchmark   avgt    5
>>>> > >>  3.828 ±
>>>> > >> > >>> 0.030
>>>> > >> > >>> > ns/op
>>>> > >> > >>> >
>>>> > >> > >>> > BitVectorHelperBenchmarks.getNullCountBenchmark  avgt    5
>>>> > >> 20.611 ±
>>>> > >> > >>> 0.188
>>>> > >> > >>> > ns/op
>>>> > >> > >>> >
>>>> > >> > >>> > Float8Benchmarks.copyFromBenchmark   avgt    5  6.597 ±
>>>> 0.462
>>>> > >> us/op
>>>> > >> > >>> >
>>>> > >> > >>> > Float8Benchmarks.readWriteBenchmark  avgt    5  2.615 ±
>>>> 0.027
>>>> > >> us/op
>>>> > >> > >>> >
>>>> > >> > >>> > On Wed, Aug 7, 2019 at 7:13 PM Fan Liya <
>>>> [email protected]>
>>>> > >> > wrote:
>>>> > >> > >>> >
>>>> > >> > >>> >> Hi Gonzalo,
>>>> > >> > >>> >>
>>>> > >> > >>> >> Thanks for sharing the performance results.
>>>> > >> > >>> >> I am wondering if you have turned off the flag
>>>> > >> > >>> >> BoundsChecking#BOUNDS_CHECKING_ENABLED.
>>>> > >> > >>> >> If not, the lower throughput should be expected.
>>>> > >> > >>> >>
>>>> > >> > >>> >> Best,
>>>> > >> > >>> >> Liya Fan
>>>> > >> > >>> >>
>>>> > >> > >>> >> On Wed, Aug 7, 2019 at 10:23 PM Micah Kornfield <
>>>> > >> > >>> [email protected]>
>>>> > >> > >>> >> wrote:
>>>> > >> > >>> >>
>>>> > >> > >>> >>> Hi Gonzalo,
>>>> > >> > >>> >>> Thank you for the feedback.  I wasn't aware of the JIT
>>>> > >> > >>> implications.   At
>>>> > >> > >>> >>> least on the benchmark run they don't seem to have an
>>>> impact.
>>>> > >> > >>> >>>
>>>> > >> > >>> >>> If there are other benchmarks that people have that can
>>>> > >> validate if
>>>> > >> > >>> this
>>>> > >> > >>> >>> change will be problematic I would appreciate trying to
>>>> run them
>>>> > >> > >>> with the
>>>> > >> > >>> >>> PR.  I will try to run the ones for zeroing/popcnt
>>>> tonight to
>>>> > >> see
>>>> > >> > if
>>>> > >> > >>> >>> there
>>>> > >> > >>> >>> is a change in those.
>>>> > >> > >>> >>>
>>>> > >> > >>> >>> -Micah
>>>> > >> > >>> >>>
>>>> > >> > >>> >>>
>>>> > >> > >>> >>>
>>>> > >> > >>> >>> On Wednesday, August 7, 2019, Gonzalo Ortiz Jaureguizar <
>>>> > >> > >>> >>> [email protected]> wrote:
>>>> > >> > >>> >>>
>>>> > >> > >>> >>> > I would recommend to take care with this kind of
>>>> changes.
>>>> > >> > >>> >>> >
>>>> > >> > >>> >>> > I didn't try Arrow in more than one year, but by then
>>>> the
>>>> > >> > >>> performance
>>>> > >> > >>> >>> was
>>>> > >> > >>> >>> > quite bad in comparison with plain byte buffer access
>>>> > >> > >>> >>> > (see
>>>> http://git.net/apache-arrow-development/msg02353.html *)
>>>> > >> > and
>>>> > >> > >>> >>> > there are several optimizations that the JVM
>>>> (specifically,
>>>> > >> C2)
>>>> > >> > >>> does
>>>> > >> > >>> >>> not
>>>> > >> > >>> >>> > apply when dealing with int instead of longs. One of
>>>> the
>>>> > >> > >>> >>> > most commons is the loop unrolling and vectorization.
>>>> > >> > >>> >>> >
>>>> > >> > >>> >>> > * It doesn't seem the best way to reference an old
>>>> email on
>>>> > >> the
>>>> > >> > >>> list,
>>>> > >> > >>> >>> but
>>>> > >> > >>> >>> > it is the only result shown by Google
>>>> > >> > >>> >>> >
>>>> > >> > >>> >>> > El mié., 7 ago. 2019 a las 11:42, Fan Liya (<
>>>> > >> > [email protected]
>>>> > >> > >>> >)
>>>> > >> > >>> >>> > escribió:
>>>> > >> > >>> >>> >
>>>> > >> > >>> >>> >> Hi Micah,
>>>> > >> > >>> >>> >>
>>>> > >> > >>> >>> >> Thanks for your effort. The performance result looks
>>>> good.
>>>> > >> > >>> >>> >>
>>>> > >> > >>> >>> >> As you indicated, ArrowBuf will take additional 12
>>>> bytes (4
>>>> > >> > bytes
>>>> > >> > >>> for
>>>> > >> > >>> >>> each
>>>> > >> > >>> >>> >> of length, write index, and read index).
>>>> > >> > >>> >>> >> Similar overheads also exist for vectors like
>>>> > >> > >>> BaseFixedWidthVector,
>>>> > >> > >>> >>> >> BaseVariableWidthVector, etc.
>>>> > >> > >>> >>> >>
>>>> > >> > >>> >>> >> IMO, such overheads are small enough to justify the
>>>> change.
>>>> > >> > >>> >>> >> Let's check if there are other overheads.
>>>> > >> > >>> >>> >>
>>>> > >> > >>> >>> >> Best,
>>>> > >> > >>> >>> >> Liya Fan
>>>> > >> > >>> >>> >>
>>>> > >> > >>> >>> >> On Wed, Aug 7, 2019 at 3:30 PM Micah Kornfield <
>>>> > >> > >>> [email protected]
>>>> > >> > >>> >>> >
>>>> > >> > >>> >>> >> wrote:
>>>> > >> > >>> >>> >>
>>>> > >> > >>> >>> >> > Hi Liya Fan,
>>>> > >> > >>> >>> >> > Based on the Float8Benchmark there does not seem to
>>>> be any
>>>> > >> > >>> >>> meaningful
>>>> > >> > >>> >>> >> > performance difference on my machine.  At least for
>>>> me, the
>>>> > >> > >>> >>> benchmarks
>>>> > >> > >>> >>> >> are
>>>> > >> > >>> >>> >> > not stable enough to say one is faster than the
>>>> other (I've
>>>> > >> > >>> pasted
>>>> > >> > >>> >>> >> results
>>>> > >> > >>> >>> >> > below).  That being said my machine isn't
>>>> necessarily the
>>>> > >> most
>>>> > >> > >>> >>> reliable
>>>> > >> > >>> >>> >> for
>>>> > >> > >>> >>> >> > benchmarking.
>>>> > >> > >>> >>> >> >
>>>> > >> > >>> >>> >> > On an intuitive level, this makes sense to me,  for
>>>> the
>>>> > >> most
>>>> > >> > >>> part it
>>>> > >> > >>> >>> >> seems
>>>> > >> > >>> >>> >> > like the change just moves casting from "int" to
>>>> "long"
>>>> > >> > further
>>>> > >> > >>> up
>>>> > >> > >>> >>> the
>>>> > >> > >>> >>> >> > stack  for  "PlatformDepdendent" operations.  If
>>>> there are
>>>> > >> > other
>>>> > >> > >>> >>> >> benchmarks
>>>> > >> > >>> >>> >> > that you think are worth running let me know.
>>>> > >> > >>> >>> >> >
>>>> > >> > >>> >>> >> > One downside performance wise I think for his
>>>> change is it
>>>> > >> > >>> >>> increases the
>>>> > >> > >>> >>> >> > size of ArrowBuf objects, which I suppose could
>>>> influence
>>>> > >> > cache
>>>> > >> > >>> >>> misses
>>>> > >> > >>> >>> >> at
>>>> > >> > >>> >>> >> > some level or increase the size of call-stacks, but
>>>> this
>>>> > >> > doesn't
>>>> > >> > >>> >>> seem to
>>>> > >> > >>> >>> >> > show up in the benchmark..
>>>> > >> > >>> >>> >> >
>>>> > >> > >>> >>> >> > Thanks,
>>>> > >> > >>> >>> >> > Micah
>>>> > >> > >>> >>> >> >
>>>> > >> > >>> >>> >> > Sample benchmark numbers:
>>>> > >> > >>> >>> >> >
>>>> > >> > >>> >>> >> > [New Code]
>>>> > >> > >>> >>> >> > Benchmark                            Mode  Cnt
>>>>  Score
>>>> > >>  Error
>>>> > >> > >>> >>> Units
>>>> > >> > >>> >>> >> > Float8Benchmarks.copyFromBenchmark   avgt    5
>>>> 15.441 ±
>>>> > >> 0.469
>>>> > >> > >>> >>> us/op
>>>> > >> > >>> >>> >> > Float8Benchmarks.readWriteBenchmark  avgt    5
>>>> 14.057 ±
>>>> > >> 0.115
>>>> > >> > >>> >>> us/op
>>>> > >> > >>> >>> >> >
>>>> > >> > >>> >>> >> >
>>>> > >> > >>> >>> >> > [Old code]
>>>> > >> > >>> >>> >> > Benchmark                            Mode  Cnt
>>>>  Score
>>>> > >>  Error
>>>> > >> > >>> >>> Units
>>>> > >> > >>> >>> >> > Float8Benchmarks.copyFromBenchmark   avgt    5
>>>> 16.248 ±
>>>> > >> 1.409
>>>> > >> > >>> >>> us/op
>>>> > >> > >>> >>> >> > Float8Benchmarks.readWriteBenchmark  avgt    5
>>>> 14.150 ±
>>>> > >> 0.084
>>>> > >> > >>> >>> us/op
>>>> > >> > >>> >>> >> >
>>>> > >> > >>> >>> >> > On Tue, Aug 6, 2019 at 1:18 AM Fan Liya <
>>>> > >> [email protected]
>>>> > >> > >
>>>> > >> > >>> >>> wrote:
>>>> > >> > >>> >>> >> >
>>>> > >> > >>> >>> >> >> Hi Micah,
>>>> > >> > >>> >>> >> >>
>>>> > >> > >>> >>> >> >> Thanks a lot for doing this.
>>>> > >> > >>> >>> >> >>
>>>> > >> > >>> >>> >> >> I am a little concerned about if there is any
>>>> negative
>>>> > >> > >>> performance
>>>> > >> > >>> >>> >> impact
>>>> > >> > >>> >>> >> >> on the current 32-bit-length based applications.
>>>> > >> > >>> >>> >> >> Can we do some performance comparison on our
>>>> existing
>>>> > >> > >>> benchmarks?
>>>> > >> > >>> >>> >> >>
>>>> > >> > >>> >>> >> >> Best,
>>>> > >> > >>> >>> >> >> Liya Fan
>>>> > >> > >>> >>> >> >>
>>>> > >> > >>> >>> >> >>
>>>> > >> > >>> >>> >> >> On Tue, Aug 6, 2019 at 3:35 PM Micah Kornfield <
>>>> > >> > >>> >>> [email protected]>
>>>> > >> > >>> >>> >> >> wrote:
>>>> > >> > >>> >>> >> >>
>>>> > >> > >>> >>> >> >>> There have been some previous discussions on the
>>>> mailing
>>>> > >> > about
>>>> > >> > >>> >>> >> supporting
>>>> > >> > >>> >>> >> >>> 64-bit lengths for  Java ValueVectors (this is
>>>> what the
>>>> > >> IPC
>>>> > >> > >>> >>> >> specification
>>>> > >> > >>> >>> >> >>> and C++ support).  I created a PR [1] that
>>>> changes all
>>>> > >> APIs
>>>> > >> > >>> that I
>>>> > >> > >>> >>> >> could
>>>> > >> > >>> >>> >> >>> find that take an index to take an "long" instead
>>>> of an
>>>> > >> > "int"
>>>> > >> > >>> (and
>>>> > >> > >>> >>> >> >>> similarly change "size/rowcount" APIs).
>>>> > >> > >>> >>> >> >>>
>>>> > >> > >>> >>> >> >>> It is a big change, so I think it is worth
>>>> discussing if
>>>> > >> it
>>>> > >> > is
>>>> > >> > >>> >>> >> something
>>>> > >> > >>> >>> >> >>> we
>>>> > >> > >>> >>> >> >>> still want to move forward with.  It would be
>>>> nice to
>>>> > >> come
>>>> > >> > to
>>>> > >> > >>> a
>>>> > >> > >>> >>> >> >>> conclusion
>>>> > >> > >>> >>> >> >>> quickly, ideally in the next few days, to avoid a
>>>> lot of
>>>> > >> > merge
>>>> > >> > >>> >>> >> conflicts.
>>>> > >> > >>> >>> >> >>>
>>>> > >> > >>> >>> >> >>> The reason I did this work now is the C++
>>>> implementation
>>>> > >> has
>>>> > >> > >>> added
>>>> > >> > >>> >>> >> >>> support
>>>> > >> > >>> >>> >> >>> for LargeList, LargeBinary and LargeString arrays
>>>> and
>>>> > >> based
>>>> > >> > on
>>>> > >> > >>> >>> prior
>>>> > >> > >>> >>> >> >>> discussions we need to have similar support in
>>>> Java
>>>> > >> before
>>>> > >> > our
>>>> > >> > >>> >>> next
>>>> > >> > >>> >>> >> >>> release. Support 64-bit indexes means we can have
>>>> full
>>>> > >> > >>> >>> compatibility
>>>> > >> > >>> >>> >> and
>>>> > >> > >>> >>> >> >>> make the most use of the types in Java.
>>>> > >> > >>> >>> >> >>>
>>>> > >> > >>> >>> >> >>> Look forward to hearing feedback.
>>>> > >> > >>> >>> >> >>>
>>>> > >> > >>> >>> >> >>> Thanks,
>>>> > >> > >>> >>> >> >>> Micah
>>>> > >> > >>> >>> >> >>>
>>>> > >> > >>> >>> >> >>> [1] https://github.com/apache/arrow/pull/5020
>>>> > >> > >>> >>> >> >>>
>>>> > >> > >>> >>> >> >>
>>>> > >> > >>> >>> >>
>>>> > >> > >>> >>> >
>>>> > >> > >>> >>>
>>>> > >> > >>> >>
>>>> > >> > >>>
>>>> > >> > >>
>>>> > >> >
>>>> > >>
>>>> > >
>>>>
>>>

Re: [Discuss][Java] 64-bit lengths for ValueVectors

Reply via email to