Re: [Discuss][Java] 64-bit lengths for ValueVectors

Micah Kornfield Fri, 22 Nov 2019 11:46:03 -0800

I'll also note this isn't quite in final form, I'd still like to add some
more unit tests.


On Fri, Nov 22, 2019 at 11:36 AM Wes McKinney <wesmck...@gmail.com> wrote:

> hi Micah -- it makes sense to limit the scope for the time being to
> permitting LargeString/Binary work to proceed. Jacques, have you had a
> chance to look at this?
>
> On Fri, Nov 15, 2019 at 3:07 AM Micah Kornfield <emkornfi...@gmail.com>
> wrote:
> >
> > Apologies for the long delay, I chose to do the minimal work of limiting
> this change [1] to allowing ArrowBuf to 64-bit lengths.  This would unblock
> work on LargeString and LargeBinary.  If this change looks OK, I think
> there is some follow-up work to add more thorough unit/integration tests.
> >
> > As an aside, it does seem like the 2GB limit is affecting some users in
> Spark [2][3], so hopefully LargeString would help with this.
> >
> > Allowing more than MAX_INT elements is Vectors/array still a blocker for
> making LargeList useful.
> >
> > Thanks,
> > Micah
> >
> > [1]  https://github.com/apache/arrow/pull/5020
> > [2]
> https://stackoverflow.com/questions/58739888/spark-is-it-possible-to-increase-pyarrow-buffer#comment103812119_58739888
> > [3] https://issues.apache.org/jira/browse/ARROW-4890
> >
> > On Fri, Aug 23, 2019 at 8:33 AM Jacques Nadeau <jacq...@apache.org>
> wrote:
> >>
> >>
> >>
> >> On Fri, Aug 23, 2019, 8:55 PM Micah Kornfield <emkornfi...@gmail.com>
> wrote:
> >>>>
> >>>> The vector indexes being limited to 32 bits doesn't limit the
> addressing to 32 bit chunks of memory. For example, you're prime example
> before was image data. Having 2 billion images of 1mb images would still be
> supported without changing the index addressing.
> >>>
> >>> This might be pre-coffee math, but I think we are limited to
> approximately 2000 images because an ArrowBuf only holds up-to 2 billion
> bytes [1].  While we have plenty of room for the offsets, we quickly run
> out of contiguous data storage space. For LargeString and LargeBinary this
> could be fixed by changing ArrowBuf.
> >>>
> >>> LargeArray faces the same problem only it applies to its child
> vectors.  Supporting LargeArray properly is really what drove the
> large-scale interface change.
> >>
> >>
> >> My expressed concern about these changes was specifically about the use
> of long for get/set in the vector interfaces. I'm not saying that we
> constrain memory/ArrowBufs to 32bits.
> >>
> >>>
> >>>> Rebase would help if possible.
> >>>
> >>> I'll try to get to this in the next few days.
> >>>
> >>> [1]
> https://github.com/apache/arrow/blob/95175fe7cb8439eebe6d2f6e0495f551d6864380/java/memory/src/main/java/io/netty/buffer/ArrowBuf.java#L164
> >>>
> >>> On Fri, Aug 23, 2019 at 4:55 AM Jacques Nadeau <jacq...@apache.org>
> wrote:
> >>>>
> >>>>
> >>>>
> >>>> On Fri, Aug 23, 2019, 11:49 AM Micah Kornfield <emkornfi...@gmail.com>
> wrote:
> >>>>>>
> >>>>>> I don't think we should couple this discussion with the
> implementation of large list, etc since I think those two concepts are
> independent.
> >>>>>
> >>>>> I'm still trying to balance in my mind which is a worse experience
> for consumers of the libraries for these types.  Claiming that Java
> supports these types but throwing an exception when the Vectors exceed
> 32-bits or just say they aren't supported until we have 64-bit support in
> Java.
> >>>>
> >>>>
> >>>> The vector indexes being limited to 32 bits doesn't limit the
> addressing to 32 bit chunks of memory. For example, you're prime example
> before was image data. Having 2 billion images of 1mb images would still be
> supported without changing the index addressing.
> >>>>
> >>>>
> >>>>>
> >>>>>>
> >>>>>> I've asked some others on my team their opinions on the risk here.
> I think we should probably review some our more complex vector interactions
> and see how the jvm's assembly changes with this kind of change. Using
> microbenchmarking is good but I think we also need to see whether we're
> constantly inserting additional instructions or if in most cases, this
> actually doesn't impact instruction count.
> >>>>>
> >>>>>
> >>>>> Is this something that your team will take on?
> >>>>
> >>>>
> >>>>
> >>>> Yeah, we need to look at this I think.
> >>>>
> >>>>> Do you need a rebased version of the PR or is the existing one
> sufficient?
> >>>>
> >>>>
> >>>> Rebase would help if possible.
> >>>>
> >>>>
> >>>>>
> >>>>> Thanks,
> >>>>> Micah
> >>>>>
> >>>>> On Thu, Aug 22, 2019 at 8:56 PM Jacques Nadeau <jacq...@apache.org>
> wrote:
> >>>>>>
> >>>>>> I don't think we should couple this discussion with the
> implementation of large list, etc since I think those two concepts are
> independent.
> >>>>>>
> >>>>>> I've asked some others on my team their opinions on the risk here.
> I think we should probably review some our more complex vector interactions
> and see how the jvm's assembly changes with this kind of change. Using
> microbenchmarking is good but I think we also need to see whether we're
> constantly inserting additional instructions or if in most cases, this
> actually doesn't impact instruction count.
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> On Wed, Aug 21, 2019 at 12:18 PM Micah Kornfield <
> emkornfi...@gmail.com> wrote:
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> With regards to the reference implementation point. It is a good
> point. I'm on vacation this week. Unless you're pushing hard on this, can
> we pick this up and discuss more next week?
> >>>>>>>
> >>>>>>>
> >>>>>>> Hi Jacques, I hope you had a good rest.  Any more thoughts on the
> reference implementation aspect of this?
> >>>>>>>
> >>>>>>>>
> >>>>>>>> To copy the sentiments from the 0.15.0 release thread, I think it
> >>>>>>>> would be best to decouple this discussion from the release
> timeline
> >>>>>>>> given how many people we have relying on regular releases coming
> out.
> >>>>>>>> We can keep continue making major 0.x releases until we're ready
> to
> >>>>>>>> release 1.0.0.
> >>>>>>>
> >>>>>>>
> >>>>>>> I'm OK with it as long as other stakeholders are. Timed releases
> are the way to go.  As stated on the release thread [1] we need a better
> mechanism to avoid this type of issue arising again.  The release thread
> also had some more discussion on compatibility.
> >>>>>>>
> >>>>>>> Thanks,
> >>>>>>> Micah
> >>>>>>>
> >>>>>>> [1]
> https://lists.apache.org/thread.html/d70feeceaf2570906ade117030b29887af7c77ca5c4a976e6d555920@%3Cdev.arrow.apache.org%3E
> >>>>>>>
> >>>>>>>
> >>>>>>> On Wed, Aug 14, 2019 at 3:23 PM Wes McKinney <wesmck...@gmail.com>
> wrote:
> >>>>>>>>
> >>>>>>>> On Sun, Aug 11, 2019 at 9:40 PM Micah Kornfield <
> emkornfi...@gmail.com> wrote:
> >>>>>>>> >
> >>>>>>>> > Hi Wes and Jacques,
> >>>>>>>> > See responses below.
> >>>>>>>> >
> >>>>>>>> > With regards to the reference implementation point. It is a
> good point. I'm
> >>>>>>>> > > on vacation this week. Unless you're pushing hard on this,
> can we pick this
> >>>>>>>> > > up and discuss more next week?
> >>>>>>>> >
> >>>>>>>> >
> >>>>>>>> > Sure thing, enjoy your vacation.  I think the only practical
> implications
> >>>>>>>> > are it delays choices around implementing LargeList,
> LargeBinary,
> >>>>>>>> > LargeString in Java, which in turn might push out the 0.15.0
> release.
> >>>>>>>> >
> >>>>>>>>
> >>>>>>>> To copy the sentiments from the 0.15.0 release thread, I think it
> >>>>>>>> would be best to decouple this discussion from the release
> timeline
> >>>>>>>> given how many people we have relying on regular releases coming
> out.
> >>>>>>>> We can keep continue making major 0.x releases until we're ready
> to
> >>>>>>>> release 1.0.0.
> >>>>>>>>
> >>>>>>>> > My stance on this is that I don't know how important it is for
> Java to
> >>>>>>>> > > support vectors over INT32_MAX elements. The use cases
> enabled by
> >>>>>>>> > > having very large arrays seem to be concentrated in the
> native code
> >>>>>>>> > > world (e.g. C/C++/Rust) -- that could just be
> implementation-centrism
> >>>>>>>> > > on my part, though.
> >>>>>>>> >
> >>>>>>>> >
> >>>>>>>> > A data point against this view is Spark has done work to
> eliminate 2GB
> >>>>>>>> > memory limits on its block sizes [1].  I don't claim to
> understand the
> >>>>>>>> > implications of this. Bryan might you have any thoughts here?
> I'm OK with
> >>>>>>>> > INT32_MAX, as well, I think we should think about what this
> means for
> >>>>>>>> > adding Large types to Java and implications for reference
> implementations.
> >>>>>>>> >
> >>>>>>>> > Thanks,
> >>>>>>>> > Micah
> >>>>>>>> >
> >>>>>>>> > [1] https://issues.apache.org/jira/browse/SPARK-6235
> >>>>>>>> >
> >>>>>>>> > On Sun, Aug 11, 2019 at 6:31 PM Jacques Nadeau <
> jacq...@apache.org> wrote:
> >>>>>>>> >
> >>>>>>>> > > Hey Micah,
> >>>>>>>> > >
> >>>>>>>> > > Appreciate the offer on the compiling. The reality is I'm
> more concerned
> >>>>>>>> > > about the unknowns than the compiling issue itself. Any time
> you've been
> >>>>>>>> > > tuning for a while, changing something like this could be
> totally fine or
> >>>>>>>> > > cause a couple of major issues. For example, we've done a
> very large amount
> >>>>>>>> > > of work reducing heap memory footprint of the vectors. Are
> target is to
> >>>>>>>> > > actually get it down to 24 bytes per ArrowBuf and 24 bytes
> heap per vector
> >>>>>>>> > > (not including arrow bufs).
> >>>>>>>> > >
> >>>>>>>> > > With regards to the reference implementation point. It is a
> good point.
> >>>>>>>> > > I'm on vacation this week. Unless you're pushing hard on
> this, can we pick
> >>>>>>>> > > this up and discuss more next week?
> >>>>>>>> > >
> >>>>>>>> > > thanks,
> >>>>>>>> > > Jacques
> >>>>>>>> > >
> >>>>>>>> > > On Sat, Aug 10, 2019 at 7:39 PM Micah Kornfield <
> emkornfi...@gmail.com>
> >>>>>>>> > > wrote:
> >>>>>>>> > >
> >>>>>>>> > >> Hi Jacques,
> >>>>>>>> > >> I definitely understand these concerns and this change is
> risky because it
> >>>>>>>> > >> is so large.  Perhaps, creating a new hierarchy, might be
> the cleanest way
> >>>>>>>> > >> of dealing with this.  This could have other benefits like
> cleaning up
> >>>>>>>> > >> some
> >>>>>>>> > >> cruft around dictionary encode and "orphaned" method.   Per
> past e-mail
> >>>>>>>> > >> threads I agree it is beneficial to have 2 separate reference
> >>>>>>>> > >> implementations that can communicate fully, and my intent
> here was to
> >>>>>>>> > >> close
> >>>>>>>> > >> that gap.
> >>>>>>>> > >>
> >>>>>>>> > >> Trying to
> >>>>>>>> > >> > determine the ramifications of these changes would be
> challenging and
> >>>>>>>> > >> time
> >>>>>>>> > >> > consuming against all the different ways we interact with
> the Arrow Java
> >>>>>>>> > >> > library.
> >>>>>>>> > >>
> >>>>>>>> > >>
> >>>>>>>> > >> Understood.  I took a quick look at Dremio-OSS it seems like
> it has a
> >>>>>>>> > >> simple java build system?  If it is helpful, I can try to
> get a fork
> >>>>>>>> > >> running that at least compiles against this PR.  My plan
> would be to cast
> >>>>>>>> > >> any place that was changed to return a long back to an int,
> so in essence
> >>>>>>>> > >> the Dremio algorithms would reman 32-bit implementations.
> >>>>>>>> > >>
> >>>>>>>> > >> I don't  have the infrastructure to test this change
> properly from a
> >>>>>>>> > >> distributed systems perspective, so it would still take some
> time from
> >>>>>>>> > >> Dremio to validate for regressions.
> >>>>>>>> > >>
> >>>>>>>> > >> I'm not saying I'm against this but want to make sure we've
> >>>>>>>> > >> > explored all less disruptive options before considering
> changing
> >>>>>>>> > >> something
> >>>>>>>> > >> > this fundamental (especially when I generally hold the
> view that large
> >>>>>>>> > >> cell
> >>>>>>>> > >> > counts against massive contiguous memory is an anti
> pattern to scalable
> >>>>>>>> > >> > analytical processing--purely subjective of course).
> >>>>>>>> > >>
> >>>>>>>> > >>
> >>>>>>>> > >> I'm open to other ideas here, as well. I don't think it is
> out of the
> >>>>>>>> > >> question to leave the Java implementation as 32-bit, but if
> we do, then I
> >>>>>>>> > >> think we should consider a different strategy for reference
> >>>>>>>> > >> implementations.
> >>>>>>>> > >>
> >>>>>>>> > >> Thanks,
> >>>>>>>> > >> Micah
> >>>>>>>> > >>
> >>>>>>>> > >> On Sat, Aug 10, 2019 at 5:09 PM Jacques Nadeau <
> jacq...@apache.org>
> >>>>>>>> > >> wrote:
> >>>>>>>> > >>
> >>>>>>>> > >> > Hey Micah, I didn't have a particular path in mind. Was
> thinking more
> >>>>>>>> > >> along
> >>>>>>>> > >> > the lines of extra methods as opposed to separate classes.
> >>>>>>>> > >> >
> >>>>>>>> > >> > Arrow hasn't historically been a place where we're writing
> algorithms in
> >>>>>>>> > >> > Java so the fact that they aren't there doesn't mean they
> don't exist.
> >>>>>>>> > >> We
> >>>>>>>> > >> > have a large amount of code that depends on the current
> behavior that is
> >>>>>>>> > >> > deployed in hundreds of customer clusters (you can peruse
> our dremio
> >>>>>>>> > >> repo
> >>>>>>>> > >> > to see how extensively we leverage Arrow if interested).
> Trying to
> >>>>>>>> > >> > determine the ramifications of these changes would be
> challenging and
> >>>>>>>> > >> time
> >>>>>>>> > >> > consuming against all the different ways we interact with
> the Arrow Java
> >>>>>>>> > >> > library. I'm not saying I'm against this but want to make
> sure we've
> >>>>>>>> > >> > explored all less disruptive options before considering
> changing
> >>>>>>>> > >> something
> >>>>>>>> > >> > this fundamental (especially when I generally hold the
> view that large
> >>>>>>>> > >> cell
> >>>>>>>> > >> > counts against massive contiguous memory is an anti
> pattern to scalable
> >>>>>>>> > >> > analytical processing--purely subjective of course).
> >>>>>>>> > >> >
> >>>>>>>> > >> > On Sat, Aug 10, 2019, 4:17 PM Micah Kornfield <
> emkornfi...@gmail.com>
> >>>>>>>> > >> > wrote:
> >>>>>>>> > >> >
> >>>>>>>> > >> > > Hi Jacques,
> >>>>>>>> > >> > > What avenue were you thinking for supporting both
> paths?   I didn't
> >>>>>>>> > >> want
> >>>>>>>> > >> > > to pursue a different class hierarchy, because I felt
> like that would
> >>>>>>>> > >> > > effectively fork the code base, but that is potentially
> an option that
> >>>>>>>> > >> > > would allow us to have a complete reference
> implementation in Java
> >>>>>>>> > >> that
> >>>>>>>> > >> > can
> >>>>>>>> > >> > > fully interact with C++, without major changes to this
> code.
> >>>>>>>> > >> > >
> >>>>>>>> > >> > > For supporting both APIs on the same classes/interfaces,
> I think they
> >>>>>>>> > >> > > roughly fall into three categories, changes to input
> parameters,
> >>>>>>>> > >> changes
> >>>>>>>> > >> > to
> >>>>>>>> > >> > > output parameters and algorithm changes.
> >>>>>>>> > >> > >
> >>>>>>>> > >> > > For inputs, changing from int to long is essentially a
> no-op from the
> >>>>>>>> > >> > > compiler perspective.  From the limited
> micro-benchmarking this also
> >>>>>>>> > >> > > doesn't seem to have a performance impact.  So we could
> keep two
> >>>>>>>> > >> versions
> >>>>>>>> > >> > > of the methods that only differ on inputs, but it is not
> clear what
> >>>>>>>> > >> the
> >>>>>>>> > >> > > value of that would be.
> >>>>>>>> > >> > >
> >>>>>>>> > >> > > For outputs, we can't support methods "long getLength()"
> and "int
> >>>>>>>> > >> > > getLength()" in the same class, so we would be forced
> into something
> >>>>>>>> > >> like
> >>>>>>>> > >> > > "long getLength(boolean dummy)" which I think is a less
> desirable.
> >>>>>>>> > >> > >
> >>>>>>>> > >> > > For algorithm changes, there did not appear to be too
> many places
> >>>>>>>> > >> where
> >>>>>>>> > >> > we
> >>>>>>>> > >> > > actually loop over all elements (it is quite possible I
> missed
> >>>>>>>> > >> something
> >>>>>>>> > >> > > here), the ones that I did find I was able to mitigate
> performance
> >>>>>>>> > >> > > penalties as noted above.  Some of the current
> implementation will
> >>>>>>>> > >> get a
> >>>>>>>> > >> > > lot slower for "large arrays", but we can likely fix
> those later or in
> >>>>>>>> > >> > this
> >>>>>>>> > >> > > PR with a nested while loop instead of 2 for loops.
> >>>>>>>> > >> > >
> >>>>>>>> > >> > > Thanks,
> >>>>>>>> > >> > > Micah
> >>>>>>>> > >> > >
> >>>>>>>> > >> > >
> >>>>>>>> > >> > > On Saturday, August 10, 2019, Jacques Nadeau <
> jacq...@apache.org>
> >>>>>>>> > >> wrote:
> >>>>>>>> > >> > >
> >>>>>>>> > >> > >> This is a pretty massive change to the apis. I wonder
> how nasty it
> >>>>>>>> > >> would
> >>>>>>>> > >> > >> be to just support both paths. Have you evaluated how
> complex that
> >>>>>>>> > >> > would be?
> >>>>>>>> > >> > >>
> >>>>>>>> > >> > >> On Wed, Aug 7, 2019 at 11:08 PM Micah Kornfield <
> >>>>>>>> > >> emkornfi...@gmail.com>
> >>>>>>>> > >> > >> wrote:
> >>>>>>>> > >> > >>
> >>>>>>>> > >> > >>> After more investigation, it looks like
> Float8Benchmarks at least
> >>>>>>>> > >> on my
> >>>>>>>> > >> > >>> machine are within the range of noise.
> >>>>>>>> > >> > >>>
> >>>>>>>> > >> > >>> For BitVectorHelper I pushed a new commit [1], seems
> to bring the
> >>>>>>>> > >> > >>> BitVectorHelper benchmarks back inline (and even with
> some
> >>>>>>>> > >> improvement
> >>>>>>>> > >> > >>> for
> >>>>>>>> > >> > >>> getNullCountBenchmark).
> >>>>>>>> > >> > >>>
> >>>>>>>> > >> > >>> Benchmark                                        Mode
> Cnt   Score
> >>>>>>>> > >> > >>>  Error
> >>>>>>>> > >> > >>>  Units
> >>>>>>>> > >> > >>> BitVectorHelperBenchmarks.allBitsNullBenchmark   avgt
>   5   3.821 ±
> >>>>>>>> > >> > >>> 0.031
> >>>>>>>> > >> > >>>  ns/op
> >>>>>>>> > >> > >>> BitVectorHelperBenchmarks.getNullCountBenchmark  avgt
>   5  14.884 ±
> >>>>>>>> > >> > >>> 0.141
> >>>>>>>> > >> > >>>  ns/op
> >>>>>>>> > >> > >>>
> >>>>>>>> > >> > >>> I applied the same pattern to other loops that I could
> find, and for
> >>>>>>>> > >> > any
> >>>>>>>> > >> > >>> "for (long" loop on the critical path, I broke it up
> into two loops.
> >>>>>>>> > >> > the
> >>>>>>>> > >> > >>> first loop does iteration by integer, the second
> finishes off for
> >>>>>>>> > >> any
> >>>>>>>> > >> > >>> long
> >>>>>>>> > >> > >>> values.  As a side note it seems like optimization for
> loops using
> >>>>>>>> > >> long
> >>>>>>>> > >> > >>> counters at least have a semi-recent open bug for the
> JVM [2]
> >>>>>>>> > >> > >>>
> >>>>>>>> > >> > >>> Thanks,
> >>>>>>>> > >> > >>> Micah
> >>>>>>>> > >> > >>>
> >>>>>>>> > >> > >>> [1]
> >>>>>>>> > >> > >>>
> >>>>>>>> > >> > >>>
> >>>>>>>> > >> >
> >>>>>>>> > >>
> https://github.com/apache/arrow/pull/5020/commits/2ea2c1ae83e3baa7b9a99a6d06276d968df41797
> >>>>>>>> > >> > >>> [2] https://bugs.openjdk.java.net/browse/JDK-8223051
> >>>>>>>> > >> > >>>
> >>>>>>>> > >> > >>> On Wed, Aug 7, 2019 at 8:11 PM Micah Kornfield <
> >>>>>>>> > >> emkornfi...@gmail.com>
> >>>>>>>> > >> > >>> wrote:
> >>>>>>>> > >> > >>>
> >>>>>>>> > >> > >>> > Indeed, the BoundChecking and CheckNullForGet
> variables can make a
> >>>>>>>> > >> > big
> >>>>>>>> > >> > >>> > difference.  I didn't initially run the benchmarks
> with these
> >>>>>>>> > >> turned
> >>>>>>>> > >> > on
> >>>>>>>> > >> > >>> > (you can see the result from above with
> Float8Benchmarks).  Here
> >>>>>>>> > >> are
> >>>>>>>> > >> > >>> new
> >>>>>>>> > >> > >>> > numbers including with the flags enabled.  It looks
> like using
> >>>>>>>> > >> longs
> >>>>>>>> > >> > >>> might
> >>>>>>>> > >> > >>> > be a little bit slower, I'll see what I can do to
> mitigate this.
> >>>>>>>> > >> > >>> >
> >>>>>>>> > >> > >>> > Ravindra also volunteered to try to benchmark the
> changes with
> >>>>>>>> > >> > Dremio's
> >>>>>>>> > >> > >>> > code on today's sync call.
> >>>>>>>> > >> > >>> >
> >>>>>>>> > >> > >>> > New
> >>>>>>>> > >> > >>> >
> >>>>>>>> > >> > >>> > Benchmark
> Mode  Cnt   Score
> >>>>>>>> > >> > >>>  Error
> >>>>>>>> > >> > >>> > Units
> >>>>>>>> > >> > >>> >
> >>>>>>>> > >> > >>> > BitVectorHelperBenchmarks.allBitsNullBenchmark
>  avgt    5
> >>>>>>>> > >>  4.176 ±
> >>>>>>>> > >> > >>> 1.292
> >>>>>>>> > >> > >>> > ns/op
> >>>>>>>> > >> > >>> >
> >>>>>>>> > >> > >>> > BitVectorHelperBenchmarks.getNullCountBenchmark
> avgt    5
> >>>>>>>> > >> 26.102 ±
> >>>>>>>> > >> > >>> 0.700
> >>>>>>>> > >> > >>> > ns/op
> >>>>>>>> > >> > >>> >
> >>>>>>>> > >> > >>> > Float8Benchmarks.copyFromBenchmark   avgt    5
> 7.398 ± 0.084
> >>>>>>>> > >> us/op
> >>>>>>>> > >> > >>> >
> >>>>>>>> > >> > >>> > Float8Benchmarks.readWriteBenchmark  avgt    5
> 2.711 ± 0.057
> >>>>>>>> > >> us/op
> >>>>>>>> > >> > >>> >
> >>>>>>>> > >> > >>> >
> >>>>>>>> > >> > >>> >
> >>>>>>>> > >> > >>> > old
> >>>>>>>> > >> > >>> >
> >>>>>>>> > >> > >>> > BitVectorHelperBenchmarks.allBitsNullBenchmark
>  avgt    5
> >>>>>>>> > >>  3.828 ±
> >>>>>>>> > >> > >>> 0.030
> >>>>>>>> > >> > >>> > ns/op
> >>>>>>>> > >> > >>> >
> >>>>>>>> > >> > >>> > BitVectorHelperBenchmarks.getNullCountBenchmark
> avgt    5
> >>>>>>>> > >> 20.611 ±
> >>>>>>>> > >> > >>> 0.188
> >>>>>>>> > >> > >>> > ns/op
> >>>>>>>> > >> > >>> >
> >>>>>>>> > >> > >>> > Float8Benchmarks.copyFromBenchmark   avgt    5
> 6.597 ± 0.462
> >>>>>>>> > >> us/op
> >>>>>>>> > >> > >>> >
> >>>>>>>> > >> > >>> > Float8Benchmarks.readWriteBenchmark  avgt    5
> 2.615 ± 0.027
> >>>>>>>> > >> us/op
> >>>>>>>> > >> > >>> >
> >>>>>>>> > >> > >>> > On Wed, Aug 7, 2019 at 7:13 PM Fan Liya <
> liya.fa...@gmail.com>
> >>>>>>>> > >> > wrote:
> >>>>>>>> > >> > >>> >
> >>>>>>>> > >> > >>> >> Hi Gonzalo,
> >>>>>>>> > >> > >>> >>
> >>>>>>>> > >> > >>> >> Thanks for sharing the performance results.
> >>>>>>>> > >> > >>> >> I am wondering if you have turned off the flag
> >>>>>>>> > >> > >>> >> BoundsChecking#BOUNDS_CHECKING_ENABLED.
> >>>>>>>> > >> > >>> >> If not, the lower throughput should be expected.
> >>>>>>>> > >> > >>> >>
> >>>>>>>> > >> > >>> >> Best,
> >>>>>>>> > >> > >>> >> Liya Fan
> >>>>>>>> > >> > >>> >>
> >>>>>>>> > >> > >>> >> On Wed, Aug 7, 2019 at 10:23 PM Micah Kornfield <
> >>>>>>>> > >> > >>> emkornfi...@gmail.com>
> >>>>>>>> > >> > >>> >> wrote:
> >>>>>>>> > >> > >>> >>
> >>>>>>>> > >> > >>> >>> Hi Gonzalo,
> >>>>>>>> > >> > >>> >>> Thank you for the feedback.  I wasn't aware of the
> JIT
> >>>>>>>> > >> > >>> implications.   At
> >>>>>>>> > >> > >>> >>> least on the benchmark run they don't seem to have
> an impact.
> >>>>>>>> > >> > >>> >>>
> >>>>>>>> > >> > >>> >>> If there are other benchmarks that people have
> that can
> >>>>>>>> > >> validate if
> >>>>>>>> > >> > >>> this
> >>>>>>>> > >> > >>> >>> change will be problematic I would appreciate
> trying to run them
> >>>>>>>> > >> > >>> with the
> >>>>>>>> > >> > >>> >>> PR.  I will try to run the ones for zeroing/popcnt
> tonight to
> >>>>>>>> > >> see
> >>>>>>>> > >> > if
> >>>>>>>> > >> > >>> >>> there
> >>>>>>>> > >> > >>> >>> is a change in those.
> >>>>>>>> > >> > >>> >>>
> >>>>>>>> > >> > >>> >>> -Micah
> >>>>>>>> > >> > >>> >>>
> >>>>>>>> > >> > >>> >>>
> >>>>>>>> > >> > >>> >>>
> >>>>>>>> > >> > >>> >>> On Wednesday, August 7, 2019, Gonzalo Ortiz
> Jaureguizar <
> >>>>>>>> > >> > >>> >>> golthir...@gmail.com> wrote:
> >>>>>>>> > >> > >>> >>>
> >>>>>>>> > >> > >>> >>> > I would recommend to take care with this kind of
> changes.
> >>>>>>>> > >> > >>> >>> >
> >>>>>>>> > >> > >>> >>> > I didn't try Arrow in more than one year, but by
> then the
> >>>>>>>> > >> > >>> performance
> >>>>>>>> > >> > >>> >>> was
> >>>>>>>> > >> > >>> >>> > quite bad in comparison with plain byte buffer
> access
> >>>>>>>> > >> > >>> >>> > (see
> http://git.net/apache-arrow-development/msg02353.html *)
> >>>>>>>> > >> > and
> >>>>>>>> > >> > >>> >>> > there are several optimizations that the JVM
> (specifically,
> >>>>>>>> > >> C2)
> >>>>>>>> > >> > >>> does
> >>>>>>>> > >> > >>> >>> not
> >>>>>>>> > >> > >>> >>> > apply when dealing with int instead of longs.
> One of the
> >>>>>>>> > >> > >>> >>> > most commons is the loop unrolling and
> vectorization.
> >>>>>>>> > >> > >>> >>> >
> >>>>>>>> > >> > >>> >>> > * It doesn't seem the best way to reference an
> old email on
> >>>>>>>> > >> the
> >>>>>>>> > >> > >>> list,
> >>>>>>>> > >> > >>> >>> but
> >>>>>>>> > >> > >>> >>> > it is the only result shown by Google
> >>>>>>>> > >> > >>> >>> >
> >>>>>>>> > >> > >>> >>> > El mié., 7 ago. 2019 a las 11:42, Fan Liya (<
> >>>>>>>> > >> > liya.fa...@gmail.com
> >>>>>>>> > >> > >>> >)
> >>>>>>>> > >> > >>> >>> > escribió:
> >>>>>>>> > >> > >>> >>> >
> >>>>>>>> > >> > >>> >>> >> Hi Micah,
> >>>>>>>> > >> > >>> >>> >>
> >>>>>>>> > >> > >>> >>> >> Thanks for your effort. The performance result
> looks good.
> >>>>>>>> > >> > >>> >>> >>
> >>>>>>>> > >> > >>> >>> >> As you indicated, ArrowBuf will take additional
> 12 bytes (4
> >>>>>>>> > >> > bytes
> >>>>>>>> > >> > >>> for
> >>>>>>>> > >> > >>> >>> each
> >>>>>>>> > >> > >>> >>> >> of length, write index, and read index).
> >>>>>>>> > >> > >>> >>> >> Similar overheads also exist for vectors like
> >>>>>>>> > >> > >>> BaseFixedWidthVector,
> >>>>>>>> > >> > >>> >>> >> BaseVariableWidthVector, etc.
> >>>>>>>> > >> > >>> >>> >>
> >>>>>>>> > >> > >>> >>> >> IMO, such overheads are small enough to justify
> the change.
> >>>>>>>> > >> > >>> >>> >> Let's check if there are other overheads.
> >>>>>>>> > >> > >>> >>> >>
> >>>>>>>> > >> > >>> >>> >> Best,
> >>>>>>>> > >> > >>> >>> >> Liya Fan
> >>>>>>>> > >> > >>> >>> >>
> >>>>>>>> > >> > >>> >>> >> On Wed, Aug 7, 2019 at 3:30 PM Micah Kornfield <
> >>>>>>>> > >> > >>> emkornfi...@gmail.com
> >>>>>>>> > >> > >>> >>> >
> >>>>>>>> > >> > >>> >>> >> wrote:
> >>>>>>>> > >> > >>> >>> >>
> >>>>>>>> > >> > >>> >>> >> > Hi Liya Fan,
> >>>>>>>> > >> > >>> >>> >> > Based on the Float8Benchmark there does not
> seem to be any
> >>>>>>>> > >> > >>> >>> meaningful
> >>>>>>>> > >> > >>> >>> >> > performance difference on my machine.  At
> least for me, the
> >>>>>>>> > >> > >>> >>> benchmarks
> >>>>>>>> > >> > >>> >>> >> are
> >>>>>>>> > >> > >>> >>> >> > not stable enough to say one is faster than
> the other (I've
> >>>>>>>> > >> > >>> pasted
> >>>>>>>> > >> > >>> >>> >> results
> >>>>>>>> > >> > >>> >>> >> > below).  That being said my machine isn't
> necessarily the
> >>>>>>>> > >> most
> >>>>>>>> > >> > >>> >>> reliable
> >>>>>>>> > >> > >>> >>> >> for
> >>>>>>>> > >> > >>> >>> >> > benchmarking.
> >>>>>>>> > >> > >>> >>> >> >
> >>>>>>>> > >> > >>> >>> >> > On an intuitive level, this makes sense to
> me,  for the
> >>>>>>>> > >> most
> >>>>>>>> > >> > >>> part it
> >>>>>>>> > >> > >>> >>> >> seems
> >>>>>>>> > >> > >>> >>> >> > like the change just moves casting from "int"
> to "long"
> >>>>>>>> > >> > further
> >>>>>>>> > >> > >>> up
> >>>>>>>> > >> > >>> >>> the
> >>>>>>>> > >> > >>> >>> >> > stack  for  "PlatformDepdendent" operations.
> If there are
> >>>>>>>> > >> > other
> >>>>>>>> > >> > >>> >>> >> benchmarks
> >>>>>>>> > >> > >>> >>> >> > that you think are worth running let me know.
> >>>>>>>> > >> > >>> >>> >> >
> >>>>>>>> > >> > >>> >>> >> > One downside performance wise I think for his
> change is it
> >>>>>>>> > >> > >>> >>> increases the
> >>>>>>>> > >> > >>> >>> >> > size of ArrowBuf objects, which I suppose
> could influence
> >>>>>>>> > >> > cache
> >>>>>>>> > >> > >>> >>> misses
> >>>>>>>> > >> > >>> >>> >> at
> >>>>>>>> > >> > >>> >>> >> > some level or increase the size of
> call-stacks, but this
> >>>>>>>> > >> > doesn't
> >>>>>>>> > >> > >>> >>> seem to
> >>>>>>>> > >> > >>> >>> >> > show up in the benchmark..
> >>>>>>>> > >> > >>> >>> >> >
> >>>>>>>> > >> > >>> >>> >> > Thanks,
> >>>>>>>> > >> > >>> >>> >> > Micah
> >>>>>>>> > >> > >>> >>> >> >
> >>>>>>>> > >> > >>> >>> >> > Sample benchmark numbers:
> >>>>>>>> > >> > >>> >>> >> >
> >>>>>>>> > >> > >>> >>> >> > [New Code]
> >>>>>>>> > >> > >>> >>> >> > Benchmark                            Mode
> Cnt   Score
> >>>>>>>> > >>  Error
> >>>>>>>> > >> > >>> >>> Units
> >>>>>>>> > >> > >>> >>> >> > Float8Benchmarks.copyFromBenchmark   avgt
> 5  15.441 ±
> >>>>>>>> > >> 0.469
> >>>>>>>> > >> > >>> >>> us/op
> >>>>>>>> > >> > >>> >>> >> > Float8Benchmarks.readWriteBenchmark  avgt
> 5  14.057 ±
> >>>>>>>> > >> 0.115
> >>>>>>>> > >> > >>> >>> us/op
> >>>>>>>> > >> > >>> >>> >> >
> >>>>>>>> > >> > >>> >>> >> >
> >>>>>>>> > >> > >>> >>> >> > [Old code]
> >>>>>>>> > >> > >>> >>> >> > Benchmark                            Mode
> Cnt   Score
> >>>>>>>> > >>  Error
> >>>>>>>> > >> > >>> >>> Units
> >>>>>>>> > >> > >>> >>> >> > Float8Benchmarks.copyFromBenchmark   avgt
> 5  16.248 ±
> >>>>>>>> > >> 1.409
> >>>>>>>> > >> > >>> >>> us/op
> >>>>>>>> > >> > >>> >>> >> > Float8Benchmarks.readWriteBenchmark  avgt
> 5  14.150 ±
> >>>>>>>> > >> 0.084
> >>>>>>>> > >> > >>> >>> us/op
> >>>>>>>> > >> > >>> >>> >> >
> >>>>>>>> > >> > >>> >>> >> > On Tue, Aug 6, 2019 at 1:18 AM Fan Liya <
> >>>>>>>> > >> liya.fa...@gmail.com
> >>>>>>>> > >> > >
> >>>>>>>> > >> > >>> >>> wrote:
> >>>>>>>> > >> > >>> >>> >> >
> >>>>>>>> > >> > >>> >>> >> >> Hi Micah,
> >>>>>>>> > >> > >>> >>> >> >>
> >>>>>>>> > >> > >>> >>> >> >> Thanks a lot for doing this.
> >>>>>>>> > >> > >>> >>> >> >>
> >>>>>>>> > >> > >>> >>> >> >> I am a little concerned about if there is
> any negative
> >>>>>>>> > >> > >>> performance
> >>>>>>>> > >> > >>> >>> >> impact
> >>>>>>>> > >> > >>> >>> >> >> on the current 32-bit-length based
> applications.
> >>>>>>>> > >> > >>> >>> >> >> Can we do some performance comparison on our
> existing
> >>>>>>>> > >> > >>> benchmarks?
> >>>>>>>> > >> > >>> >>> >> >>
> >>>>>>>> > >> > >>> >>> >> >> Best,
> >>>>>>>> > >> > >>> >>> >> >> Liya Fan
> >>>>>>>> > >> > >>> >>> >> >>
> >>>>>>>> > >> > >>> >>> >> >>
> >>>>>>>> > >> > >>> >>> >> >> On Tue, Aug 6, 2019 at 3:35 PM Micah
> Kornfield <
> >>>>>>>> > >> > >>> >>> emkornfi...@gmail.com>
> >>>>>>>> > >> > >>> >>> >> >> wrote:
> >>>>>>>> > >> > >>> >>> >> >>
> >>>>>>>> > >> > >>> >>> >> >>> There have been some previous discussions
> on the mailing
> >>>>>>>> > >> > about
> >>>>>>>> > >> > >>> >>> >> supporting
> >>>>>>>> > >> > >>> >>> >> >>> 64-bit lengths for  Java ValueVectors (this
> is what the
> >>>>>>>> > >> IPC
> >>>>>>>> > >> > >>> >>> >> specification
> >>>>>>>> > >> > >>> >>> >> >>> and C++ support).  I created a PR [1] that
> changes all
> >>>>>>>> > >> APIs
> >>>>>>>> > >> > >>> that I
> >>>>>>>> > >> > >>> >>> >> could
> >>>>>>>> > >> > >>> >>> >> >>> find that take an index to take an "long"
> instead of an
> >>>>>>>> > >> > "int"
> >>>>>>>> > >> > >>> (and
> >>>>>>>> > >> > >>> >>> >> >>> similarly change "size/rowcount" APIs).
> >>>>>>>> > >> > >>> >>> >> >>>
> >>>>>>>> > >> > >>> >>> >> >>> It is a big change, so I think it is worth
> discussing if
> >>>>>>>> > >> it
> >>>>>>>> > >> > is
> >>>>>>>> > >> > >>> >>> >> something
> >>>>>>>> > >> > >>> >>> >> >>> we
> >>>>>>>> > >> > >>> >>> >> >>> still want to move forward with.  It would
> be nice to
> >>>>>>>> > >> come
> >>>>>>>> > >> > to
> >>>>>>>> > >> > >>> a
> >>>>>>>> > >> > >>> >>> >> >>> conclusion
> >>>>>>>> > >> > >>> >>> >> >>> quickly, ideally in the next few days, to
> avoid a lot of
> >>>>>>>> > >> > merge
> >>>>>>>> > >> > >>> >>> >> conflicts.
> >>>>>>>> > >> > >>> >>> >> >>>
> >>>>>>>> > >> > >>> >>> >> >>> The reason I did this work now is the C++
> implementation
> >>>>>>>> > >> has
> >>>>>>>> > >> > >>> added
> >>>>>>>> > >> > >>> >>> >> >>> support
> >>>>>>>> > >> > >>> >>> >> >>> for LargeList, LargeBinary and LargeString
> arrays and
> >>>>>>>> > >> based
> >>>>>>>> > >> > on
> >>>>>>>> > >> > >>> >>> prior
> >>>>>>>> > >> > >>> >>> >> >>> discussions we need to have similar support
> in Java
> >>>>>>>> > >> before
> >>>>>>>> > >> > our
> >>>>>>>> > >> > >>> >>> next
> >>>>>>>> > >> > >>> >>> >> >>> release. Support 64-bit indexes means we
> can have full
> >>>>>>>> > >> > >>> >>> compatibility
> >>>>>>>> > >> > >>> >>> >> and
> >>>>>>>> > >> > >>> >>> >> >>> make the most use of the types in Java.
> >>>>>>>> > >> > >>> >>> >> >>>
> >>>>>>>> > >> > >>> >>> >> >>> Look forward to hearing feedback.
> >>>>>>>> > >> > >>> >>> >> >>>
> >>>>>>>> > >> > >>> >>> >> >>> Thanks,
> >>>>>>>> > >> > >>> >>> >> >>> Micah
> >>>>>>>> > >> > >>> >>> >> >>>
> >>>>>>>> > >> > >>> >>> >> >>> [1]
> https://github.com/apache/arrow/pull/5020
> >>>>>>>> > >> > >>> >>> >> >>>
> >>>>>>>> > >> > >>> >>> >> >>
> >>>>>>>> > >> > >>> >>> >>
> >>>>>>>> > >> > >>> >>> >
> >>>>>>>> > >> > >>> >>>
> >>>>>>>> > >> > >>> >>
> >>>>>>>> > >> > >>>
> >>>>>>>> > >> > >>
> >>>>>>>> > >> >
> >>>>>>>> > >>
> >>>>>>>> > >
>

Re: [Discuss][Java] 64-bit lengths for ValueVectors

Reply via email to