Apologies for the long delay. I chose to do the minimal work here and
limited this change [1] to allowing 64-bit lengths in ArrowBuf.  This would
unblock work on LargeString and LargeBinary.  If this change looks OK, I
think there is some follow-up work to add more thorough unit/integration
tests.
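
To make the shape of the change concrete, here is a minimal, illustrative
sketch of the int-to-long addressing shift (a toy class for discussion, not
the actual ArrowBuf API from the PR):

  import java.nio.ByteBuffer;

  // Toy stand-in for a buffer whose public API addresses bytes with a long
  // index.  The real ArrowBuf wraps native memory; this sketch just narrows
  // the index back to int, so it still only covers the old 32-bit range.
  final class LongAddressedBuffer {
    private final ByteBuffer delegate;

    LongAddressedBuffer(int capacity) {
      this.delegate = ByteBuffer.allocateDirect(capacity);
    }

    // Before the change, the index parameter was an int, capping buffers at ~2 GB.
    byte getByte(long index) {
      return delegate.get(Math.toIntExact(index));
    }

    long capacity() {
      return delegate.capacity();
    }
  }

Callers that pass int indexes today keep working unchanged, since int widens
to long implicitly.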

As an aside, it does seem like the 2GB limit is affecting some users in
Spark [2][3], so hopefully LargeString would help with this.

Allowing more than MAX_INT elements in Vectors/arrays is still a blocker
for making LargeList useful.

Thanks,
Micah

[1]  https://github.com/apache/arrow/pull/5020
[2]
https://stackoverflow.com/questions/58739888/spark-is-it-possible-to-increase-pyarrow-buffer#comment103812119_58739888
[3] https://issues.apache.org/jira/browse/ARROW-4890

On Fri, Aug 23, 2019 at 8:33 AM Jacques Nadeau <jacq...@apache.org> wrote:

>
>
> On Fri, Aug 23, 2019, 8:55 PM Micah Kornfield <emkornfi...@gmail.com>
> wrote:
>
>> The vector indexes being limited to 32 bits doesn't limit the addressing
>>> to 32-bit chunks of memory. For example, your prime example before was
>>> image data. Having 2 billion 1 MB images would still be supported
>>> without changing the index addressing.
>>
>> This might be pre-coffee math, but I think we are limited to
>> approximately 2,000 images, because an ArrowBuf only holds up to 2 billion
>> bytes [1].  While we have plenty of room for the offsets, we quickly run
>> out of contiguous data storage space. For LargeString and LargeBinary this
>> could be fixed by changing ArrowBuf.
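
(Checking that math, assuming 1 MiB per image; this is a quick illustrative
calculation, not anything from the PR:

  // Rough check of the ~2,000-image figure above (assumes 1 MiB per image).
  public class ArrowBufImageMath {
    public static void main(String[] args) {
      long maxBufBytes = Integer.MAX_VALUE;            // current per-ArrowBuf ceiling
      long bytesPerImage = 1024L * 1024L;              // 1 MiB, the size assumed here
      System.out.println(maxBufBytes / bytesPerImage); // prints 2047
    }
  }

so a single buffer tops out at roughly 2,000 such images.)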
>>
>> LargeArray faces the same problem, only it applies to its child vectors.
>> Supporting LargeArray properly is really what drove the large-scale
>> interface change.
>>
>
> My expressed concern about these changes was specifically about the use of
> long for get/set in the vector interfaces. I'm not saying that we should
> constrain memory/ArrowBufs to 32 bits.
>
>
>> Rebase would help if possible.
>>
>> I'll try to get to this in the next few days.
>>
>> [1]
>> https://github.com/apache/arrow/blob/95175fe7cb8439eebe6d2f6e0495f551d6864380/java/memory/src/main/java/io/netty/buffer/ArrowBuf.java#L164
>>
>> On Fri, Aug 23, 2019 at 4:55 AM Jacques Nadeau <jacq...@apache.org>
>> wrote:
>>
>>>
>>>
>>> On Fri, Aug 23, 2019, 11:49 AM Micah Kornfield <emkornfi...@gmail.com>
>>> wrote:
>>>
>>>> I don't think we should couple this discussion with the implementation
>>>>> of large list, etc since I think those two concepts are independent.
>>>>
>>>> I'm still trying to balance in my mind which is the worse experience for
>>>> consumers of the libraries for these types: claiming that Java supports
>>>> these types but throwing an exception when Vectors exceed 32 bits, or
>>>> just saying they aren't supported until we have 64-bit support in Java.
>>>>
>>>
>>> The vector indexes being limited to 32 bits doesn't limit the addressing
>>> to 32-bit chunks of memory. For example, your prime example before was
>>> image data. Having 2 billion 1 MB images would still be supported
>>> without changing the index addressing.
>>>
>>>
>>>
>>>>
>>>>> I've asked some others on my team their opinions on the risk here. I
>>>>> think we should probably review some of our more complex vector interactions
>>>>> and see how the JVM's assembly changes with this kind of change. Using
>>>>> microbenchmarking is good but I think we also need to see whether we're
>>>>> constantly inserting additional instructions or if in most cases, this
>>>>> actually doesn't impact instruction count.
>>>>
>>>>
>>>> Is this something that your team will take on?
>>>>
>>>
>>>
>>> Yeah, we need to look at this I think.
>>>
>>> Do you need a rebased version of the PR or is the existing one
>>>> sufficient?
>>>>
>>>
>>> Rebase would help if possible.
>>>
>>>
>>>
>>>> Thanks,
>>>> Micah
>>>>
>>>> On Thu, Aug 22, 2019 at 8:56 PM Jacques Nadeau <jacq...@apache.org>
>>>> wrote:
>>>>
>>>>> I don't think we should couple this discussion with the implementation
>>>>> of large list, etc since I think those two concepts are independent.
>>>>>
>>>>> I've asked some others on my team their opinions on the risk here. I
>>>>> think we should probably review some of our more complex vector interactions
>>>>> and see how the JVM's assembly changes with this kind of change. Using
>>>>> microbenchmarking is good but I think we also need to see whether we're
>>>>> constantly inserting additional instructions or if in most cases, this
>>>>> actually doesn't impact instruction count.
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Aug 21, 2019 at 12:18 PM Micah Kornfield <
>>>>> emkornfi...@gmail.com> wrote:
>>>>>
>>>>>>
>>>>>>> With regards to the reference implementation point. It is a good
>>>>>>> point. I'm on vacation this week. Unless you're pushing hard on this,
>>>>>>> can we pick this up and discuss more next week?
>>>>>>
>>>>>>
>>>>>> Hi Jacques, I hope you had a good rest.  Any more thoughts on the
>>>>>> reference implementation aspect of this?
>>>>>>
>>>>>>
>>>>>>> To copy the sentiments from the 0.15.0 release thread, I think it
>>>>>>> would be best to decouple this discussion from the release timeline
>>>>>>> given how many people we have relying on regular releases coming out.
>>>>>>> We can continue making major 0.x releases until we're ready to
>>>>>>> release 1.0.0.
>>>>>>
>>>>>>
>>>>>> I'm OK with it as long as other stakeholders are. Timed releases are
>>>>>> the way to go.  As stated on the release thread [1] we need a better
>>>>>> mechanism to avoid this type of issue arising again.  The release thread
>>>>>> also had some more discussion on compatibility.
>>>>>>
>>>>>> Thanks,
>>>>>> Micah
>>>>>>
>>>>>> [1]
>>>>>> https://lists.apache.org/thread.html/d70feeceaf2570906ade117030b29887af7c77ca5c4a976e6d555920@%3Cdev.arrow.apache.org%3E
>>>>>>
>>>>>>
>>>>>> On Wed, Aug 14, 2019 at 3:23 PM Wes McKinney <wesmck...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> On Sun, Aug 11, 2019 at 9:40 PM Micah Kornfield <
>>>>>>> emkornfi...@gmail.com> wrote:
>>>>>>> >
>>>>>>> > Hi Wes and Jacques,
>>>>>>> > See responses below.
>>>>>>> >
>>>>>>> > With regards to the reference implementation point. It is a good
>>>>>>> point. I'm
>>>>>>> > > on vacation this week. Unless you're pushing hard on this, can
>>>>>>> we pick this
>>>>>>> > > up and discuss more next week?
>>>>>>> >
>>>>>>> >
>>>>>>> > Sure thing, enjoy your vacation.  I think the only practical
>>>>>>> implications
>>>>>>> > are it delays choices around implementing LargeList, LargeBinary,
>>>>>>> > LargeString in Java, which in turn might push out the 0.15.0
>>>>>>> release.
>>>>>>> >
>>>>>>>
>>>>>>> To copy the sentiments from the 0.15.0 release thread, I think it
>>>>>>> would be best to decouple this discussion from the release timeline
>>>>>>> given how many people we have relying on regular releases coming out.
>>>>>>> We can continue making major 0.x releases until we're ready to
>>>>>>> release 1.0.0.
>>>>>>>
>>>>>>> > My stance on this is that I don't know how important it is for
>>>>>>> Java to
>>>>>>> > > support vectors over INT32_MAX elements. The use cases enabled by
>>>>>>> > > having very large arrays seem to be concentrated in the native
>>>>>>> code
>>>>>>> > > world (e.g. C/C++/Rust) -- that could just be
>>>>>>> implementation-centrism
>>>>>>> > > on my part, though.
>>>>>>> >
>>>>>>> >
>>>>>>> > A data point against this view is Spark has done work to eliminate
>>>>>>> 2GB
>>>>>>> > memory limits on its block sizes [1].  I don't claim to understand
>>>>>>> the
>>>>>>> > implications of this. Bryan, might you have any thoughts here?  I'm
>>>>>>> OK with
>>>>>>> > INT32_MAX, as well, I think we should think about what this means
>>>>>>> for
>>>>>>> > adding Large types to Java and implications for reference
>>>>>>> implementations.
>>>>>>> >
>>>>>>> > Thanks,
>>>>>>> > Micah
>>>>>>> >
>>>>>>> > [1] https://issues.apache.org/jira/browse/SPARK-6235
>>>>>>> >
>>>>>>> > On Sun, Aug 11, 2019 at 6:31 PM Jacques Nadeau <jacq...@apache.org>
>>>>>>> wrote:
>>>>>>> >
>>>>>>> > > Hey Micah,
>>>>>>> > >
>>>>>>> > > Appreciate the offer on the compiling. The reality is I'm more
>>>>>>> concerned
>>>>>>> > > about the unknowns than the compiling issue itself. Any time
>>>>>>> you've been
>>>>>>> > > tuning for a while, changing something like this could be
>>>>>>> totally fine or
>>>>>>> > > cause a couple of major issues. For example, we've done a very
>>>>>>> large amount
>>>>>>> > > of work reducing heap memory footprint of the vectors. Are
>>>>>>> target is to
>>>>>>> > > actually get it down to 24 bytes per ArrowBuf and 24 bytes heap
>>>>>>> per vector
>>>>>>> > > (not including arrow bufs).
>>>>>>> > >
>>>>>>> > > With regards to the reference implementation point. It is a good
>>>>>>> point.
>>>>>>> > > I'm on vacation this week. Unless you're pushing hard on this,
>>>>>>> can we pick
>>>>>>> > > this up and discuss more next week?
>>>>>>> > >
>>>>>>> > > thanks,
>>>>>>> > > Jacques
>>>>>>> > >
>>>>>>> > > On Sat, Aug 10, 2019 at 7:39 PM Micah Kornfield <
>>>>>>> emkornfi...@gmail.com>
>>>>>>> > > wrote:
>>>>>>> > >
>>>>>>> > >> Hi Jacques,
>>>>>>> > >> I definitely understand these concerns and this change is risky
>>>>>>> because it
>>>>>>> > >> is so large.  Perhaps, creating a new hierarchy, might be the
>>>>>>> cleanest way
>>>>>>> > >> of dealing with this.  This could have other benefits like
>>>>>>> cleaning up
>>>>>>> > >> some
>>>>>>> > >> cruft around dictionary encode and "orphaned" method.   Per
>>>>>>> past e-mail
>>>>>>> > >> threads I agree it is beneficial to have 2 separate reference
>>>>>>> > >> implementations that can communicate fully, and my intent here
>>>>>>> was to
>>>>>>> > >> close
>>>>>>> > >> that gap.
>>>>>>> > >>
>>>>>>> > >> Trying to
>>>>>>> > >> > determine the ramifications of these changes would be
>>>>>>> challenging and
>>>>>>> > >> time
>>>>>>> > >> > consuming against all the different ways we interact with the
>>>>>>> Arrow Java
>>>>>>> > >> > library.
>>>>>>> > >>
>>>>>>> > >>
>>>>>>> > >> Understood.  I took a quick look at Dremio-OSS it seems like it
>>>>>>> has a
>>>>>>> > >> simple java build system?  If it is helpful, I can try to get a
>>>>>>> fork
>>>>>>> > >> running that at least compiles against this PR.  My plan would
>>>>>>> be to cast
>>>>>>> > >> any place that was changed to return a long back to an int, so
>>>>>>> in essence
>>>>>>> > >> the Dremio algorithms would remain 32-bit implementations.
>>>>>>> > >>
>>>>>>> > >> I don't  have the infrastructure to test this change properly
>>>>>>> from a
>>>>>>> > >> distributed systems perspective, so it would still take some
>>>>>>> time from
>>>>>>> > >> Dremio to validate for regressions.
>>>>>>> > >>
>>>>>>> > >> I'm not saying I'm against this but want to make sure we've
>>>>>>> > >> > explored all less disruptive options before considering
>>>>>>> changing
>>>>>>> > >> something
>>>>>>> > >> > this fundamental (especially when I generally hold the view
>>>>>>> that large
>>>>>>> > >> cell
>>>>>>> > >> > counts against massive contiguous memory is an anti pattern
>>>>>>> to scalable
>>>>>>> > >> > analytical processing--purely subjective of course).
>>>>>>> > >>
>>>>>>> > >>
>>>>>>> > >> I'm open to other ideas here, as well. I don't think it is out
>>>>>>> of the
>>>>>>> > >> question to leave the Java implementation as 32-bit, but if we
>>>>>>> do, then I
>>>>>>> > >> think we should consider a different strategy for reference
>>>>>>> > >> implementations.
>>>>>>> > >>
>>>>>>> > >> Thanks,
>>>>>>> > >> Micah
>>>>>>> > >>
>>>>>>> > >> On Sat, Aug 10, 2019 at 5:09 PM Jacques Nadeau <
>>>>>>> jacq...@apache.org>
>>>>>>> > >> wrote:
>>>>>>> > >>
>>>>>>> > >> > Hey Micah, I didn't have a particular path in mind. Was
>>>>>>> thinking more
>>>>>>> > >> along
>>>>>>> > >> > the lines of extra methods as opposed to separate classes.
>>>>>>> > >> >
>>>>>>> > >> > Arrow hasn't historically been a place where we're writing
>>>>>>> algorithms in
>>>>>>> > >> > Java so the fact that they aren't there doesn't mean they
>>>>>>> don't exist.
>>>>>>> > >> We
>>>>>>> > >> > have a large amount of code that depends on the current
>>>>>>> behavior that is
>>>>>>> > >> > deployed in hundreds of customer clusters (you can peruse our
>>>>>>> dremio
>>>>>>> > >> repo
>>>>>>> > >> > to see how extensively we leverage Arrow if interested).
>>>>>>> Trying to
>>>>>>> > >> > determine the ramifications of these changes would be
>>>>>>> challenging and
>>>>>>> > >> time
>>>>>>> > >> > consuming against all the different ways we interact with the
>>>>>>> Arrow Java
>>>>>>> > >> > library. I'm not saying I'm against this but want to make
>>>>>>> sure we've
>>>>>>> > >> > explored all less disruptive options before considering
>>>>>>> changing
>>>>>>> > >> something
>>>>>>> > >> > this fundamental (especially when I generally hold the view
>>>>>>> that large
>>>>>>> > >> cell
>>>>>>> > >> > counts against massive contiguous memory is an anti pattern
>>>>>>> to scalable
>>>>>>> > >> > analytical processing--purely subjective of course).
>>>>>>> > >> >
>>>>>>> > >> > On Sat, Aug 10, 2019, 4:17 PM Micah Kornfield <
>>>>>>> emkornfi...@gmail.com>
>>>>>>> > >> > wrote:
>>>>>>> > >> >
>>>>>>> > >> > > Hi Jacques,
>>>>>>> > >> > > What avenue were you thinking for supporting both paths?
>>>>>>>  I didn't
>>>>>>> > >> want
>>>>>>> > >> > > to pursue a different class hierarchy, because I felt like
>>>>>>> that would
>>>>>>> > >> > > effectively fork the code base, but that is potentially an
>>>>>>> option that
>>>>>>> > >> > > would allow us to have a complete reference implementation
>>>>>>> in Java
>>>>>>> > >> that
>>>>>>> > >> > can
>>>>>>> > >> > > fully interact with C++, without major changes to this code.
>>>>>>> > >> > >
>>>>>>> > >> > > For supporting both APIs on the same classes/interfaces, I
>>>>>>> think they
>>>>>>> > >> > > roughly fall into three categories, changes to input
>>>>>>> parameters,
>>>>>>> > >> changes
>>>>>>> > >> > to
>>>>>>> > >> > > output parameters and algorithm changes.
>>>>>>> > >> > >
>>>>>>> > >> > > For inputs, changing from int to long is essentially a
>>>>>>> no-op from the
>>>>>>> > >> > > compiler perspective.  From the limited micro-benchmarking
>>>>>>> this also
>>>>>>> > >> > > doesn't seem to have a performance impact.  So we could
>>>>>>> keep two
>>>>>>> > >> versions
>>>>>>> > >> > > of the methods that only differ on inputs, but it is not
>>>>>>> clear what
>>>>>>> > >> the
>>>>>>> > >> > > value of that would be.
>>>>>>> > >> > >
>>>>>>> > >> > > For outputs, we can't support methods "long getLength()"
>>>>>>> and "int
>>>>>>> > >> > > getLength()" in the same class, so we would be forced into
>>>>>>> something
>>>>>>> > >> like
>>>>>>> > >> > > "long getLength(boolean dummy)" which I think is a less
>>>>>>> desirable.
>>>>>>> > >> > >
>>>>>>> > >> > > For algorithm changes, there did not appear to be too many
>>>>>>> places
>>>>>>> > >> where
>>>>>>> > >> > we
>>>>>>> > >> > > actually loop over all elements (it is quite possible I
>>>>>>> missed
>>>>>>> > >> something
>>>>>>> > >> > > here), the ones that I did find I was able to mitigate
>>>>>>> performance
>>>>>>> > >> > > penalties as noted above.  Some of the current
>>>>>>> implementation will
>>>>>>> > >> get a
>>>>>>> > >> > > lot slower for "large arrays", but we can likely fix those
>>>>>>> later or in
>>>>>>> > >> > this
>>>>>>> > >> > > PR with a nested while loop instead of 2 for loops.
>>>>>>> > >> > >
>>>>>>> > >> > > Thanks,
>>>>>>> > >> > > Micah
>>>>>>> > >> > >
>>>>>>> > >> > >
>>>>>>> > >> > > On Saturday, August 10, 2019, Jacques Nadeau <
>>>>>>> jacq...@apache.org>
>>>>>>> > >> wrote:
>>>>>>> > >> > >
>>>>>>> > >> > >> This is a pretty massive change to the APIs. I wonder how
>>>>>>> nasty it
>>>>>>> > >> would
>>>>>>> > >> > >> be to just support both paths. Have you evaluated how
>>>>>>> complex that
>>>>>>> > >> > would be?
>>>>>>> > >> > >>
>>>>>>> > >> > >> On Wed, Aug 7, 2019 at 11:08 PM Micah Kornfield <
>>>>>>> > >> emkornfi...@gmail.com>
>>>>>>> > >> > >> wrote:
>>>>>>> > >> > >>
>>>>>>> > >> > >>> After more investigation, it looks like Float8Benchmarks
>>>>>>> at least
>>>>>>> > >> on my
>>>>>>> > >> > >>> machine are within the range of noise.
>>>>>>> > >> > >>>
>>>>>>> > >> > >>> For BitVectorHelper I pushed a new commit [1], seems to
>>>>>>> bring the
>>>>>>> > >> > >>> BitVectorHelper benchmarks back inline (and even with some
>>>>>>> > >> improvement
>>>>>>> > >> > >>> for
>>>>>>> > >> > >>> getNullCountBenchmark).
>>>>>>> > >> > >>>
>>>>>>> > >> > >>> Benchmark                                        Mode  Cnt   Score   Error  Units
>>>>>>> > >> > >>> BitVectorHelperBenchmarks.allBitsNullBenchmark   avgt    5   3.821 ± 0.031  ns/op
>>>>>>> > >> > >>> BitVectorHelperBenchmarks.getNullCountBenchmark  avgt    5  14.884 ± 0.141  ns/op
>>>>>>> > >> > >>>
>>>>>>> > >> > >>> I applied the same pattern to other loops that I could find,
>>>>>>> > >> > >>> and for any "for (long" loop on the critical path I broke it up
>>>>>>> > >> > >>> into two loops: the first iterates with an int counter, and the
>>>>>>> > >> > >>> second finishes off any remaining long values.  As a side note,
>>>>>>> > >> > >>> it seems like optimization of loops using long counters has a
>>>>>>> > >> > >>> semi-recent open bug for the JVM [2].
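
A minimal sketch of that two-loop split, for illustration (a toy example
rather than the code in the PR commit):

  import java.util.function.LongToIntFunction;

  final class LoopSplitSketch {
    // Sums 'length' elements fetched through a long-indexed getter.  The bulk
    // of the range runs in an int-indexed loop, which the JIT optimizes more
    // aggressively; a second loop handles anything beyond Integer.MAX_VALUE.
    static long sum(LongToIntFunction getAt, long length) {
      long total = 0;
      int intEnd = (int) Math.min(length, Integer.MAX_VALUE);
      for (int i = 0; i < intEnd; i++) {
        total += getAt.applyAsInt(i);
      }
      for (long i = intEnd; i < length; i++) {
        total += getAt.applyAsInt(i);
      }
      return total;
    }
  }

The int loop covers everything a 32-bit vector can hold, so existing
workloads never reach the long-indexed tail.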
>>>>>>> > >> > >>>
>>>>>>> > >> > >>> Thanks,
>>>>>>> > >> > >>> Micah
>>>>>>> > >> > >>>
>>>>>>> > >> > >>> [1]
>>>>>>> > >> > >>>
>>>>>>> > >> > >>>
>>>>>>> > >> >
>>>>>>> > >>
>>>>>>> https://github.com/apache/arrow/pull/5020/commits/2ea2c1ae83e3baa7b9a99a6d06276d968df41797
>>>>>>> > >> > >>> [2] https://bugs.openjdk.java.net/browse/JDK-8223051
>>>>>>> > >> > >>>
>>>>>>> > >> > >>> On Wed, Aug 7, 2019 at 8:11 PM Micah Kornfield <
>>>>>>> > >> emkornfi...@gmail.com>
>>>>>>> > >> > >>> wrote:
>>>>>>> > >> > >>>
>>>>>>> > >> > >>> > Indeed, the BoundChecking and CheckNullForGet variables
>>>>>>> can make a
>>>>>>> > >> > big
>>>>>>> > >> > >>> > difference.  I didn't initially run the benchmarks with
>>>>>>> these
>>>>>>> > >> turned
>>>>>>> > >> > on
>>>>>>> > >> > >>> > (you can see the result from above with
>>>>>>> Float8Benchmarks).  Here
>>>>>>> > >> are
>>>>>>> > >> > >>> new
>>>>>>> > >> > >>> > numbers including with the flags enabled.  It looks
>>>>>>> like using
>>>>>>> > >> longs
>>>>>>> > >> > >>> might
>>>>>>> > >> > >>> > be a little bit slower, I'll see what I can do to
>>>>>>> mitigate this.
>>>>>>> > >> > >>> >
>>>>>>> > >> > >>> > Ravindra also volunteered to try to benchmark the
>>>>>>> changes with
>>>>>>> > >> > Dremio's
>>>>>>> > >> > >>> > code on today's sync call.
>>>>>>> > >> > >>> >
>>>>>>> > >> > >>> > New
>>>>>>> > >> > >>> >
>>>>>>> > >> > >>> > Benchmark                                        Mode  Cnt   Score   Error  Units
>>>>>>> > >> > >>> > BitVectorHelperBenchmarks.allBitsNullBenchmark   avgt    5   4.176 ± 1.292  ns/op
>>>>>>> > >> > >>> > BitVectorHelperBenchmarks.getNullCountBenchmark  avgt    5  26.102 ± 0.700  ns/op
>>>>>>> > >> > >>> > Float8Benchmarks.copyFromBenchmark   avgt    5  7.398 ± 0.084  us/op
>>>>>>> > >> > >>> > Float8Benchmarks.readWriteBenchmark  avgt    5  2.711 ± 0.057  us/op
>>>>>>> > >> > >>> >
>>>>>>> > >> > >>> >
>>>>>>> > >> > >>> >
>>>>>>> > >> > >>> > old
>>>>>>> > >> > >>> >
>>>>>>> > >> > >>> > BitVectorHelperBenchmarks.allBitsNullBenchmark   avgt    5   3.828 ± 0.030  ns/op
>>>>>>> > >> > >>> > BitVectorHelperBenchmarks.getNullCountBenchmark  avgt    5  20.611 ± 0.188  ns/op
>>>>>>> > >> > >>> > Float8Benchmarks.copyFromBenchmark   avgt    5  6.597 ± 0.462  us/op
>>>>>>> > >> > >>> > Float8Benchmarks.readWriteBenchmark  avgt    5  2.615 ± 0.027  us/op
>>>>>>> > >> > >>> >
>>>>>>> > >> > >>> > On Wed, Aug 7, 2019 at 7:13 PM Fan Liya <
>>>>>>> liya.fa...@gmail.com>
>>>>>>> > >> > wrote:
>>>>>>> > >> > >>> >
>>>>>>> > >> > >>> >> Hi Gonzalo,
>>>>>>> > >> > >>> >>
>>>>>>> > >> > >>> >> Thanks for sharing the performance results.
>>>>>>> > >> > >>> >> I am wondering if you have turned off the flag
>>>>>>> > >> > >>> >> BoundsChecking#BOUNDS_CHECKING_ENABLED.
>>>>>>> > >> > >>> >> If not, the lower throughput should be expected.
>>>>>>> > >> > >>> >>
>>>>>>> > >> > >>> >> Best,
>>>>>>> > >> > >>> >> Liya Fan
>>>>>>> > >> > >>> >>
>>>>>>> > >> > >>> >> On Wed, Aug 7, 2019 at 10:23 PM Micah Kornfield <
>>>>>>> > >> > >>> emkornfi...@gmail.com>
>>>>>>> > >> > >>> >> wrote:
>>>>>>> > >> > >>> >>
>>>>>>> > >> > >>> >>> Hi Gonzalo,
>>>>>>> > >> > >>> >>> Thank you for the feedback.  I wasn't aware of the JIT
>>>>>>> > >> > >>> implications.   At
>>>>>>> > >> > >>> >>> least on the benchmark run they don't seem to have an
>>>>>>> impact.
>>>>>>> > >> > >>> >>>
>>>>>>> > >> > >>> >>> If there are other benchmarks that people have that
>>>>>>> can
>>>>>>> > >> validate if
>>>>>>> > >> > >>> this
>>>>>>> > >> > >>> >>> change will be problematic I would appreciate trying
>>>>>>> to run them
>>>>>>> > >> > >>> with the
>>>>>>> > >> > >>> >>> PR.  I will try to run the ones for zeroing/popcnt
>>>>>>> tonight to
>>>>>>> > >> see
>>>>>>> > >> > if
>>>>>>> > >> > >>> >>> there
>>>>>>> > >> > >>> >>> is a change in those.
>>>>>>> > >> > >>> >>>
>>>>>>> > >> > >>> >>> -Micah
>>>>>>> > >> > >>> >>>
>>>>>>> > >> > >>> >>>
>>>>>>> > >> > >>> >>>
>>>>>>> > >> > >>> >>> On Wednesday, August 7, 2019, Gonzalo Ortiz
>>>>>>> Jaureguizar <
>>>>>>> > >> > >>> >>> golthir...@gmail.com> wrote:
>>>>>>> > >> > >>> >>>
>>>>>>> > >> > >>> >>> > I would recommend to take care with this kind of
>>>>>>> changes.
>>>>>>> > >> > >>> >>> >
>>>>>>> > >> > >>> >>> > I didn't try Arrow in more than one year, but by
>>>>>>> then the
>>>>>>> > >> > >>> performance
>>>>>>> > >> > >>> >>> was
>>>>>>> > >> > >>> >>> > quite bad in comparison with plain byte buffer
>>>>>>> access
>>>>>>> > >> > >>> >>> > (see
>>>>>>> http://git.net/apache-arrow-development/msg02353.html *)
>>>>>>> > >> > and
>>>>>>> > >> > >>> >>> > there are several optimizations that the JVM
>>>>>>> (specifically,
>>>>>>> > >> C2)
>>>>>>> > >> > >>> does
>>>>>>> > >> > >>> >>> not
>>>>>>> > >> > >>> >>> > apply when dealing with int instead of longs. One
>>>>>>> of the
>>>>>>> > >> > >>> >>> > most common is loop unrolling and
>>>>>>> vectorization.
>>>>>>> > >> > >>> >>> >
>>>>>>> > >> > >>> >>> > * It doesn't seem the best way to reference an old
>>>>>>> email on
>>>>>>> > >> the
>>>>>>> > >> > >>> list,
>>>>>>> > >> > >>> >>> but
>>>>>>> > >> > >>> >>> > it is the only result shown by Google
>>>>>>> > >> > >>> >>> >
>>>>>>> > >> > >>> >>> > El mié., 7 ago. 2019 a las 11:42, Fan Liya (<
>>>>>>> > >> > liya.fa...@gmail.com
>>>>>>> > >> > >>> >)
>>>>>>> > >> > >>> >>> > escribió:
>>>>>>> > >> > >>> >>> >
>>>>>>> > >> > >>> >>> >> Hi Micah,
>>>>>>> > >> > >>> >>> >>
>>>>>>> > >> > >>> >>> >> Thanks for your effort. The performance result
>>>>>>> looks good.
>>>>>>> > >> > >>> >>> >>
>>>>>>> > >> > >>> >>> >> As you indicated, ArrowBuf will take additional 12
>>>>>>> bytes (4
>>>>>>> > >> > bytes
>>>>>>> > >> > >>> for
>>>>>>> > >> > >>> >>> each
>>>>>>> > >> > >>> >>> >> of length, write index, and read index).
>>>>>>> > >> > >>> >>> >> Similar overheads also exist for vectors like
>>>>>>> > >> > >>> BaseFixedWidthVector,
>>>>>>> > >> > >>> >>> >> BaseVariableWidthVector, etc.
>>>>>>> > >> > >>> >>> >>
>>>>>>> > >> > >>> >>> >> IMO, such overheads are small enough to justify
>>>>>>> the change.
>>>>>>> > >> > >>> >>> >> Let's check if there are other overheads.
>>>>>>> > >> > >>> >>> >>
>>>>>>> > >> > >>> >>> >> Best,
>>>>>>> > >> > >>> >>> >> Liya Fan
>>>>>>> > >> > >>> >>> >>
>>>>>>> > >> > >>> >>> >> On Wed, Aug 7, 2019 at 3:30 PM Micah Kornfield <
>>>>>>> > >> > >>> emkornfi...@gmail.com
>>>>>>> > >> > >>> >>> >
>>>>>>> > >> > >>> >>> >> wrote:
>>>>>>> > >> > >>> >>> >>
>>>>>>> > >> > >>> >>> >> > Hi Liya Fan,
>>>>>>> > >> > >>> >>> >> > Based on the Float8Benchmark there does not seem
>>>>>>> to be any
>>>>>>> > >> > >>> >>> meaningful
>>>>>>> > >> > >>> >>> >> > performance difference on my machine.  At least
>>>>>>> for me, the
>>>>>>> > >> > >>> >>> benchmarks
>>>>>>> > >> > >>> >>> >> are
>>>>>>> > >> > >>> >>> >> > not stable enough to say one is faster than the
>>>>>>> other (I've
>>>>>>> > >> > >>> pasted
>>>>>>> > >> > >>> >>> >> results
>>>>>>> > >> > >>> >>> >> > below).  That being said my machine isn't
>>>>>>> necessarily the
>>>>>>> > >> most
>>>>>>> > >> > >>> >>> reliable
>>>>>>> > >> > >>> >>> >> for
>>>>>>> > >> > >>> >>> >> > benchmarking.
>>>>>>> > >> > >>> >>> >> >
>>>>>>> > >> > >>> >>> >> > On an intuitive level, this makes sense to me,
>>>>>>> for the
>>>>>>> > >> most
>>>>>>> > >> > >>> part it
>>>>>>> > >> > >>> >>> >> seems
>>>>>>> > >> > >>> >>> >> > like the change just moves casting from "int" to
>>>>>>> "long"
>>>>>>> > >> > further
>>>>>>> > >> > >>> up
>>>>>>> > >> > >>> >>> the
>>>>>>> > >> > >>> >>> >> > stack for "PlatformDependent" operations.  If
>>>>>>> there are
>>>>>>> > >> > other
>>>>>>> > >> > >>> >>> >> benchmarks
>>>>>>> > >> > >>> >>> >> > that you think are worth running let me know.
>>>>>>> > >> > >>> >>> >> >
>>>>>>> > >> > >>> >>> >> > One downside performance wise I think for his
>>>>>>> change is it
>>>>>>> > >> > >>> >>> increases the
>>>>>>> > >> > >>> >>> >> > size of ArrowBuf objects, which I suppose could
>>>>>>> influence
>>>>>>> > >> > cache
>>>>>>> > >> > >>> >>> misses
>>>>>>> > >> > >>> >>> >> at
>>>>>>> > >> > >>> >>> >> > some level or increase the size of call-stacks,
>>>>>>> but this
>>>>>>> > >> > doesn't
>>>>>>> > >> > >>> >>> seem to
>>>>>>> > >> > >>> >>> >> > show up in the benchmark..
>>>>>>> > >> > >>> >>> >> >
>>>>>>> > >> > >>> >>> >> > Thanks,
>>>>>>> > >> > >>> >>> >> > Micah
>>>>>>> > >> > >>> >>> >> >
>>>>>>> > >> > >>> >>> >> > Sample benchmark numbers:
>>>>>>> > >> > >>> >>> >> >
>>>>>>> > >> > >>> >>> >> > [New Code]
>>>>>>> > >> > >>> >>> >> > Benchmark                            Mode  Cnt   Score   Error  Units
>>>>>>> > >> > >>> >>> >> > Float8Benchmarks.copyFromBenchmark   avgt    5  15.441 ± 0.469  us/op
>>>>>>> > >> > >>> >>> >> > Float8Benchmarks.readWriteBenchmark  avgt    5  14.057 ± 0.115  us/op
>>>>>>> > >> > >>> >>> >> >
>>>>>>> > >> > >>> >>> >> > [Old code]
>>>>>>> > >> > >>> >>> >> > Benchmark                            Mode  Cnt   Score   Error  Units
>>>>>>> > >> > >>> >>> >> > Float8Benchmarks.copyFromBenchmark   avgt    5  16.248 ± 1.409  us/op
>>>>>>> > >> > >>> >>> >> > Float8Benchmarks.readWriteBenchmark  avgt    5  14.150 ± 0.084  us/op
>>>>>>> > >> > >>> >>> >> >
>>>>>>> > >> > >>> >>> >> > On Tue, Aug 6, 2019 at 1:18 AM Fan Liya <
>>>>>>> > >> liya.fa...@gmail.com
>>>>>>> > >> > >
>>>>>>> > >> > >>> >>> wrote:
>>>>>>> > >> > >>> >>> >> >
>>>>>>> > >> > >>> >>> >> >> Hi Micah,
>>>>>>> > >> > >>> >>> >> >>
>>>>>>> > >> > >>> >>> >> >> Thanks a lot for doing this.
>>>>>>> > >> > >>> >>> >> >>
>>>>>>> > >> > >>> >>> >> >> I am a little concerned about if there is any
>>>>>>> negative
>>>>>>> > >> > >>> performance
>>>>>>> > >> > >>> >>> >> impact
>>>>>>> > >> > >>> >>> >> >> on the current 32-bit-length based applications.
>>>>>>> > >> > >>> >>> >> >> Can we do some performance comparison on our
>>>>>>> existing
>>>>>>> > >> > >>> benchmarks?
>>>>>>> > >> > >>> >>> >> >>
>>>>>>> > >> > >>> >>> >> >> Best,
>>>>>>> > >> > >>> >>> >> >> Liya Fan
>>>>>>> > >> > >>> >>> >> >>
>>>>>>> > >> > >>> >>> >> >>
>>>>>>> > >> > >>> >>> >> >> On Tue, Aug 6, 2019 at 3:35 PM Micah Kornfield <
>>>>>>> > >> > >>> >>> emkornfi...@gmail.com>
>>>>>>> > >> > >>> >>> >> >> wrote:
>>>>>>> > >> > >>> >>> >> >>
>>>>>>> > >> > >>> >>> >> >>> There have been some previous discussions on
>>>>>>> the mailing list
>>>>>>> > >> > about
>>>>>>> > >> > >>> >>> >> supporting
>>>>>>> > >> > >>> >>> >> >>> 64-bit lengths for  Java ValueVectors (this is
>>>>>>> what the
>>>>>>> > >> IPC
>>>>>>> > >> > >>> >>> >> specification
>>>>>>> > >> > >>> >>> >> >>> and C++ support).  I created a PR [1] that
>>>>>>> changes all
>>>>>>> > >> APIs
>>>>>>> > >> > >>> that I
>>>>>>> > >> > >>> >>> >> could
>>>>>>> > >> > >>> >>> >> >>> find that take an index to take a "long"
>>>>>>> instead of an
>>>>>>> > >> > "int"
>>>>>>> > >> > >>> (and
>>>>>>> > >> > >>> >>> >> >>> similarly change "size/rowcount" APIs).
>>>>>>> > >> > >>> >>> >> >>>
>>>>>>> > >> > >>> >>> >> >>> It is a big change, so I think it is worth
>>>>>>> discussing if
>>>>>>> > >> it
>>>>>>> > >> > is
>>>>>>> > >> > >>> >>> >> something
>>>>>>> > >> > >>> >>> >> >>> we
>>>>>>> > >> > >>> >>> >> >>> still want to move forward with.  It would be
>>>>>>> nice to
>>>>>>> > >> come
>>>>>>> > >> > to
>>>>>>> > >> > >>> a
>>>>>>> > >> > >>> >>> >> >>> conclusion
>>>>>>> > >> > >>> >>> >> >>> quickly, ideally in the next few days, to
>>>>>>> avoid a lot of
>>>>>>> > >> > merge
>>>>>>> > >> > >>> >>> >> conflicts.
>>>>>>> > >> > >>> >>> >> >>>
>>>>>>> > >> > >>> >>> >> >>> The reason I did this work now is the C++
>>>>>>> implementation
>>>>>>> > >> has
>>>>>>> > >> > >>> added
>>>>>>> > >> > >>> >>> >> >>> support
>>>>>>> > >> > >>> >>> >> >>> for LargeList, LargeBinary and LargeString
>>>>>>> arrays and
>>>>>>> > >> based
>>>>>>> > >> > on
>>>>>>> > >> > >>> >>> prior
>>>>>>> > >> > >>> >>> >> >>> discussions we need to have similar support in
>>>>>>> Java
>>>>>>> > >> before
>>>>>>> > >> > our
>>>>>>> > >> > >>> >>> next
>>>>>>> > >> > >>> >>> >> >>> release. Supporting 64-bit indexes means we can
>>>>>>> have full
>>>>>>> > >> > >>> >>> compatibility
>>>>>>> > >> > >>> >>> >> and
>>>>>>> > >> > >>> >>> >> >>> make the most use of the types in Java.
>>>>>>> > >> > >>> >>> >> >>>
>>>>>>> > >> > >>> >>> >> >>> Look forward to hearing feedback.
>>>>>>> > >> > >>> >>> >> >>>
>>>>>>> > >> > >>> >>> >> >>> Thanks,
>>>>>>> > >> > >>> >>> >> >>> Micah
>>>>>>> > >> > >>> >>> >> >>>
>>>>>>> > >> > >>> >>> >> >>> [1] https://github.com/apache/arrow/pull/5020
>>>>>>> > >> > >>> >>> >> >>>
>>>>>>> > >> > >>> >>> >> >>
>>>>>>> > >> > >>> >>> >>
>>>>>>> > >> > >>> >>> >
>>>>>>> > >> > >>> >>>
>>>>>>> > >> > >>> >>
>>>>>>> > >> > >>>
>>>>>>> > >> > >>
>>>>>>> > >> >
>>>>>>> > >>
>>>>>>> > >
>>>>>>>
>>>>>>
