On Fri, Aug 23, 2019, 8:55 PM Micah Kornfield <emkornfi...@gmail.com> wrote:

> The vector indexes being limited to 32 bits doesn't limit the addressing
>> to 32 bit chunks of memory. For example, you're prime example before was
>> image data. Having 2 billion images of 1mb images would still be supported
>> without changing the index addressing.
>
> This might be pre-coffee math, but I think we are limited to approximately
> 2000 images because an ArrowBuf only holds up-to 2 billion bytes [1].
> While we have plenty of room for the offsets, we quickly run out of
> contiguous data storage space. For LargeString and LargeBinary this could
> be fixed by changing ArrowBuf.
>
> LargeArray faces the same problem only it applies to its child vectors.
> Supporting LargeArray properly is really what drove the large-scale
> interface change.
>

My expressed concern about these changes was specifically about the use of
long for get/set in the vector interfaces. I'm not saying that we constrain
memory/ArrowBufs to 32bits.


> Rebase would help if possible.
>
> I'll try to get to this in the next few days.
>
> [1]
> https://github.com/apache/arrow/blob/95175fe7cb8439eebe6d2f6e0495f551d6864380/java/memory/src/main/java/io/netty/buffer/ArrowBuf.java#L164
>
> On Fri, Aug 23, 2019 at 4:55 AM Jacques Nadeau <jacq...@apache.org> wrote:
>
>>
>>
>> On Fri, Aug 23, 2019, 11:49 AM Micah Kornfield <emkornfi...@gmail.com>
>> wrote:
>>
>>> I don't think we should couple this discussion with the implementation
>>>> of large list, etc since I think those two concepts are independent.
>>>
>>> I'm still trying to balance in my mind which is a worse experience for
>>> consumers of the libraries for these types.  Claiming that Java supports
>>> these types but throwing an exception when the Vectors exceed 32-bits or
>>> just say they aren't supported until we have 64-bit support in Java.
>>>
>>
>> The vector indexes being limited to 32 bits doesn't limit the addressing
>> to 32 bit chunks of memory. For example, you're prime example before was
>> image data. Having 2 billion images of 1mb images would still be supported
>> without changing the index addressing.
>>
>>
>>
>>>
>>>> I've asked some others on my team their opinions on the risk here. I
>>>> think we should probably review some our more complex vector interactions
>>>> and see how the jvm's assembly changes with this kind of change. Using
>>>> microbenchmarking is good but I think we also need to see whether we're
>>>> constantly inserting additional instructions or if in most cases, this
>>>> actually doesn't impact instruction count.
>>>
>>>
>>> Is this something that your team will take on?
>>>
>>
>>
>> Yeah, we need to look at this I think.
>>
>> Do you need a rebased version of the PR or is the existing one sufficient?
>>>
>>
>> Rebase would help if possible.
>>
>>
>>
>>> Thanks,
>>> Micah
>>>
>>> On Thu, Aug 22, 2019 at 8:56 PM Jacques Nadeau <jacq...@apache.org>
>>> wrote:
>>>
>>>> I don't think we should couple this discussion with the implementation
>>>> of large list, etc since I think those two concepts are independent.
>>>>
>>>> I've asked some others on my team their opinions on the risk here. I
>>>> think we should probably review some our more complex vector interactions
>>>> and see how the jvm's assembly changes with this kind of change. Using
>>>> microbenchmarking is good but I think we also need to see whether we're
>>>> constantly inserting additional instructions or if in most cases, this
>>>> actually doesn't impact instruction count.
>>>>
>>>>
>>>>
>>>> On Wed, Aug 21, 2019 at 12:18 PM Micah Kornfield <emkornfi...@gmail.com>
>>>> wrote:
>>>>
>>>>>
>>>>>> With regards to the reference implementation point. It is a good
>>>>>> point. I'm on vacation this week. Unless you're pushing hard on this, can
>>>>>> we pick this up and discuss more next week?
>>>>>
>>>>>
>>>>> Hi Jacques, I hope you had a good rest.  Any more thoughts on the
>>>>> reference implementation aspect of this?
>>>>>
>>>>>
>>>>>> To copy the sentiments from the 0.15.0 release thread, I think it
>>>>>> would be best to decouple this discussion from the release timeline
>>>>>> given how many people we have relying on regular releases coming out.
>>>>>> We can keep continue making major 0.x releases until we're ready to
>>>>>> release 1.0.0.
>>>>>
>>>>>
>>>>> I'm OK with it as long as other stakeholders are. Timed releases are
>>>>> the way to go.  As stated on the release thread [1] we need a better
>>>>> mechanism to avoid this type of issue arising again.  The release thread
>>>>> also had some more discussion on compatibility.
>>>>>
>>>>> Thanks,
>>>>> Micah
>>>>>
>>>>> [1]
>>>>> https://lists.apache.org/thread.html/d70feeceaf2570906ade117030b29887af7c77ca5c4a976e6d555920@%3Cdev.arrow.apache.org%3E
>>>>>
>>>>>
>>>>> On Wed, Aug 14, 2019 at 3:23 PM Wes McKinney <wesmck...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> On Sun, Aug 11, 2019 at 9:40 PM Micah Kornfield <
>>>>>> emkornfi...@gmail.com> wrote:
>>>>>> >
>>>>>> > Hi Wes and Jacques,
>>>>>> > See responses below.
>>>>>> >
>>>>>> > With regards to the reference implementation point. It is a good
>>>>>> point. I'm
>>>>>> > > on vacation this week. Unless you're pushing hard on this, can we
>>>>>> pick this
>>>>>> > > up and discuss more next week?
>>>>>> >
>>>>>> >
>>>>>> > Sure thing, enjoy your vacation.  I think the only practical
>>>>>> implications
>>>>>> > are it delays choices around implementing LargeList, LargeBinary,
>>>>>> > LargeString in Java, which in turn might push out the 0.15.0
>>>>>> release.
>>>>>> >
>>>>>>
>>>>>> To copy the sentiments from the 0.15.0 release thread, I think it
>>>>>> would be best to decouple this discussion from the release timeline
>>>>>> given how many people we have relying on regular releases coming out.
>>>>>> We can keep continue making major 0.x releases until we're ready to
>>>>>> release 1.0.0.
>>>>>>
>>>>>> > My stance on this is that I don't know how important it is for Java
>>>>>> to
>>>>>> > > support vectors over INT32_MAX elements. The use cases enabled by
>>>>>> > > having very large arrays seem to be concentrated in the native
>>>>>> code
>>>>>> > > world (e.g. C/C++/Rust) -- that could just be
>>>>>> implementation-centrism
>>>>>> > > on my part, though.
>>>>>> >
>>>>>> >
>>>>>> > A data point against this view is Spark has done work to eliminate
>>>>>> 2GB
>>>>>> > memory limits on its block sizes [1].  I don't claim to understand
>>>>>> the
>>>>>> > implications of this. Bryan might you have any thoughts here?  I'm
>>>>>> OK with
>>>>>> > INT32_MAX, as well, I think we should think about what this means
>>>>>> for
>>>>>> > adding Large types to Java and implications for reference
>>>>>> implementations.
>>>>>> >
>>>>>> > Thanks,
>>>>>> > Micah
>>>>>> >
>>>>>> > [1] https://issues.apache.org/jira/browse/SPARK-6235
>>>>>> >
>>>>>> > On Sun, Aug 11, 2019 at 6:31 PM Jacques Nadeau <jacq...@apache.org>
>>>>>> wrote:
>>>>>> >
>>>>>> > > Hey Micah,
>>>>>> > >
>>>>>> > > Appreciate the offer on the compiling. The reality is I'm more
>>>>>> concerned
>>>>>> > > about the unknowns than the compiling issue itself. Any time
>>>>>> you've been
>>>>>> > > tuning for a while, changing something like this could be totally
>>>>>> fine or
>>>>>> > > cause a couple of major issues. For example, we've done a very
>>>>>> large amount
>>>>>> > > of work reducing heap memory footprint of the vectors. Are target
>>>>>> is to
>>>>>> > > actually get it down to 24 bytes per ArrowBuf and 24 bytes heap
>>>>>> per vector
>>>>>> > > (not including arrow bufs).
>>>>>> > >
>>>>>> > > With regards to the reference implementation point. It is a good
>>>>>> point.
>>>>>> > > I'm on vacation this week. Unless you're pushing hard on this,
>>>>>> can we pick
>>>>>> > > this up and discuss more next week?
>>>>>> > >
>>>>>> > > thanks,
>>>>>> > > Jacques
>>>>>> > >
>>>>>> > > On Sat, Aug 10, 2019 at 7:39 PM Micah Kornfield <
>>>>>> emkornfi...@gmail.com>
>>>>>> > > wrote:
>>>>>> > >
>>>>>> > >> Hi Jacques,
>>>>>> > >> I definitely understand these concerns and this change is risky
>>>>>> because it
>>>>>> > >> is so large.  Perhaps, creating a new hierarchy, might be the
>>>>>> cleanest way
>>>>>> > >> of dealing with this.  This could have other benefits like
>>>>>> cleaning up
>>>>>> > >> some
>>>>>> > >> cruft around dictionary encode and "orphaned" method.   Per past
>>>>>> e-mail
>>>>>> > >> threads I agree it is beneficial to have 2 separate reference
>>>>>> > >> implementations that can communicate fully, and my intent here
>>>>>> was to
>>>>>> > >> close
>>>>>> > >> that gap.
>>>>>> > >>
>>>>>> > >> Trying to
>>>>>> > >> > determine the ramifications of these changes would be
>>>>>> challenging and
>>>>>> > >> time
>>>>>> > >> > consuming against all the different ways we interact with the
>>>>>> Arrow Java
>>>>>> > >> > library.
>>>>>> > >>
>>>>>> > >>
>>>>>> > >> Understood.  I took a quick look at Dremio-OSS it seems like it
>>>>>> has a
>>>>>> > >> simple java build system?  If it is helpful, I can try to get a
>>>>>> fork
>>>>>> > >> running that at least compiles against this PR.  My plan would
>>>>>> be to cast
>>>>>> > >> any place that was changed to return a long back to an int, so
>>>>>> in essence
>>>>>> > >> the Dremio algorithms would reman 32-bit implementations.
>>>>>> > >>
>>>>>> > >> I don't  have the infrastructure to test this change properly
>>>>>> from a
>>>>>> > >> distributed systems perspective, so it would still take some
>>>>>> time from
>>>>>> > >> Dremio to validate for regressions.
>>>>>> > >>
>>>>>> > >> I'm not saying I'm against this but want to make sure we've
>>>>>> > >> > explored all less disruptive options before considering
>>>>>> changing
>>>>>> > >> something
>>>>>> > >> > this fundamental (especially when I generally hold the view
>>>>>> that large
>>>>>> > >> cell
>>>>>> > >> > counts against massive contiguous memory is an anti pattern to
>>>>>> scalable
>>>>>> > >> > analytical processing--purely subjective of course).
>>>>>> > >>
>>>>>> > >>
>>>>>> > >> I'm open to other ideas here, as well. I don't think it is out
>>>>>> of the
>>>>>> > >> question to leave the Java implementation as 32-bit, but if we
>>>>>> do, then I
>>>>>> > >> think we should consider a different strategy for reference
>>>>>> > >> implementations.
>>>>>> > >>
>>>>>> > >> Thanks,
>>>>>> > >> Micah
>>>>>> > >>
>>>>>> > >> On Sat, Aug 10, 2019 at 5:09 PM Jacques Nadeau <
>>>>>> jacq...@apache.org>
>>>>>> > >> wrote:
>>>>>> > >>
>>>>>> > >> > Hey Micah, I didn't have a particular path in mind. Was
>>>>>> thinking more
>>>>>> > >> along
>>>>>> > >> > the lines of extra methods as opposed to separate classes.
>>>>>> > >> >
>>>>>> > >> > Arrow hasn't historically been a place where we're writing
>>>>>> algorithms in
>>>>>> > >> > Java so the fact that they aren't there doesn't mean they
>>>>>> don't exist.
>>>>>> > >> We
>>>>>> > >> > have a large amount of code that depends on the current
>>>>>> behavior that is
>>>>>> > >> > deployed in hundreds of customer clusters (you can peruse our
>>>>>> dremio
>>>>>> > >> repo
>>>>>> > >> > to see how extensively we leverage Arrow if interested).
>>>>>> Trying to
>>>>>> > >> > determine the ramifications of these changes would be
>>>>>> challenging and
>>>>>> > >> time
>>>>>> > >> > consuming against all the different ways we interact with the
>>>>>> Arrow Java
>>>>>> > >> > library. I'm not saying I'm against this but want to make sure
>>>>>> we've
>>>>>> > >> > explored all less disruptive options before considering
>>>>>> changing
>>>>>> > >> something
>>>>>> > >> > this fundamental (especially when I generally hold the view
>>>>>> that large
>>>>>> > >> cell
>>>>>> > >> > counts against massive contiguous memory is an anti pattern to
>>>>>> scalable
>>>>>> > >> > analytical processing--purely subjective of course).
>>>>>> > >> >
>>>>>> > >> > On Sat, Aug 10, 2019, 4:17 PM Micah Kornfield <
>>>>>> emkornfi...@gmail.com>
>>>>>> > >> > wrote:
>>>>>> > >> >
>>>>>> > >> > > Hi Jacques,
>>>>>> > >> > > What avenue were you thinking for supporting both paths?   I
>>>>>> didn't
>>>>>> > >> want
>>>>>> > >> > > to pursue a different class hierarchy, because I felt like
>>>>>> that would
>>>>>> > >> > > effectively fork the code base, but that is potentially an
>>>>>> option that
>>>>>> > >> > > would allow us to have a complete reference implementation
>>>>>> in Java
>>>>>> > >> that
>>>>>> > >> > can
>>>>>> > >> > > fully interact with C++, without major changes to this code.
>>>>>> > >> > >
>>>>>> > >> > > For supporting both APIs on the same classes/interfaces, I
>>>>>> think they
>>>>>> > >> > > roughly fall into three categories, changes to input
>>>>>> parameters,
>>>>>> > >> changes
>>>>>> > >> > to
>>>>>> > >> > > output parameters and algorithm changes.
>>>>>> > >> > >
>>>>>> > >> > > For inputs, changing from int to long is essentially a no-op
>>>>>> from the
>>>>>> > >> > > compiler perspective.  From the limited micro-benchmarking
>>>>>> this also
>>>>>> > >> > > doesn't seem to have a performance impact.  So we could keep
>>>>>> two
>>>>>> > >> versions
>>>>>> > >> > > of the methods that only differ on inputs, but it is not
>>>>>> clear what
>>>>>> > >> the
>>>>>> > >> > > value of that would be.
>>>>>> > >> > >
>>>>>> > >> > > For outputs, we can't support methods "long getLength()" and
>>>>>> "int
>>>>>> > >> > > getLength()" in the same class, so we would be forced into
>>>>>> something
>>>>>> > >> like
>>>>>> > >> > > "long getLength(boolean dummy)" which I think is a less
>>>>>> desirable.
>>>>>> > >> > >
>>>>>> > >> > > For algorithm changes, there did not appear to be too many
>>>>>> places
>>>>>> > >> where
>>>>>> > >> > we
>>>>>> > >> > > actually loop over all elements (it is quite possible I
>>>>>> missed
>>>>>> > >> something
>>>>>> > >> > > here), the ones that I did find I was able to mitigate
>>>>>> performance
>>>>>> > >> > > penalties as noted above.  Some of the current
>>>>>> implementation will
>>>>>> > >> get a
>>>>>> > >> > > lot slower for "large arrays", but we can likely fix those
>>>>>> later or in
>>>>>> > >> > this
>>>>>> > >> > > PR with a nested while loop instead of 2 for loops.
>>>>>> > >> > >
>>>>>> > >> > > Thanks,
>>>>>> > >> > > Micah
>>>>>> > >> > >
>>>>>> > >> > >
>>>>>> > >> > > On Saturday, August 10, 2019, Jacques Nadeau <
>>>>>> jacq...@apache.org>
>>>>>> > >> wrote:
>>>>>> > >> > >
>>>>>> > >> > >> This is a pretty massive change to the apis. I wonder how
>>>>>> nasty it
>>>>>> > >> would
>>>>>> > >> > >> be to just support both paths. Have you evaluated how
>>>>>> complex that
>>>>>> > >> > would be?
>>>>>> > >> > >>
>>>>>> > >> > >> On Wed, Aug 7, 2019 at 11:08 PM Micah Kornfield <
>>>>>> > >> emkornfi...@gmail.com>
>>>>>> > >> > >> wrote:
>>>>>> > >> > >>
>>>>>> > >> > >>> After more investigation, it looks like Float8Benchmarks
>>>>>> at least
>>>>>> > >> on my
>>>>>> > >> > >>> machine are within the range of noise.
>>>>>> > >> > >>>
>>>>>> > >> > >>> For BitVectorHelper I pushed a new commit [1], seems to
>>>>>> bring the
>>>>>> > >> > >>> BitVectorHelper benchmarks back inline (and even with some
>>>>>> > >> improvement
>>>>>> > >> > >>> for
>>>>>> > >> > >>> getNullCountBenchmark).
>>>>>> > >> > >>>
>>>>>> > >> > >>> Benchmark                                        Mode
>>>>>> Cnt   Score
>>>>>> > >> > >>>  Error
>>>>>> > >> > >>>  Units
>>>>>> > >> > >>> BitVectorHelperBenchmarks.allBitsNullBenchmark   avgt
>>>>>> 5   3.821 ±
>>>>>> > >> > >>> 0.031
>>>>>> > >> > >>>  ns/op
>>>>>> > >> > >>> BitVectorHelperBenchmarks.getNullCountBenchmark  avgt
>>>>>> 5  14.884 ±
>>>>>> > >> > >>> 0.141
>>>>>> > >> > >>>  ns/op
>>>>>> > >> > >>>
>>>>>> > >> > >>> I applied the same pattern to other loops that I could
>>>>>> find, and for
>>>>>> > >> > any
>>>>>> > >> > >>> "for (long" loop on the critical path, I broke it up into
>>>>>> two loops.
>>>>>> > >> > the
>>>>>> > >> > >>> first loop does iteration by integer, the second finishes
>>>>>> off for
>>>>>> > >> any
>>>>>> > >> > >>> long
>>>>>> > >> > >>> values.  As a side note it seems like optimization for
>>>>>> loops using
>>>>>> > >> long
>>>>>> > >> > >>> counters at least have a semi-recent open bug for the JVM
>>>>>> [2]
>>>>>> > >> > >>>
>>>>>> > >> > >>> Thanks,
>>>>>> > >> > >>> Micah
>>>>>> > >> > >>>
>>>>>> > >> > >>> [1]
>>>>>> > >> > >>>
>>>>>> > >> > >>>
>>>>>> > >> >
>>>>>> > >>
>>>>>> https://github.com/apache/arrow/pull/5020/commits/2ea2c1ae83e3baa7b9a99a6d06276d968df41797
>>>>>> > >> > >>> [2] https://bugs.openjdk.java.net/browse/JDK-8223051
>>>>>> > >> > >>>
>>>>>> > >> > >>> On Wed, Aug 7, 2019 at 8:11 PM Micah Kornfield <
>>>>>> > >> emkornfi...@gmail.com>
>>>>>> > >> > >>> wrote:
>>>>>> > >> > >>>
>>>>>> > >> > >>> > Indeed, the BoundChecking and CheckNullForGet variables
>>>>>> can make a
>>>>>> > >> > big
>>>>>> > >> > >>> > difference.  I didn't initially run the benchmarks with
>>>>>> these
>>>>>> > >> turned
>>>>>> > >> > on
>>>>>> > >> > >>> > (you can see the result from above with
>>>>>> Float8Benchmarks).  Here
>>>>>> > >> are
>>>>>> > >> > >>> new
>>>>>> > >> > >>> > numbers including with the flags enabled.  It looks like
>>>>>> using
>>>>>> > >> longs
>>>>>> > >> > >>> might
>>>>>> > >> > >>> > be a little bit slower, I'll see what I can do to
>>>>>> mitigate this.
>>>>>> > >> > >>> >
>>>>>> > >> > >>> > Ravindra also volunteered to try to benchmark the
>>>>>> changes with
>>>>>> > >> > Dremio's
>>>>>> > >> > >>> > code on today's sync call.
>>>>>> > >> > >>> >
>>>>>> > >> > >>> > New
>>>>>> > >> > >>> >
>>>>>> > >> > >>> > Benchmark                                        Mode
>>>>>> Cnt   Score
>>>>>> > >> > >>>  Error
>>>>>> > >> > >>> > Units
>>>>>> > >> > >>> >
>>>>>> > >> > >>> > BitVectorHelperBenchmarks.allBitsNullBenchmark   avgt
>>>>>> 5
>>>>>> > >>  4.176 ±
>>>>>> > >> > >>> 1.292
>>>>>> > >> > >>> > ns/op
>>>>>> > >> > >>> >
>>>>>> > >> > >>> > BitVectorHelperBenchmarks.getNullCountBenchmark  avgt
>>>>>> 5
>>>>>> > >> 26.102 ±
>>>>>> > >> > >>> 0.700
>>>>>> > >> > >>> > ns/op
>>>>>> > >> > >>> >
>>>>>> > >> > >>> > Float8Benchmarks.copyFromBenchmark   avgt    5  7.398 ±
>>>>>> 0.084
>>>>>> > >> us/op
>>>>>> > >> > >>> >
>>>>>> > >> > >>> > Float8Benchmarks.readWriteBenchmark  avgt    5  2.711 ±
>>>>>> 0.057
>>>>>> > >> us/op
>>>>>> > >> > >>> >
>>>>>> > >> > >>> >
>>>>>> > >> > >>> >
>>>>>> > >> > >>> > old
>>>>>> > >> > >>> >
>>>>>> > >> > >>> > BitVectorHelperBenchmarks.allBitsNullBenchmark   avgt
>>>>>> 5
>>>>>> > >>  3.828 ±
>>>>>> > >> > >>> 0.030
>>>>>> > >> > >>> > ns/op
>>>>>> > >> > >>> >
>>>>>> > >> > >>> > BitVectorHelperBenchmarks.getNullCountBenchmark  avgt
>>>>>> 5
>>>>>> > >> 20.611 ±
>>>>>> > >> > >>> 0.188
>>>>>> > >> > >>> > ns/op
>>>>>> > >> > >>> >
>>>>>> > >> > >>> > Float8Benchmarks.copyFromBenchmark   avgt    5  6.597 ±
>>>>>> 0.462
>>>>>> > >> us/op
>>>>>> > >> > >>> >
>>>>>> > >> > >>> > Float8Benchmarks.readWriteBenchmark  avgt    5  2.615 ±
>>>>>> 0.027
>>>>>> > >> us/op
>>>>>> > >> > >>> >
>>>>>> > >> > >>> > On Wed, Aug 7, 2019 at 7:13 PM Fan Liya <
>>>>>> liya.fa...@gmail.com>
>>>>>> > >> > wrote:
>>>>>> > >> > >>> >
>>>>>> > >> > >>> >> Hi Gonzalo,
>>>>>> > >> > >>> >>
>>>>>> > >> > >>> >> Thanks for sharing the performance results.
>>>>>> > >> > >>> >> I am wondering if you have turned off the flag
>>>>>> > >> > >>> >> BoundsChecking#BOUNDS_CHECKING_ENABLED.
>>>>>> > >> > >>> >> If not, the lower throughput should be expected.
>>>>>> > >> > >>> >>
>>>>>> > >> > >>> >> Best,
>>>>>> > >> > >>> >> Liya Fan
>>>>>> > >> > >>> >>
>>>>>> > >> > >>> >> On Wed, Aug 7, 2019 at 10:23 PM Micah Kornfield <
>>>>>> > >> > >>> emkornfi...@gmail.com>
>>>>>> > >> > >>> >> wrote:
>>>>>> > >> > >>> >>
>>>>>> > >> > >>> >>> Hi Gonzalo,
>>>>>> > >> > >>> >>> Thank you for the feedback.  I wasn't aware of the JIT
>>>>>> > >> > >>> implications.   At
>>>>>> > >> > >>> >>> least on the benchmark run they don't seem to have an
>>>>>> impact.
>>>>>> > >> > >>> >>>
>>>>>> > >> > >>> >>> If there are other benchmarks that people have that can
>>>>>> > >> validate if
>>>>>> > >> > >>> this
>>>>>> > >> > >>> >>> change will be problematic I would appreciate trying
>>>>>> to run them
>>>>>> > >> > >>> with the
>>>>>> > >> > >>> >>> PR.  I will try to run the ones for zeroing/popcnt
>>>>>> tonight to
>>>>>> > >> see
>>>>>> > >> > if
>>>>>> > >> > >>> >>> there
>>>>>> > >> > >>> >>> is a change in those.
>>>>>> > >> > >>> >>>
>>>>>> > >> > >>> >>> -Micah
>>>>>> > >> > >>> >>>
>>>>>> > >> > >>> >>>
>>>>>> > >> > >>> >>>
>>>>>> > >> > >>> >>> On Wednesday, August 7, 2019, Gonzalo Ortiz
>>>>>> Jaureguizar <
>>>>>> > >> > >>> >>> golthir...@gmail.com> wrote:
>>>>>> > >> > >>> >>>
>>>>>> > >> > >>> >>> > I would recommend to take care with this kind of
>>>>>> changes.
>>>>>> > >> > >>> >>> >
>>>>>> > >> > >>> >>> > I didn't try Arrow in more than one year, but by
>>>>>> then the
>>>>>> > >> > >>> performance
>>>>>> > >> > >>> >>> was
>>>>>> > >> > >>> >>> > quite bad in comparison with plain byte buffer access
>>>>>> > >> > >>> >>> > (see
>>>>>> http://git.net/apache-arrow-development/msg02353.html *)
>>>>>> > >> > and
>>>>>> > >> > >>> >>> > there are several optimizations that the JVM
>>>>>> (specifically,
>>>>>> > >> C2)
>>>>>> > >> > >>> does
>>>>>> > >> > >>> >>> not
>>>>>> > >> > >>> >>> > apply when dealing with int instead of longs. One of
>>>>>> the
>>>>>> > >> > >>> >>> > most commons is the loop unrolling and vectorization.
>>>>>> > >> > >>> >>> >
>>>>>> > >> > >>> >>> > * It doesn't seem the best way to reference an old
>>>>>> email on
>>>>>> > >> the
>>>>>> > >> > >>> list,
>>>>>> > >> > >>> >>> but
>>>>>> > >> > >>> >>> > it is the only result shown by Google
>>>>>> > >> > >>> >>> >
>>>>>> > >> > >>> >>> > El mié., 7 ago. 2019 a las 11:42, Fan Liya (<
>>>>>> > >> > liya.fa...@gmail.com
>>>>>> > >> > >>> >)
>>>>>> > >> > >>> >>> > escribió:
>>>>>> > >> > >>> >>> >
>>>>>> > >> > >>> >>> >> Hi Micah,
>>>>>> > >> > >>> >>> >>
>>>>>> > >> > >>> >>> >> Thanks for your effort. The performance result
>>>>>> looks good.
>>>>>> > >> > >>> >>> >>
>>>>>> > >> > >>> >>> >> As you indicated, ArrowBuf will take additional 12
>>>>>> bytes (4
>>>>>> > >> > bytes
>>>>>> > >> > >>> for
>>>>>> > >> > >>> >>> each
>>>>>> > >> > >>> >>> >> of length, write index, and read index).
>>>>>> > >> > >>> >>> >> Similar overheads also exist for vectors like
>>>>>> > >> > >>> BaseFixedWidthVector,
>>>>>> > >> > >>> >>> >> BaseVariableWidthVector, etc.
>>>>>> > >> > >>> >>> >>
>>>>>> > >> > >>> >>> >> IMO, such overheads are small enough to justify the
>>>>>> change.
>>>>>> > >> > >>> >>> >> Let's check if there are other overheads.
>>>>>> > >> > >>> >>> >>
>>>>>> > >> > >>> >>> >> Best,
>>>>>> > >> > >>> >>> >> Liya Fan
>>>>>> > >> > >>> >>> >>
>>>>>> > >> > >>> >>> >> On Wed, Aug 7, 2019 at 3:30 PM Micah Kornfield <
>>>>>> > >> > >>> emkornfi...@gmail.com
>>>>>> > >> > >>> >>> >
>>>>>> > >> > >>> >>> >> wrote:
>>>>>> > >> > >>> >>> >>
>>>>>> > >> > >>> >>> >> > Hi Liya Fan,
>>>>>> > >> > >>> >>> >> > Based on the Float8Benchmark there does not seem
>>>>>> to be any
>>>>>> > >> > >>> >>> meaningful
>>>>>> > >> > >>> >>> >> > performance difference on my machine.  At least
>>>>>> for me, the
>>>>>> > >> > >>> >>> benchmarks
>>>>>> > >> > >>> >>> >> are
>>>>>> > >> > >>> >>> >> > not stable enough to say one is faster than the
>>>>>> other (I've
>>>>>> > >> > >>> pasted
>>>>>> > >> > >>> >>> >> results
>>>>>> > >> > >>> >>> >> > below).  That being said my machine isn't
>>>>>> necessarily the
>>>>>> > >> most
>>>>>> > >> > >>> >>> reliable
>>>>>> > >> > >>> >>> >> for
>>>>>> > >> > >>> >>> >> > benchmarking.
>>>>>> > >> > >>> >>> >> >
>>>>>> > >> > >>> >>> >> > On an intuitive level, this makes sense to me,
>>>>>> for the
>>>>>> > >> most
>>>>>> > >> > >>> part it
>>>>>> > >> > >>> >>> >> seems
>>>>>> > >> > >>> >>> >> > like the change just moves casting from "int" to
>>>>>> "long"
>>>>>> > >> > further
>>>>>> > >> > >>> up
>>>>>> > >> > >>> >>> the
>>>>>> > >> > >>> >>> >> > stack  for  "PlatformDepdendent" operations.  If
>>>>>> there are
>>>>>> > >> > other
>>>>>> > >> > >>> >>> >> benchmarks
>>>>>> > >> > >>> >>> >> > that you think are worth running let me know.
>>>>>> > >> > >>> >>> >> >
>>>>>> > >> > >>> >>> >> > One downside performance wise I think for his
>>>>>> change is it
>>>>>> > >> > >>> >>> increases the
>>>>>> > >> > >>> >>> >> > size of ArrowBuf objects, which I suppose could
>>>>>> influence
>>>>>> > >> > cache
>>>>>> > >> > >>> >>> misses
>>>>>> > >> > >>> >>> >> at
>>>>>> > >> > >>> >>> >> > some level or increase the size of call-stacks,
>>>>>> but this
>>>>>> > >> > doesn't
>>>>>> > >> > >>> >>> seem to
>>>>>> > >> > >>> >>> >> > show up in the benchmark..
>>>>>> > >> > >>> >>> >> >
>>>>>> > >> > >>> >>> >> > Thanks,
>>>>>> > >> > >>> >>> >> > Micah
>>>>>> > >> > >>> >>> >> >
>>>>>> > >> > >>> >>> >> > Sample benchmark numbers:
>>>>>> > >> > >>> >>> >> >
>>>>>> > >> > >>> >>> >> > [New Code]
>>>>>> > >> > >>> >>> >> > Benchmark                            Mode  Cnt
>>>>>>  Score
>>>>>> > >>  Error
>>>>>> > >> > >>> >>> Units
>>>>>> > >> > >>> >>> >> > Float8Benchmarks.copyFromBenchmark   avgt    5
>>>>>> 15.441 ±
>>>>>> > >> 0.469
>>>>>> > >> > >>> >>> us/op
>>>>>> > >> > >>> >>> >> > Float8Benchmarks.readWriteBenchmark  avgt    5
>>>>>> 14.057 ±
>>>>>> > >> 0.115
>>>>>> > >> > >>> >>> us/op
>>>>>> > >> > >>> >>> >> >
>>>>>> > >> > >>> >>> >> >
>>>>>> > >> > >>> >>> >> > [Old code]
>>>>>> > >> > >>> >>> >> > Benchmark                            Mode  Cnt
>>>>>>  Score
>>>>>> > >>  Error
>>>>>> > >> > >>> >>> Units
>>>>>> > >> > >>> >>> >> > Float8Benchmarks.copyFromBenchmark   avgt    5
>>>>>> 16.248 ±
>>>>>> > >> 1.409
>>>>>> > >> > >>> >>> us/op
>>>>>> > >> > >>> >>> >> > Float8Benchmarks.readWriteBenchmark  avgt    5
>>>>>> 14.150 ±
>>>>>> > >> 0.084
>>>>>> > >> > >>> >>> us/op
>>>>>> > >> > >>> >>> >> >
>>>>>> > >> > >>> >>> >> > On Tue, Aug 6, 2019 at 1:18 AM Fan Liya <
>>>>>> > >> liya.fa...@gmail.com
>>>>>> > >> > >
>>>>>> > >> > >>> >>> wrote:
>>>>>> > >> > >>> >>> >> >
>>>>>> > >> > >>> >>> >> >> Hi Micah,
>>>>>> > >> > >>> >>> >> >>
>>>>>> > >> > >>> >>> >> >> Thanks a lot for doing this.
>>>>>> > >> > >>> >>> >> >>
>>>>>> > >> > >>> >>> >> >> I am a little concerned about if there is any
>>>>>> negative
>>>>>> > >> > >>> performance
>>>>>> > >> > >>> >>> >> impact
>>>>>> > >> > >>> >>> >> >> on the current 32-bit-length based applications.
>>>>>> > >> > >>> >>> >> >> Can we do some performance comparison on our
>>>>>> existing
>>>>>> > >> > >>> benchmarks?
>>>>>> > >> > >>> >>> >> >>
>>>>>> > >> > >>> >>> >> >> Best,
>>>>>> > >> > >>> >>> >> >> Liya Fan
>>>>>> > >> > >>> >>> >> >>
>>>>>> > >> > >>> >>> >> >>
>>>>>> > >> > >>> >>> >> >> On Tue, Aug 6, 2019 at 3:35 PM Micah Kornfield <
>>>>>> > >> > >>> >>> emkornfi...@gmail.com>
>>>>>> > >> > >>> >>> >> >> wrote:
>>>>>> > >> > >>> >>> >> >>
>>>>>> > >> > >>> >>> >> >>> There have been some previous discussions on
>>>>>> the mailing
>>>>>> > >> > about
>>>>>> > >> > >>> >>> >> supporting
>>>>>> > >> > >>> >>> >> >>> 64-bit lengths for  Java ValueVectors (this is
>>>>>> what the
>>>>>> > >> IPC
>>>>>> > >> > >>> >>> >> specification
>>>>>> > >> > >>> >>> >> >>> and C++ support).  I created a PR [1] that
>>>>>> changes all
>>>>>> > >> APIs
>>>>>> > >> > >>> that I
>>>>>> > >> > >>> >>> >> could
>>>>>> > >> > >>> >>> >> >>> find that take an index to take an "long"
>>>>>> instead of an
>>>>>> > >> > "int"
>>>>>> > >> > >>> (and
>>>>>> > >> > >>> >>> >> >>> similarly change "size/rowcount" APIs).
>>>>>> > >> > >>> >>> >> >>>
>>>>>> > >> > >>> >>> >> >>> It is a big change, so I think it is worth
>>>>>> discussing if
>>>>>> > >> it
>>>>>> > >> > is
>>>>>> > >> > >>> >>> >> something
>>>>>> > >> > >>> >>> >> >>> we
>>>>>> > >> > >>> >>> >> >>> still want to move forward with.  It would be
>>>>>> nice to
>>>>>> > >> come
>>>>>> > >> > to
>>>>>> > >> > >>> a
>>>>>> > >> > >>> >>> >> >>> conclusion
>>>>>> > >> > >>> >>> >> >>> quickly, ideally in the next few days, to avoid
>>>>>> a lot of
>>>>>> > >> > merge
>>>>>> > >> > >>> >>> >> conflicts.
>>>>>> > >> > >>> >>> >> >>>
>>>>>> > >> > >>> >>> >> >>> The reason I did this work now is the C++
>>>>>> implementation
>>>>>> > >> has
>>>>>> > >> > >>> added
>>>>>> > >> > >>> >>> >> >>> support
>>>>>> > >> > >>> >>> >> >>> for LargeList, LargeBinary and LargeString
>>>>>> arrays and
>>>>>> > >> based
>>>>>> > >> > on
>>>>>> > >> > >>> >>> prior
>>>>>> > >> > >>> >>> >> >>> discussions we need to have similar support in
>>>>>> Java
>>>>>> > >> before
>>>>>> > >> > our
>>>>>> > >> > >>> >>> next
>>>>>> > >> > >>> >>> >> >>> release. Support 64-bit indexes means we can
>>>>>> have full
>>>>>> > >> > >>> >>> compatibility
>>>>>> > >> > >>> >>> >> and
>>>>>> > >> > >>> >>> >> >>> make the most use of the types in Java.
>>>>>> > >> > >>> >>> >> >>>
>>>>>> > >> > >>> >>> >> >>> Look forward to hearing feedback.
>>>>>> > >> > >>> >>> >> >>>
>>>>>> > >> > >>> >>> >> >>> Thanks,
>>>>>> > >> > >>> >>> >> >>> Micah
>>>>>> > >> > >>> >>> >> >>>
>>>>>> > >> > >>> >>> >> >>> [1] https://github.com/apache/arrow/pull/5020
>>>>>> > >> > >>> >>> >> >>>
>>>>>> > >> > >>> >>> >> >>
>>>>>> > >> > >>> >>> >>
>>>>>> > >> > >>> >>> >
>>>>>> > >> > >>> >>>
>>>>>> > >> > >>> >>
>>>>>> > >> > >>>
>>>>>> > >> > >>
>>>>>> > >> >
>>>>>> > >>
>>>>>> > >
>>>>>>
>>>>>

Reply via email to