I'll also note this isn't quite in final form, I'd still like to add some more unit tests.
On Fri, Nov 22, 2019 at 11:36 AM Wes McKinney <wesmck...@gmail.com> wrote: > hi Micah -- it makes sense to limit the scope for the time being to > permitting LargeString/Binary work to proceed. Jacques, have you had a > chance to look at this? > > On Fri, Nov 15, 2019 at 3:07 AM Micah Kornfield <emkornfi...@gmail.com> > wrote: > > > > Apologies for the long delay, I chose to do the minimal work of limiting > this change [1] to allowing ArrowBuf to 64-bit lengths. This would unblock > work on LargeString and LargeBinary. If this change looks OK, I think > there is some follow-up work to add more thorough unit/integration tests. > > > > As an aside, it does seem like the 2GB limit is affecting some users in > Spark [2][3], so hopefully LargeString would help with this. > > > > Allowing more than MAX_INT elements is Vectors/array still a blocker for > making LargeList useful. > > > > Thanks, > > Micah > > > > [1] https://github.com/apache/arrow/pull/5020 > > [2] > https://stackoverflow.com/questions/58739888/spark-is-it-possible-to-increase-pyarrow-buffer#comment103812119_58739888 > > [3] https://issues.apache.org/jira/browse/ARROW-4890 > > > > On Fri, Aug 23, 2019 at 8:33 AM Jacques Nadeau <jacq...@apache.org> > wrote: > >> > >> > >> > >> On Fri, Aug 23, 2019, 8:55 PM Micah Kornfield <emkornfi...@gmail.com> > wrote: > >>>> > >>>> The vector indexes being limited to 32 bits doesn't limit the > addressing to 32 bit chunks of memory. For example, you're prime example > before was image data. Having 2 billion images of 1mb images would still be > supported without changing the index addressing. > >>> > >>> This might be pre-coffee math, but I think we are limited to > approximately 2000 images because an ArrowBuf only holds up-to 2 billion > bytes [1]. While we have plenty of room for the offsets, we quickly run > out of contiguous data storage space. For LargeString and LargeBinary this > could be fixed by changing ArrowBuf. > >>> > >>> LargeArray faces the same problem only it applies to its child > vectors. Supporting LargeArray properly is really what drove the > large-scale interface change. > >> > >> > >> My expressed concern about these changes was specifically about the use > of long for get/set in the vector interfaces. I'm not saying that we > constrain memory/ArrowBufs to 32bits. > >> > >>> > >>>> Rebase would help if possible. > >>> > >>> I'll try to get to this in the next few days. > >>> > >>> [1] > https://github.com/apache/arrow/blob/95175fe7cb8439eebe6d2f6e0495f551d6864380/java/memory/src/main/java/io/netty/buffer/ArrowBuf.java#L164 > >>> > >>> On Fri, Aug 23, 2019 at 4:55 AM Jacques Nadeau <jacq...@apache.org> > wrote: > >>>> > >>>> > >>>> > >>>> On Fri, Aug 23, 2019, 11:49 AM Micah Kornfield <emkornfi...@gmail.com> > wrote: > >>>>>> > >>>>>> I don't think we should couple this discussion with the > implementation of large list, etc since I think those two concepts are > independent. > >>>>> > >>>>> I'm still trying to balance in my mind which is a worse experience > for consumers of the libraries for these types. Claiming that Java > supports these types but throwing an exception when the Vectors exceed > 32-bits or just say they aren't supported until we have 64-bit support in > Java. > >>>> > >>>> > >>>> The vector indexes being limited to 32 bits doesn't limit the > addressing to 32 bit chunks of memory. For example, you're prime example > before was image data. Having 2 billion images of 1mb images would still be > supported without changing the index addressing. > >>>> > >>>> > >>>>> > >>>>>> > >>>>>> I've asked some others on my team their opinions on the risk here. > I think we should probably review some our more complex vector interactions > and see how the jvm's assembly changes with this kind of change. Using > microbenchmarking is good but I think we also need to see whether we're > constantly inserting additional instructions or if in most cases, this > actually doesn't impact instruction count. > >>>>> > >>>>> > >>>>> Is this something that your team will take on? > >>>> > >>>> > >>>> > >>>> Yeah, we need to look at this I think. > >>>> > >>>>> Do you need a rebased version of the PR or is the existing one > sufficient? > >>>> > >>>> > >>>> Rebase would help if possible. > >>>> > >>>> > >>>>> > >>>>> Thanks, > >>>>> Micah > >>>>> > >>>>> On Thu, Aug 22, 2019 at 8:56 PM Jacques Nadeau <jacq...@apache.org> > wrote: > >>>>>> > >>>>>> I don't think we should couple this discussion with the > implementation of large list, etc since I think those two concepts are > independent. > >>>>>> > >>>>>> I've asked some others on my team their opinions on the risk here. > I think we should probably review some our more complex vector interactions > and see how the jvm's assembly changes with this kind of change. Using > microbenchmarking is good but I think we also need to see whether we're > constantly inserting additional instructions or if in most cases, this > actually doesn't impact instruction count. > >>>>>> > >>>>>> > >>>>>> > >>>>>> On Wed, Aug 21, 2019 at 12:18 PM Micah Kornfield < > emkornfi...@gmail.com> wrote: > >>>>>>>> > >>>>>>>> > >>>>>>>> With regards to the reference implementation point. It is a good > point. I'm on vacation this week. Unless you're pushing hard on this, can > we pick this up and discuss more next week? > >>>>>>> > >>>>>>> > >>>>>>> Hi Jacques, I hope you had a good rest. Any more thoughts on the > reference implementation aspect of this? > >>>>>>> > >>>>>>>> > >>>>>>>> To copy the sentiments from the 0.15.0 release thread, I think it > >>>>>>>> would be best to decouple this discussion from the release > timeline > >>>>>>>> given how many people we have relying on regular releases coming > out. > >>>>>>>> We can keep continue making major 0.x releases until we're ready > to > >>>>>>>> release 1.0.0. > >>>>>>> > >>>>>>> > >>>>>>> I'm OK with it as long as other stakeholders are. Timed releases > are the way to go. As stated on the release thread [1] we need a better > mechanism to avoid this type of issue arising again. The release thread > also had some more discussion on compatibility. > >>>>>>> > >>>>>>> Thanks, > >>>>>>> Micah > >>>>>>> > >>>>>>> [1] > https://lists.apache.org/thread.html/d70feeceaf2570906ade117030b29887af7c77ca5c4a976e6d555920@%3Cdev.arrow.apache.org%3E > >>>>>>> > >>>>>>> > >>>>>>> On Wed, Aug 14, 2019 at 3:23 PM Wes McKinney <wesmck...@gmail.com> > wrote: > >>>>>>>> > >>>>>>>> On Sun, Aug 11, 2019 at 9:40 PM Micah Kornfield < > emkornfi...@gmail.com> wrote: > >>>>>>>> > > >>>>>>>> > Hi Wes and Jacques, > >>>>>>>> > See responses below. > >>>>>>>> > > >>>>>>>> > With regards to the reference implementation point. It is a > good point. I'm > >>>>>>>> > > on vacation this week. Unless you're pushing hard on this, > can we pick this > >>>>>>>> > > up and discuss more next week? > >>>>>>>> > > >>>>>>>> > > >>>>>>>> > Sure thing, enjoy your vacation. I think the only practical > implications > >>>>>>>> > are it delays choices around implementing LargeList, > LargeBinary, > >>>>>>>> > LargeString in Java, which in turn might push out the 0.15.0 > release. > >>>>>>>> > > >>>>>>>> > >>>>>>>> To copy the sentiments from the 0.15.0 release thread, I think it > >>>>>>>> would be best to decouple this discussion from the release > timeline > >>>>>>>> given how many people we have relying on regular releases coming > out. > >>>>>>>> We can keep continue making major 0.x releases until we're ready > to > >>>>>>>> release 1.0.0. > >>>>>>>> > >>>>>>>> > My stance on this is that I don't know how important it is for > Java to > >>>>>>>> > > support vectors over INT32_MAX elements. The use cases > enabled by > >>>>>>>> > > having very large arrays seem to be concentrated in the > native code > >>>>>>>> > > world (e.g. C/C++/Rust) -- that could just be > implementation-centrism > >>>>>>>> > > on my part, though. > >>>>>>>> > > >>>>>>>> > > >>>>>>>> > A data point against this view is Spark has done work to > eliminate 2GB > >>>>>>>> > memory limits on its block sizes [1]. I don't claim to > understand the > >>>>>>>> > implications of this. Bryan might you have any thoughts here? > I'm OK with > >>>>>>>> > INT32_MAX, as well, I think we should think about what this > means for > >>>>>>>> > adding Large types to Java and implications for reference > implementations. > >>>>>>>> > > >>>>>>>> > Thanks, > >>>>>>>> > Micah > >>>>>>>> > > >>>>>>>> > [1] https://issues.apache.org/jira/browse/SPARK-6235 > >>>>>>>> > > >>>>>>>> > On Sun, Aug 11, 2019 at 6:31 PM Jacques Nadeau < > jacq...@apache.org> wrote: > >>>>>>>> > > >>>>>>>> > > Hey Micah, > >>>>>>>> > > > >>>>>>>> > > Appreciate the offer on the compiling. The reality is I'm > more concerned > >>>>>>>> > > about the unknowns than the compiling issue itself. Any time > you've been > >>>>>>>> > > tuning for a while, changing something like this could be > totally fine or > >>>>>>>> > > cause a couple of major issues. For example, we've done a > very large amount > >>>>>>>> > > of work reducing heap memory footprint of the vectors. Are > target is to > >>>>>>>> > > actually get it down to 24 bytes per ArrowBuf and 24 bytes > heap per vector > >>>>>>>> > > (not including arrow bufs). > >>>>>>>> > > > >>>>>>>> > > With regards to the reference implementation point. It is a > good point. > >>>>>>>> > > I'm on vacation this week. Unless you're pushing hard on > this, can we pick > >>>>>>>> > > this up and discuss more next week? > >>>>>>>> > > > >>>>>>>> > > thanks, > >>>>>>>> > > Jacques > >>>>>>>> > > > >>>>>>>> > > On Sat, Aug 10, 2019 at 7:39 PM Micah Kornfield < > emkornfi...@gmail.com> > >>>>>>>> > > wrote: > >>>>>>>> > > > >>>>>>>> > >> Hi Jacques, > >>>>>>>> > >> I definitely understand these concerns and this change is > risky because it > >>>>>>>> > >> is so large. Perhaps, creating a new hierarchy, might be > the cleanest way > >>>>>>>> > >> of dealing with this. This could have other benefits like > cleaning up > >>>>>>>> > >> some > >>>>>>>> > >> cruft around dictionary encode and "orphaned" method. Per > past e-mail > >>>>>>>> > >> threads I agree it is beneficial to have 2 separate reference > >>>>>>>> > >> implementations that can communicate fully, and my intent > here was to > >>>>>>>> > >> close > >>>>>>>> > >> that gap. > >>>>>>>> > >> > >>>>>>>> > >> Trying to > >>>>>>>> > >> > determine the ramifications of these changes would be > challenging and > >>>>>>>> > >> time > >>>>>>>> > >> > consuming against all the different ways we interact with > the Arrow Java > >>>>>>>> > >> > library. > >>>>>>>> > >> > >>>>>>>> > >> > >>>>>>>> > >> Understood. I took a quick look at Dremio-OSS it seems like > it has a > >>>>>>>> > >> simple java build system? If it is helpful, I can try to > get a fork > >>>>>>>> > >> running that at least compiles against this PR. My plan > would be to cast > >>>>>>>> > >> any place that was changed to return a long back to an int, > so in essence > >>>>>>>> > >> the Dremio algorithms would reman 32-bit implementations. > >>>>>>>> > >> > >>>>>>>> > >> I don't have the infrastructure to test this change > properly from a > >>>>>>>> > >> distributed systems perspective, so it would still take some > time from > >>>>>>>> > >> Dremio to validate for regressions. > >>>>>>>> > >> > >>>>>>>> > >> I'm not saying I'm against this but want to make sure we've > >>>>>>>> > >> > explored all less disruptive options before considering > changing > >>>>>>>> > >> something > >>>>>>>> > >> > this fundamental (especially when I generally hold the > view that large > >>>>>>>> > >> cell > >>>>>>>> > >> > counts against massive contiguous memory is an anti > pattern to scalable > >>>>>>>> > >> > analytical processing--purely subjective of course). > >>>>>>>> > >> > >>>>>>>> > >> > >>>>>>>> > >> I'm open to other ideas here, as well. I don't think it is > out of the > >>>>>>>> > >> question to leave the Java implementation as 32-bit, but if > we do, then I > >>>>>>>> > >> think we should consider a different strategy for reference > >>>>>>>> > >> implementations. > >>>>>>>> > >> > >>>>>>>> > >> Thanks, > >>>>>>>> > >> Micah > >>>>>>>> > >> > >>>>>>>> > >> On Sat, Aug 10, 2019 at 5:09 PM Jacques Nadeau < > jacq...@apache.org> > >>>>>>>> > >> wrote: > >>>>>>>> > >> > >>>>>>>> > >> > Hey Micah, I didn't have a particular path in mind. Was > thinking more > >>>>>>>> > >> along > >>>>>>>> > >> > the lines of extra methods as opposed to separate classes. > >>>>>>>> > >> > > >>>>>>>> > >> > Arrow hasn't historically been a place where we're writing > algorithms in > >>>>>>>> > >> > Java so the fact that they aren't there doesn't mean they > don't exist. > >>>>>>>> > >> We > >>>>>>>> > >> > have a large amount of code that depends on the current > behavior that is > >>>>>>>> > >> > deployed in hundreds of customer clusters (you can peruse > our dremio > >>>>>>>> > >> repo > >>>>>>>> > >> > to see how extensively we leverage Arrow if interested). > Trying to > >>>>>>>> > >> > determine the ramifications of these changes would be > challenging and > >>>>>>>> > >> time > >>>>>>>> > >> > consuming against all the different ways we interact with > the Arrow Java > >>>>>>>> > >> > library. I'm not saying I'm against this but want to make > sure we've > >>>>>>>> > >> > explored all less disruptive options before considering > changing > >>>>>>>> > >> something > >>>>>>>> > >> > this fundamental (especially when I generally hold the > view that large > >>>>>>>> > >> cell > >>>>>>>> > >> > counts against massive contiguous memory is an anti > pattern to scalable > >>>>>>>> > >> > analytical processing--purely subjective of course). > >>>>>>>> > >> > > >>>>>>>> > >> > On Sat, Aug 10, 2019, 4:17 PM Micah Kornfield < > emkornfi...@gmail.com> > >>>>>>>> > >> > wrote: > >>>>>>>> > >> > > >>>>>>>> > >> > > Hi Jacques, > >>>>>>>> > >> > > What avenue were you thinking for supporting both > paths? I didn't > >>>>>>>> > >> want > >>>>>>>> > >> > > to pursue a different class hierarchy, because I felt > like that would > >>>>>>>> > >> > > effectively fork the code base, but that is potentially > an option that > >>>>>>>> > >> > > would allow us to have a complete reference > implementation in Java > >>>>>>>> > >> that > >>>>>>>> > >> > can > >>>>>>>> > >> > > fully interact with C++, without major changes to this > code. > >>>>>>>> > >> > > > >>>>>>>> > >> > > For supporting both APIs on the same classes/interfaces, > I think they > >>>>>>>> > >> > > roughly fall into three categories, changes to input > parameters, > >>>>>>>> > >> changes > >>>>>>>> > >> > to > >>>>>>>> > >> > > output parameters and algorithm changes. > >>>>>>>> > >> > > > >>>>>>>> > >> > > For inputs, changing from int to long is essentially a > no-op from the > >>>>>>>> > >> > > compiler perspective. From the limited > micro-benchmarking this also > >>>>>>>> > >> > > doesn't seem to have a performance impact. So we could > keep two > >>>>>>>> > >> versions > >>>>>>>> > >> > > of the methods that only differ on inputs, but it is not > clear what > >>>>>>>> > >> the > >>>>>>>> > >> > > value of that would be. > >>>>>>>> > >> > > > >>>>>>>> > >> > > For outputs, we can't support methods "long getLength()" > and "int > >>>>>>>> > >> > > getLength()" in the same class, so we would be forced > into something > >>>>>>>> > >> like > >>>>>>>> > >> > > "long getLength(boolean dummy)" which I think is a less > desirable. > >>>>>>>> > >> > > > >>>>>>>> > >> > > For algorithm changes, there did not appear to be too > many places > >>>>>>>> > >> where > >>>>>>>> > >> > we > >>>>>>>> > >> > > actually loop over all elements (it is quite possible I > missed > >>>>>>>> > >> something > >>>>>>>> > >> > > here), the ones that I did find I was able to mitigate > performance > >>>>>>>> > >> > > penalties as noted above. Some of the current > implementation will > >>>>>>>> > >> get a > >>>>>>>> > >> > > lot slower for "large arrays", but we can likely fix > those later or in > >>>>>>>> > >> > this > >>>>>>>> > >> > > PR with a nested while loop instead of 2 for loops. > >>>>>>>> > >> > > > >>>>>>>> > >> > > Thanks, > >>>>>>>> > >> > > Micah > >>>>>>>> > >> > > > >>>>>>>> > >> > > > >>>>>>>> > >> > > On Saturday, August 10, 2019, Jacques Nadeau < > jacq...@apache.org> > >>>>>>>> > >> wrote: > >>>>>>>> > >> > > > >>>>>>>> > >> > >> This is a pretty massive change to the apis. I wonder > how nasty it > >>>>>>>> > >> would > >>>>>>>> > >> > >> be to just support both paths. Have you evaluated how > complex that > >>>>>>>> > >> > would be? > >>>>>>>> > >> > >> > >>>>>>>> > >> > >> On Wed, Aug 7, 2019 at 11:08 PM Micah Kornfield < > >>>>>>>> > >> emkornfi...@gmail.com> > >>>>>>>> > >> > >> wrote: > >>>>>>>> > >> > >> > >>>>>>>> > >> > >>> After more investigation, it looks like > Float8Benchmarks at least > >>>>>>>> > >> on my > >>>>>>>> > >> > >>> machine are within the range of noise. > >>>>>>>> > >> > >>> > >>>>>>>> > >> > >>> For BitVectorHelper I pushed a new commit [1], seems > to bring the > >>>>>>>> > >> > >>> BitVectorHelper benchmarks back inline (and even with > some > >>>>>>>> > >> improvement > >>>>>>>> > >> > >>> for > >>>>>>>> > >> > >>> getNullCountBenchmark). > >>>>>>>> > >> > >>> > >>>>>>>> > >> > >>> Benchmark Mode > Cnt Score > >>>>>>>> > >> > >>> Error > >>>>>>>> > >> > >>> Units > >>>>>>>> > >> > >>> BitVectorHelperBenchmarks.allBitsNullBenchmark avgt > 5 3.821 ± > >>>>>>>> > >> > >>> 0.031 > >>>>>>>> > >> > >>> ns/op > >>>>>>>> > >> > >>> BitVectorHelperBenchmarks.getNullCountBenchmark avgt > 5 14.884 ± > >>>>>>>> > >> > >>> 0.141 > >>>>>>>> > >> > >>> ns/op > >>>>>>>> > >> > >>> > >>>>>>>> > >> > >>> I applied the same pattern to other loops that I could > find, and for > >>>>>>>> > >> > any > >>>>>>>> > >> > >>> "for (long" loop on the critical path, I broke it up > into two loops. > >>>>>>>> > >> > the > >>>>>>>> > >> > >>> first loop does iteration by integer, the second > finishes off for > >>>>>>>> > >> any > >>>>>>>> > >> > >>> long > >>>>>>>> > >> > >>> values. As a side note it seems like optimization for > loops using > >>>>>>>> > >> long > >>>>>>>> > >> > >>> counters at least have a semi-recent open bug for the > JVM [2] > >>>>>>>> > >> > >>> > >>>>>>>> > >> > >>> Thanks, > >>>>>>>> > >> > >>> Micah > >>>>>>>> > >> > >>> > >>>>>>>> > >> > >>> [1] > >>>>>>>> > >> > >>> > >>>>>>>> > >> > >>> > >>>>>>>> > >> > > >>>>>>>> > >> > https://github.com/apache/arrow/pull/5020/commits/2ea2c1ae83e3baa7b9a99a6d06276d968df41797 > >>>>>>>> > >> > >>> [2] https://bugs.openjdk.java.net/browse/JDK-8223051 > >>>>>>>> > >> > >>> > >>>>>>>> > >> > >>> On Wed, Aug 7, 2019 at 8:11 PM Micah Kornfield < > >>>>>>>> > >> emkornfi...@gmail.com> > >>>>>>>> > >> > >>> wrote: > >>>>>>>> > >> > >>> > >>>>>>>> > >> > >>> > Indeed, the BoundChecking and CheckNullForGet > variables can make a > >>>>>>>> > >> > big > >>>>>>>> > >> > >>> > difference. I didn't initially run the benchmarks > with these > >>>>>>>> > >> turned > >>>>>>>> > >> > on > >>>>>>>> > >> > >>> > (you can see the result from above with > Float8Benchmarks). Here > >>>>>>>> > >> are > >>>>>>>> > >> > >>> new > >>>>>>>> > >> > >>> > numbers including with the flags enabled. It looks > like using > >>>>>>>> > >> longs > >>>>>>>> > >> > >>> might > >>>>>>>> > >> > >>> > be a little bit slower, I'll see what I can do to > mitigate this. > >>>>>>>> > >> > >>> > > >>>>>>>> > >> > >>> > Ravindra also volunteered to try to benchmark the > changes with > >>>>>>>> > >> > Dremio's > >>>>>>>> > >> > >>> > code on today's sync call. > >>>>>>>> > >> > >>> > > >>>>>>>> > >> > >>> > New > >>>>>>>> > >> > >>> > > >>>>>>>> > >> > >>> > Benchmark > Mode Cnt Score > >>>>>>>> > >> > >>> Error > >>>>>>>> > >> > >>> > Units > >>>>>>>> > >> > >>> > > >>>>>>>> > >> > >>> > BitVectorHelperBenchmarks.allBitsNullBenchmark > avgt 5 > >>>>>>>> > >> 4.176 ± > >>>>>>>> > >> > >>> 1.292 > >>>>>>>> > >> > >>> > ns/op > >>>>>>>> > >> > >>> > > >>>>>>>> > >> > >>> > BitVectorHelperBenchmarks.getNullCountBenchmark > avgt 5 > >>>>>>>> > >> 26.102 ± > >>>>>>>> > >> > >>> 0.700 > >>>>>>>> > >> > >>> > ns/op > >>>>>>>> > >> > >>> > > >>>>>>>> > >> > >>> > Float8Benchmarks.copyFromBenchmark avgt 5 > 7.398 ± 0.084 > >>>>>>>> > >> us/op > >>>>>>>> > >> > >>> > > >>>>>>>> > >> > >>> > Float8Benchmarks.readWriteBenchmark avgt 5 > 2.711 ± 0.057 > >>>>>>>> > >> us/op > >>>>>>>> > >> > >>> > > >>>>>>>> > >> > >>> > > >>>>>>>> > >> > >>> > > >>>>>>>> > >> > >>> > old > >>>>>>>> > >> > >>> > > >>>>>>>> > >> > >>> > BitVectorHelperBenchmarks.allBitsNullBenchmark > avgt 5 > >>>>>>>> > >> 3.828 ± > >>>>>>>> > >> > >>> 0.030 > >>>>>>>> > >> > >>> > ns/op > >>>>>>>> > >> > >>> > > >>>>>>>> > >> > >>> > BitVectorHelperBenchmarks.getNullCountBenchmark > avgt 5 > >>>>>>>> > >> 20.611 ± > >>>>>>>> > >> > >>> 0.188 > >>>>>>>> > >> > >>> > ns/op > >>>>>>>> > >> > >>> > > >>>>>>>> > >> > >>> > Float8Benchmarks.copyFromBenchmark avgt 5 > 6.597 ± 0.462 > >>>>>>>> > >> us/op > >>>>>>>> > >> > >>> > > >>>>>>>> > >> > >>> > Float8Benchmarks.readWriteBenchmark avgt 5 > 2.615 ± 0.027 > >>>>>>>> > >> us/op > >>>>>>>> > >> > >>> > > >>>>>>>> > >> > >>> > On Wed, Aug 7, 2019 at 7:13 PM Fan Liya < > liya.fa...@gmail.com> > >>>>>>>> > >> > wrote: > >>>>>>>> > >> > >>> > > >>>>>>>> > >> > >>> >> Hi Gonzalo, > >>>>>>>> > >> > >>> >> > >>>>>>>> > >> > >>> >> Thanks for sharing the performance results. > >>>>>>>> > >> > >>> >> I am wondering if you have turned off the flag > >>>>>>>> > >> > >>> >> BoundsChecking#BOUNDS_CHECKING_ENABLED. > >>>>>>>> > >> > >>> >> If not, the lower throughput should be expected. > >>>>>>>> > >> > >>> >> > >>>>>>>> > >> > >>> >> Best, > >>>>>>>> > >> > >>> >> Liya Fan > >>>>>>>> > >> > >>> >> > >>>>>>>> > >> > >>> >> On Wed, Aug 7, 2019 at 10:23 PM Micah Kornfield < > >>>>>>>> > >> > >>> emkornfi...@gmail.com> > >>>>>>>> > >> > >>> >> wrote: > >>>>>>>> > >> > >>> >> > >>>>>>>> > >> > >>> >>> Hi Gonzalo, > >>>>>>>> > >> > >>> >>> Thank you for the feedback. I wasn't aware of the > JIT > >>>>>>>> > >> > >>> implications. At > >>>>>>>> > >> > >>> >>> least on the benchmark run they don't seem to have > an impact. > >>>>>>>> > >> > >>> >>> > >>>>>>>> > >> > >>> >>> If there are other benchmarks that people have > that can > >>>>>>>> > >> validate if > >>>>>>>> > >> > >>> this > >>>>>>>> > >> > >>> >>> change will be problematic I would appreciate > trying to run them > >>>>>>>> > >> > >>> with the > >>>>>>>> > >> > >>> >>> PR. I will try to run the ones for zeroing/popcnt > tonight to > >>>>>>>> > >> see > >>>>>>>> > >> > if > >>>>>>>> > >> > >>> >>> there > >>>>>>>> > >> > >>> >>> is a change in those. > >>>>>>>> > >> > >>> >>> > >>>>>>>> > >> > >>> >>> -Micah > >>>>>>>> > >> > >>> >>> > >>>>>>>> > >> > >>> >>> > >>>>>>>> > >> > >>> >>> > >>>>>>>> > >> > >>> >>> On Wednesday, August 7, 2019, Gonzalo Ortiz > Jaureguizar < > >>>>>>>> > >> > >>> >>> golthir...@gmail.com> wrote: > >>>>>>>> > >> > >>> >>> > >>>>>>>> > >> > >>> >>> > I would recommend to take care with this kind of > changes. > >>>>>>>> > >> > >>> >>> > > >>>>>>>> > >> > >>> >>> > I didn't try Arrow in more than one year, but by > then the > >>>>>>>> > >> > >>> performance > >>>>>>>> > >> > >>> >>> was > >>>>>>>> > >> > >>> >>> > quite bad in comparison with plain byte buffer > access > >>>>>>>> > >> > >>> >>> > (see > http://git.net/apache-arrow-development/msg02353.html *) > >>>>>>>> > >> > and > >>>>>>>> > >> > >>> >>> > there are several optimizations that the JVM > (specifically, > >>>>>>>> > >> C2) > >>>>>>>> > >> > >>> does > >>>>>>>> > >> > >>> >>> not > >>>>>>>> > >> > >>> >>> > apply when dealing with int instead of longs. > One of the > >>>>>>>> > >> > >>> >>> > most commons is the loop unrolling and > vectorization. > >>>>>>>> > >> > >>> >>> > > >>>>>>>> > >> > >>> >>> > * It doesn't seem the best way to reference an > old email on > >>>>>>>> > >> the > >>>>>>>> > >> > >>> list, > >>>>>>>> > >> > >>> >>> but > >>>>>>>> > >> > >>> >>> > it is the only result shown by Google > >>>>>>>> > >> > >>> >>> > > >>>>>>>> > >> > >>> >>> > El mié., 7 ago. 2019 a las 11:42, Fan Liya (< > >>>>>>>> > >> > liya.fa...@gmail.com > >>>>>>>> > >> > >>> >) > >>>>>>>> > >> > >>> >>> > escribió: > >>>>>>>> > >> > >>> >>> > > >>>>>>>> > >> > >>> >>> >> Hi Micah, > >>>>>>>> > >> > >>> >>> >> > >>>>>>>> > >> > >>> >>> >> Thanks for your effort. The performance result > looks good. > >>>>>>>> > >> > >>> >>> >> > >>>>>>>> > >> > >>> >>> >> As you indicated, ArrowBuf will take additional > 12 bytes (4 > >>>>>>>> > >> > bytes > >>>>>>>> > >> > >>> for > >>>>>>>> > >> > >>> >>> each > >>>>>>>> > >> > >>> >>> >> of length, write index, and read index). > >>>>>>>> > >> > >>> >>> >> Similar overheads also exist for vectors like > >>>>>>>> > >> > >>> BaseFixedWidthVector, > >>>>>>>> > >> > >>> >>> >> BaseVariableWidthVector, etc. > >>>>>>>> > >> > >>> >>> >> > >>>>>>>> > >> > >>> >>> >> IMO, such overheads are small enough to justify > the change. > >>>>>>>> > >> > >>> >>> >> Let's check if there are other overheads. > >>>>>>>> > >> > >>> >>> >> > >>>>>>>> > >> > >>> >>> >> Best, > >>>>>>>> > >> > >>> >>> >> Liya Fan > >>>>>>>> > >> > >>> >>> >> > >>>>>>>> > >> > >>> >>> >> On Wed, Aug 7, 2019 at 3:30 PM Micah Kornfield < > >>>>>>>> > >> > >>> emkornfi...@gmail.com > >>>>>>>> > >> > >>> >>> > > >>>>>>>> > >> > >>> >>> >> wrote: > >>>>>>>> > >> > >>> >>> >> > >>>>>>>> > >> > >>> >>> >> > Hi Liya Fan, > >>>>>>>> > >> > >>> >>> >> > Based on the Float8Benchmark there does not > seem to be any > >>>>>>>> > >> > >>> >>> meaningful > >>>>>>>> > >> > >>> >>> >> > performance difference on my machine. At > least for me, the > >>>>>>>> > >> > >>> >>> benchmarks > >>>>>>>> > >> > >>> >>> >> are > >>>>>>>> > >> > >>> >>> >> > not stable enough to say one is faster than > the other (I've > >>>>>>>> > >> > >>> pasted > >>>>>>>> > >> > >>> >>> >> results > >>>>>>>> > >> > >>> >>> >> > below). That being said my machine isn't > necessarily the > >>>>>>>> > >> most > >>>>>>>> > >> > >>> >>> reliable > >>>>>>>> > >> > >>> >>> >> for > >>>>>>>> > >> > >>> >>> >> > benchmarking. > >>>>>>>> > >> > >>> >>> >> > > >>>>>>>> > >> > >>> >>> >> > On an intuitive level, this makes sense to > me, for the > >>>>>>>> > >> most > >>>>>>>> > >> > >>> part it > >>>>>>>> > >> > >>> >>> >> seems > >>>>>>>> > >> > >>> >>> >> > like the change just moves casting from "int" > to "long" > >>>>>>>> > >> > further > >>>>>>>> > >> > >>> up > >>>>>>>> > >> > >>> >>> the > >>>>>>>> > >> > >>> >>> >> > stack for "PlatformDepdendent" operations. > If there are > >>>>>>>> > >> > other > >>>>>>>> > >> > >>> >>> >> benchmarks > >>>>>>>> > >> > >>> >>> >> > that you think are worth running let me know. > >>>>>>>> > >> > >>> >>> >> > > >>>>>>>> > >> > >>> >>> >> > One downside performance wise I think for his > change is it > >>>>>>>> > >> > >>> >>> increases the > >>>>>>>> > >> > >>> >>> >> > size of ArrowBuf objects, which I suppose > could influence > >>>>>>>> > >> > cache > >>>>>>>> > >> > >>> >>> misses > >>>>>>>> > >> > >>> >>> >> at > >>>>>>>> > >> > >>> >>> >> > some level or increase the size of > call-stacks, but this > >>>>>>>> > >> > doesn't > >>>>>>>> > >> > >>> >>> seem to > >>>>>>>> > >> > >>> >>> >> > show up in the benchmark.. > >>>>>>>> > >> > >>> >>> >> > > >>>>>>>> > >> > >>> >>> >> > Thanks, > >>>>>>>> > >> > >>> >>> >> > Micah > >>>>>>>> > >> > >>> >>> >> > > >>>>>>>> > >> > >>> >>> >> > Sample benchmark numbers: > >>>>>>>> > >> > >>> >>> >> > > >>>>>>>> > >> > >>> >>> >> > [New Code] > >>>>>>>> > >> > >>> >>> >> > Benchmark Mode > Cnt Score > >>>>>>>> > >> Error > >>>>>>>> > >> > >>> >>> Units > >>>>>>>> > >> > >>> >>> >> > Float8Benchmarks.copyFromBenchmark avgt > 5 15.441 ± > >>>>>>>> > >> 0.469 > >>>>>>>> > >> > >>> >>> us/op > >>>>>>>> > >> > >>> >>> >> > Float8Benchmarks.readWriteBenchmark avgt > 5 14.057 ± > >>>>>>>> > >> 0.115 > >>>>>>>> > >> > >>> >>> us/op > >>>>>>>> > >> > >>> >>> >> > > >>>>>>>> > >> > >>> >>> >> > > >>>>>>>> > >> > >>> >>> >> > [Old code] > >>>>>>>> > >> > >>> >>> >> > Benchmark Mode > Cnt Score > >>>>>>>> > >> Error > >>>>>>>> > >> > >>> >>> Units > >>>>>>>> > >> > >>> >>> >> > Float8Benchmarks.copyFromBenchmark avgt > 5 16.248 ± > >>>>>>>> > >> 1.409 > >>>>>>>> > >> > >>> >>> us/op > >>>>>>>> > >> > >>> >>> >> > Float8Benchmarks.readWriteBenchmark avgt > 5 14.150 ± > >>>>>>>> > >> 0.084 > >>>>>>>> > >> > >>> >>> us/op > >>>>>>>> > >> > >>> >>> >> > > >>>>>>>> > >> > >>> >>> >> > On Tue, Aug 6, 2019 at 1:18 AM Fan Liya < > >>>>>>>> > >> liya.fa...@gmail.com > >>>>>>>> > >> > > > >>>>>>>> > >> > >>> >>> wrote: > >>>>>>>> > >> > >>> >>> >> > > >>>>>>>> > >> > >>> >>> >> >> Hi Micah, > >>>>>>>> > >> > >>> >>> >> >> > >>>>>>>> > >> > >>> >>> >> >> Thanks a lot for doing this. > >>>>>>>> > >> > >>> >>> >> >> > >>>>>>>> > >> > >>> >>> >> >> I am a little concerned about if there is > any negative > >>>>>>>> > >> > >>> performance > >>>>>>>> > >> > >>> >>> >> impact > >>>>>>>> > >> > >>> >>> >> >> on the current 32-bit-length based > applications. > >>>>>>>> > >> > >>> >>> >> >> Can we do some performance comparison on our > existing > >>>>>>>> > >> > >>> benchmarks? > >>>>>>>> > >> > >>> >>> >> >> > >>>>>>>> > >> > >>> >>> >> >> Best, > >>>>>>>> > >> > >>> >>> >> >> Liya Fan > >>>>>>>> > >> > >>> >>> >> >> > >>>>>>>> > >> > >>> >>> >> >> > >>>>>>>> > >> > >>> >>> >> >> On Tue, Aug 6, 2019 at 3:35 PM Micah > Kornfield < > >>>>>>>> > >> > >>> >>> emkornfi...@gmail.com> > >>>>>>>> > >> > >>> >>> >> >> wrote: > >>>>>>>> > >> > >>> >>> >> >> > >>>>>>>> > >> > >>> >>> >> >>> There have been some previous discussions > on the mailing > >>>>>>>> > >> > about > >>>>>>>> > >> > >>> >>> >> supporting > >>>>>>>> > >> > >>> >>> >> >>> 64-bit lengths for Java ValueVectors (this > is what the > >>>>>>>> > >> IPC > >>>>>>>> > >> > >>> >>> >> specification > >>>>>>>> > >> > >>> >>> >> >>> and C++ support). I created a PR [1] that > changes all > >>>>>>>> > >> APIs > >>>>>>>> > >> > >>> that I > >>>>>>>> > >> > >>> >>> >> could > >>>>>>>> > >> > >>> >>> >> >>> find that take an index to take an "long" > instead of an > >>>>>>>> > >> > "int" > >>>>>>>> > >> > >>> (and > >>>>>>>> > >> > >>> >>> >> >>> similarly change "size/rowcount" APIs). > >>>>>>>> > >> > >>> >>> >> >>> > >>>>>>>> > >> > >>> >>> >> >>> It is a big change, so I think it is worth > discussing if > >>>>>>>> > >> it > >>>>>>>> > >> > is > >>>>>>>> > >> > >>> >>> >> something > >>>>>>>> > >> > >>> >>> >> >>> we > >>>>>>>> > >> > >>> >>> >> >>> still want to move forward with. It would > be nice to > >>>>>>>> > >> come > >>>>>>>> > >> > to > >>>>>>>> > >> > >>> a > >>>>>>>> > >> > >>> >>> >> >>> conclusion > >>>>>>>> > >> > >>> >>> >> >>> quickly, ideally in the next few days, to > avoid a lot of > >>>>>>>> > >> > merge > >>>>>>>> > >> > >>> >>> >> conflicts. > >>>>>>>> > >> > >>> >>> >> >>> > >>>>>>>> > >> > >>> >>> >> >>> The reason I did this work now is the C++ > implementation > >>>>>>>> > >> has > >>>>>>>> > >> > >>> added > >>>>>>>> > >> > >>> >>> >> >>> support > >>>>>>>> > >> > >>> >>> >> >>> for LargeList, LargeBinary and LargeString > arrays and > >>>>>>>> > >> based > >>>>>>>> > >> > on > >>>>>>>> > >> > >>> >>> prior > >>>>>>>> > >> > >>> >>> >> >>> discussions we need to have similar support > in Java > >>>>>>>> > >> before > >>>>>>>> > >> > our > >>>>>>>> > >> > >>> >>> next > >>>>>>>> > >> > >>> >>> >> >>> release. Support 64-bit indexes means we > can have full > >>>>>>>> > >> > >>> >>> compatibility > >>>>>>>> > >> > >>> >>> >> and > >>>>>>>> > >> > >>> >>> >> >>> make the most use of the types in Java. > >>>>>>>> > >> > >>> >>> >> >>> > >>>>>>>> > >> > >>> >>> >> >>> Look forward to hearing feedback. > >>>>>>>> > >> > >>> >>> >> >>> > >>>>>>>> > >> > >>> >>> >> >>> Thanks, > >>>>>>>> > >> > >>> >>> >> >>> Micah > >>>>>>>> > >> > >>> >>> >> >>> > >>>>>>>> > >> > >>> >>> >> >>> [1] > https://github.com/apache/arrow/pull/5020 > >>>>>>>> > >> > >>> >>> >> >>> > >>>>>>>> > >> > >>> >>> >> >> > >>>>>>>> > >> > >>> >>> >> > >>>>>>>> > >> > >>> >>> > > >>>>>>>> > >> > >>> >>> > >>>>>>>> > >> > >>> >> > >>>>>>>> > >> > >>> > >>>>>>>> > >> > >> > >>>>>>>> > >> > > >>>>>>>> > >> > >>>>>>>> > > >