Apologies for the long delay. I chose to do the minimal work of limiting this change [1] to allowing 64-bit lengths in ArrowBuf. This would unblock work on LargeString and LargeBinary. If this change looks OK, I think there is some follow-up work to add more thorough unit/integration tests.
As an aside, it does seem like the 2GB limit is affecting some users in Spark [2][3], so hopefully LargeString would help with this. Allowing more than MAX_INT elements in Vectors/arrays is still a blocker for making LargeList useful. Thanks, Micah [1] https://github.com/apache/arrow/pull/5020 [2] https://stackoverflow.com/questions/58739888/spark-is-it-possible-to-increase-pyarrow-buffer#comment103812119_58739888 [3] https://issues.apache.org/jira/browse/ARROW-4890 On Fri, Aug 23, 2019 at 8:33 AM Jacques Nadeau <jacq...@apache.org> wrote: > > > On Fri, Aug 23, 2019, 8:55 PM Micah Kornfield <emkornfi...@gmail.com> > wrote: > >> The vector indexes being limited to 32 bits doesn't limit the addressing >>> to 32 bit chunks of memory. For example, you're prime example before was >>> image data. Having 2 billion images of 1mb images would still be supported >>> without changing the index addressing. >> >> This might be pre-coffee math, but I think we are limited to >> approximately 2000 images because an ArrowBuf only holds up-to 2 billion >> bytes [1]. While we have plenty of room for the offsets, we quickly run >> out of contiguous data storage space. For LargeString and LargeBinary this >> could be fixed by changing ArrowBuf. >> >> LargeArray faces the same problem only it applies to its child vectors. >> Supporting LargeArray properly is really what drove the large-scale >> interface change. >> > > My expressed concern about these changes was specifically about the use of > long for get/set in the vector interfaces. I'm not saying that we constrain > memory/ArrowBufs to 32bits. > > >> Rebase would help if possible. >> >> I'll try to get to this in the next few days. >> >> [1] >> https://github.com/apache/arrow/blob/95175fe7cb8439eebe6d2f6e0495f551d6864380/java/memory/src/main/java/io/netty/buffer/ArrowBuf.java#L164 >> >> On Fri, Aug 23, 2019 at 4:55 AM Jacques Nadeau <jacq...@apache.org> >> wrote: >> >>> >>> >>> On Fri, Aug 23, 2019, 11:49 AM Micah Kornfield <emkornfi...@gmail.com> >>> wrote: >>> >>>> I don't think we should couple this discussion with the implementation >>>>> of large list, etc since I think those two concepts are independent. >>>> >>>> I'm still trying to balance in my mind which is a worse experience for >>>> consumers of the libraries for these types. Claiming that Java supports >>>> these types but throwing an exception when the Vectors exceed 32-bits or >>>> just say they aren't supported until we have 64-bit support in Java. >>>> >>> >>> The vector indexes being limited to 32 bits doesn't limit the addressing >>> to 32 bit chunks of memory. For example, you're prime example before was >>> image data. Having 2 billion images of 1mb images would still be supported >>> without changing the index addressing. >>> >>> >>> >>>> >>>>> I've asked some others on my team their opinions on the risk here. I >>>>> think we should probably review some our more complex vector interactions >>>>> and see how the jvm's assembly changes with this kind of change. Using >>>>> microbenchmarking is good but I think we also need to see whether we're >>>>> constantly inserting additional instructions or if in most cases, this >>>>> actually doesn't impact instruction count. >>>> >>>> >>>> Is this something that your team will take on? >>>> >>> >>> >>> Yeah, we need to look at this I think. >>> >>> Do you need a rebased version of the PR or is the existing one >>>> sufficient? >>>> >>> >>> Rebase would help if possible.
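For reference, the arithmetic behind the "approximately 2000 images" figure above: with a single ArrowBuf capped at Integer.MAX_VALUE (2^31 - 1) bytes and roughly 1 MiB (2^20 bytes) per image, one contiguous data buffer tops out at floor((2^31 - 1) / 2^20) = 2047 images, even though 32-bit offsets and indexes could in principle address about 2 billion entries.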
>>> >>> >>> >>>> Thanks, >>>> Micah >>>> >>>> On Thu, Aug 22, 2019 at 8:56 PM Jacques Nadeau <jacq...@apache.org> >>>> wrote: >>>> >>>>> I don't think we should couple this discussion with the implementation >>>>> of large list, etc since I think those two concepts are independent. >>>>> >>>>> I've asked some others on my team their opinions on the risk here. I >>>>> think we should probably review some our more complex vector interactions >>>>> and see how the jvm's assembly changes with this kind of change. Using >>>>> microbenchmarking is good but I think we also need to see whether we're >>>>> constantly inserting additional instructions or if in most cases, this >>>>> actually doesn't impact instruction count. >>>>> >>>>> >>>>> >>>>> On Wed, Aug 21, 2019 at 12:18 PM Micah Kornfield < >>>>> emkornfi...@gmail.com> wrote: >>>>> >>>>>> >>>>>>> With regards to the reference implementation point. It is a good >>>>>>> point. I'm on vacation this week. Unless you're pushing hard on this, >>>>>>> can >>>>>>> we pick this up and discuss more next week? >>>>>> >>>>>> >>>>>> Hi Jacques, I hope you had a good rest. Any more thoughts on the >>>>>> reference implementation aspect of this? >>>>>> >>>>>> >>>>>>> To copy the sentiments from the 0.15.0 release thread, I think it >>>>>>> would be best to decouple this discussion from the release timeline >>>>>>> given how many people we have relying on regular releases coming out. >>>>>>> We can keep continue making major 0.x releases until we're ready to >>>>>>> release 1.0.0. >>>>>> >>>>>> >>>>>> I'm OK with it as long as other stakeholders are. Timed releases are >>>>>> the way to go. As stated on the release thread [1] we need a better >>>>>> mechanism to avoid this type of issue arising again. The release thread >>>>>> also had some more discussion on compatibility. >>>>>> >>>>>> Thanks, >>>>>> Micah >>>>>> >>>>>> [1] >>>>>> https://lists.apache.org/thread.html/d70feeceaf2570906ade117030b29887af7c77ca5c4a976e6d555920@%3Cdev.arrow.apache.org%3E >>>>>> >>>>>> >>>>>> On Wed, Aug 14, 2019 at 3:23 PM Wes McKinney <wesmck...@gmail.com> >>>>>> wrote: >>>>>> >>>>>>> On Sun, Aug 11, 2019 at 9:40 PM Micah Kornfield < >>>>>>> emkornfi...@gmail.com> wrote: >>>>>>> > >>>>>>> > Hi Wes and Jacques, >>>>>>> > See responses below. >>>>>>> > >>>>>>> > With regards to the reference implementation point. It is a good >>>>>>> point. I'm >>>>>>> > > on vacation this week. Unless you're pushing hard on this, can >>>>>>> we pick this >>>>>>> > > up and discuss more next week? >>>>>>> > >>>>>>> > >>>>>>> > Sure thing, enjoy your vacation. I think the only practical >>>>>>> implications >>>>>>> > are it delays choices around implementing LargeList, LargeBinary, >>>>>>> > LargeString in Java, which in turn might push out the 0.15.0 >>>>>>> release. >>>>>>> > >>>>>>> >>>>>>> To copy the sentiments from the 0.15.0 release thread, I think it >>>>>>> would be best to decouple this discussion from the release timeline >>>>>>> given how many people we have relying on regular releases coming out. >>>>>>> We can keep continue making major 0.x releases until we're ready to >>>>>>> release 1.0.0. >>>>>>> >>>>>>> > My stance on this is that I don't know how important it is for >>>>>>> Java to >>>>>>> > > support vectors over INT32_MAX elements. The use cases enabled by >>>>>>> > > having very large arrays seem to be concentrated in the native >>>>>>> code >>>>>>> > > world (e.g. C/C++/Rust) -- that could just be >>>>>>> implementation-centrism >>>>>>> > > on my part, though. 
>>>>>>> > >>>>>>> > >>>>>>> > A data point against this view is Spark has done work to eliminate >>>>>>> 2GB >>>>>>> > memory limits on its block sizes [1]. I don't claim to understand >>>>>>> the >>>>>>> > implications of this. Bryan might you have any thoughts here? I'm >>>>>>> OK with >>>>>>> > INT32_MAX, as well, I think we should think about what this means >>>>>>> for >>>>>>> > adding Large types to Java and implications for reference >>>>>>> implementations. >>>>>>> > >>>>>>> > Thanks, >>>>>>> > Micah >>>>>>> > >>>>>>> > [1] https://issues.apache.org/jira/browse/SPARK-6235 >>>>>>> > >>>>>>> > On Sun, Aug 11, 2019 at 6:31 PM Jacques Nadeau <jacq...@apache.org> >>>>>>> wrote: >>>>>>> > >>>>>>> > > Hey Micah, >>>>>>> > > >>>>>>> > > Appreciate the offer on the compiling. The reality is I'm more >>>>>>> concerned >>>>>>> > > about the unknowns than the compiling issue itself. Any time >>>>>>> you've been >>>>>>> > > tuning for a while, changing something like this could be >>>>>>> totally fine or >>>>>>> > > cause a couple of major issues. For example, we've done a very >>>>>>> large amount >>>>>>> > > of work reducing heap memory footprint of the vectors. Are >>>>>>> target is to >>>>>>> > > actually get it down to 24 bytes per ArrowBuf and 24 bytes heap >>>>>>> per vector >>>>>>> > > (not including arrow bufs). >>>>>>> > > >>>>>>> > > With regards to the reference implementation point. It is a good >>>>>>> point. >>>>>>> > > I'm on vacation this week. Unless you're pushing hard on this, >>>>>>> can we pick >>>>>>> > > this up and discuss more next week? >>>>>>> > > >>>>>>> > > thanks, >>>>>>> > > Jacques >>>>>>> > > >>>>>>> > > On Sat, Aug 10, 2019 at 7:39 PM Micah Kornfield < >>>>>>> emkornfi...@gmail.com> >>>>>>> > > wrote: >>>>>>> > > >>>>>>> > >> Hi Jacques, >>>>>>> > >> I definitely understand these concerns and this change is risky >>>>>>> because it >>>>>>> > >> is so large. Perhaps, creating a new hierarchy, might be the >>>>>>> cleanest way >>>>>>> > >> of dealing with this. This could have other benefits like >>>>>>> cleaning up >>>>>>> > >> some >>>>>>> > >> cruft around dictionary encode and "orphaned" method. Per >>>>>>> past e-mail >>>>>>> > >> threads I agree it is beneficial to have 2 separate reference >>>>>>> > >> implementations that can communicate fully, and my intent here >>>>>>> was to >>>>>>> > >> close >>>>>>> > >> that gap. >>>>>>> > >> >>>>>>> > >> Trying to >>>>>>> > >> > determine the ramifications of these changes would be >>>>>>> challenging and >>>>>>> > >> time >>>>>>> > >> > consuming against all the different ways we interact with the >>>>>>> Arrow Java >>>>>>> > >> > library. >>>>>>> > >> >>>>>>> > >> >>>>>>> > >> Understood. I took a quick look at Dremio-OSS it seems like it >>>>>>> has a >>>>>>> > >> simple java build system? If it is helpful, I can try to get a >>>>>>> fork >>>>>>> > >> running that at least compiles against this PR. My plan would >>>>>>> be to cast >>>>>>> > >> any place that was changed to return a long back to an int, so >>>>>>> in essence >>>>>>> > >> the Dremio algorithms would reman 32-bit implementations. >>>>>>> > >> >>>>>>> > >> I don't have the infrastructure to test this change properly >>>>>>> from a >>>>>>> > >> distributed systems perspective, so it would still take some >>>>>>> time from >>>>>>> > >> Dremio to validate for regressions. 
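A minimal sketch of what that casting shim could look like (the interface below is a hypothetical stand-in, not Dremio or Arrow code); using a checked narrowing such as Math.toIntExact rather than a bare (int) cast keeps 32-bit call sites honest if a count ever exceeds Integer.MAX_VALUE:

class NarrowingShim {
  // Hypothetical stand-in for a vector whose count API now returns long.
  interface LongCountVector {
    long getValueCount();
  }

  // Existing 32-bit algorithms keep their int-based assumptions behind a checked cast.
  static int valueCountAsInt(LongCountVector vector) {
    // Throws ArithmeticException instead of silently truncating if the count
    // no longer fits in an int.
    return Math.toIntExact(vector.getValueCount());
  }
}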
>>>>>>> > >> >>>>>>> > >> I'm not saying I'm against this but want to make sure we've >>>>>>> > >> > explored all less disruptive options before considering >>>>>>> changing >>>>>>> > >> something >>>>>>> > >> > this fundamental (especially when I generally hold the view >>>>>>> that large >>>>>>> > >> cell >>>>>>> > >> > counts against massive contiguous memory is an anti pattern >>>>>>> to scalable >>>>>>> > >> > analytical processing--purely subjective of course). >>>>>>> > >> >>>>>>> > >> >>>>>>> > >> I'm open to other ideas here, as well. I don't think it is out >>>>>>> of the >>>>>>> > >> question to leave the Java implementation as 32-bit, but if we >>>>>>> do, then I >>>>>>> > >> think we should consider a different strategy for reference >>>>>>> > >> implementations. >>>>>>> > >> >>>>>>> > >> Thanks, >>>>>>> > >> Micah >>>>>>> > >> >>>>>>> > >> On Sat, Aug 10, 2019 at 5:09 PM Jacques Nadeau < >>>>>>> jacq...@apache.org> >>>>>>> > >> wrote: >>>>>>> > >> >>>>>>> > >> > Hey Micah, I didn't have a particular path in mind. Was >>>>>>> thinking more >>>>>>> > >> along >>>>>>> > >> > the lines of extra methods as opposed to separate classes. >>>>>>> > >> > >>>>>>> > >> > Arrow hasn't historically been a place where we're writing >>>>>>> algorithms in >>>>>>> > >> > Java so the fact that they aren't there doesn't mean they >>>>>>> don't exist. >>>>>>> > >> We >>>>>>> > >> > have a large amount of code that depends on the current >>>>>>> behavior that is >>>>>>> > >> > deployed in hundreds of customer clusters (you can peruse our >>>>>>> dremio >>>>>>> > >> repo >>>>>>> > >> > to see how extensively we leverage Arrow if interested). >>>>>>> Trying to >>>>>>> > >> > determine the ramifications of these changes would be >>>>>>> challenging and >>>>>>> > >> time >>>>>>> > >> > consuming against all the different ways we interact with the >>>>>>> Arrow Java >>>>>>> > >> > library. I'm not saying I'm against this but want to make >>>>>>> sure we've >>>>>>> > >> > explored all less disruptive options before considering >>>>>>> changing >>>>>>> > >> something >>>>>>> > >> > this fundamental (especially when I generally hold the view >>>>>>> that large >>>>>>> > >> cell >>>>>>> > >> > counts against massive contiguous memory is an anti pattern >>>>>>> to scalable >>>>>>> > >> > analytical processing--purely subjective of course). >>>>>>> > >> > >>>>>>> > >> > On Sat, Aug 10, 2019, 4:17 PM Micah Kornfield < >>>>>>> emkornfi...@gmail.com> >>>>>>> > >> > wrote: >>>>>>> > >> > >>>>>>> > >> > > Hi Jacques, >>>>>>> > >> > > What avenue were you thinking for supporting both paths? >>>>>>> I didn't >>>>>>> > >> want >>>>>>> > >> > > to pursue a different class hierarchy, because I felt like >>>>>>> that would >>>>>>> > >> > > effectively fork the code base, but that is potentially an >>>>>>> option that >>>>>>> > >> > > would allow us to have a complete reference implementation >>>>>>> in Java >>>>>>> > >> that >>>>>>> > >> > can >>>>>>> > >> > > fully interact with C++, without major changes to this code. >>>>>>> > >> > > >>>>>>> > >> > > For supporting both APIs on the same classes/interfaces, I >>>>>>> think they >>>>>>> > >> > > roughly fall into three categories, changes to input >>>>>>> parameters, >>>>>>> > >> changes >>>>>>> > >> > to >>>>>>> > >> > > output parameters and algorithm changes. >>>>>>> > >> > > >>>>>>> > >> > > For inputs, changing from int to long is essentially a >>>>>>> no-op from the >>>>>>> > >> > > compiler perspective. 
From the limited micro-benchmarking >>>>>>> this also >>>>>>> > >> > > doesn't seem to have a performance impact. So we could >>>>>>> keep two >>>>>>> > >> versions >>>>>>> > >> > > of the methods that only differ on inputs, but it is not >>>>>>> clear what >>>>>>> > >> the >>>>>>> > >> > > value of that would be. >>>>>>> > >> > > >>>>>>> > >> > > For outputs, we can't support methods "long getLength()" >>>>>>> and "int >>>>>>> > >> > > getLength()" in the same class, so we would be forced into >>>>>>> something >>>>>>> > >> like >>>>>>> > >> > > "long getLength(boolean dummy)" which I think is a less >>>>>>> desirable. >>>>>>> > >> > > >>>>>>> > >> > > For algorithm changes, there did not appear to be too many >>>>>>> places >>>>>>> > >> where >>>>>>> > >> > we >>>>>>> > >> > > actually loop over all elements (it is quite possible I >>>>>>> missed >>>>>>> > >> something >>>>>>> > >> > > here), the ones that I did find I was able to mitigate >>>>>>> performance >>>>>>> > >> > > penalties as noted above. Some of the current >>>>>>> implementation will >>>>>>> > >> get a >>>>>>> > >> > > lot slower for "large arrays", but we can likely fix those >>>>>>> later or in >>>>>>> > >> > this >>>>>>> > >> > > PR with a nested while loop instead of 2 for loops. >>>>>>> > >> > > >>>>>>> > >> > > Thanks, >>>>>>> > >> > > Micah >>>>>>> > >> > > >>>>>>> > >> > > >>>>>>> > >> > > On Saturday, August 10, 2019, Jacques Nadeau < >>>>>>> jacq...@apache.org> >>>>>>> > >> wrote: >>>>>>> > >> > > >>>>>>> > >> > >> This is a pretty massive change to the apis. I wonder how >>>>>>> nasty it >>>>>>> > >> would >>>>>>> > >> > >> be to just support both paths. Have you evaluated how >>>>>>> complex that >>>>>>> > >> > would be? >>>>>>> > >> > >> >>>>>>> > >> > >> On Wed, Aug 7, 2019 at 11:08 PM Micah Kornfield < >>>>>>> > >> emkornfi...@gmail.com> >>>>>>> > >> > >> wrote: >>>>>>> > >> > >> >>>>>>> > >> > >>> After more investigation, it looks like Float8Benchmarks >>>>>>> at least >>>>>>> > >> on my >>>>>>> > >> > >>> machine are within the range of noise. >>>>>>> > >> > >>> >>>>>>> > >> > >>> For BitVectorHelper I pushed a new commit [1], seems to >>>>>>> bring the >>>>>>> > >> > >>> BitVectorHelper benchmarks back inline (and even with some >>>>>>> > >> improvement >>>>>>> > >> > >>> for >>>>>>> > >> > >>> getNullCountBenchmark). >>>>>>> > >> > >>> >>>>>>> > >> > >>> Benchmark Mode >>>>>>> Cnt Score >>>>>>> > >> > >>> Error >>>>>>> > >> > >>> Units >>>>>>> > >> > >>> BitVectorHelperBenchmarks.allBitsNullBenchmark avgt >>>>>>> 5 3.821 ± >>>>>>> > >> > >>> 0.031 >>>>>>> > >> > >>> ns/op >>>>>>> > >> > >>> BitVectorHelperBenchmarks.getNullCountBenchmark avgt >>>>>>> 5 14.884 ± >>>>>>> > >> > >>> 0.141 >>>>>>> > >> > >>> ns/op >>>>>>> > >> > >>> >>>>>>> > >> > >>> I applied the same pattern to other loops that I could >>>>>>> find, and for >>>>>>> > >> > any >>>>>>> > >> > >>> "for (long" loop on the critical path, I broke it up into >>>>>>> two loops. >>>>>>> > >> > the >>>>>>> > >> > >>> first loop does iteration by integer, the second finishes >>>>>>> off for >>>>>>> > >> any >>>>>>> > >> > >>> long >>>>>>> > >> > >>> values. 
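To make the loop-splitting pattern just described concrete, here is a minimal sketch (the buffer interface and method names are illustrative, not the actual Arrow APIs). The idea is to run an int-indexed loop, which the JIT already optimizes well, over everything up to Integer.MAX_VALUE elements, and only fall back to a long-indexed loop for any remainder:

class LoopSplitExample {
  // Illustrative stand-in for a 64-bit addressable buffer.
  interface LongBuf {
    long length();
    byte getByte(long index);
  }

  static long sumBytes(LongBuf buf) {
    long length = buf.length();
    long sum = 0;
    // Hot path: an int counter covers all buffers up to Integer.MAX_VALUE elements.
    int intLength = (int) Math.min(length, Integer.MAX_VALUE);
    for (int i = 0; i < intLength; i++) {
      sum += buf.getByte(i);
    }
    // Remainder: only reached for buffers with more than Integer.MAX_VALUE elements.
    for (long i = intLength; i < length; i++) {
      sum += buf.getByte(i);
    }
    return sum;
  }
}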
As a side note it seems like optimization for >>>>>>> loops using >>>>>>> > >> long >>>>>>> > >> > >>> counters at least have a semi-recent open bug for the JVM >>>>>>> [2] >>>>>>> > >> > >>> >>>>>>> > >> > >>> Thanks, >>>>>>> > >> > >>> Micah >>>>>>> > >> > >>> >>>>>>> > >> > >>> [1] >>>>>>> > >> > >>> >>>>>>> > >> > >>> >>>>>>> > >> > >>>>>>> > >> >>>>>>> https://github.com/apache/arrow/pull/5020/commits/2ea2c1ae83e3baa7b9a99a6d06276d968df41797 >>>>>>> > >> > >>> [2] https://bugs.openjdk.java.net/browse/JDK-8223051 >>>>>>> > >> > >>> >>>>>>> > >> > >>> On Wed, Aug 7, 2019 at 8:11 PM Micah Kornfield < >>>>>>> > >> emkornfi...@gmail.com> >>>>>>> > >> > >>> wrote: >>>>>>> > >> > >>> >>>>>>> > >> > >>> > Indeed, the BoundChecking and CheckNullForGet variables >>>>>>> can make a >>>>>>> > >> > big >>>>>>> > >> > >>> > difference. I didn't initially run the benchmarks with >>>>>>> these >>>>>>> > >> turned >>>>>>> > >> > on >>>>>>> > >> > >>> > (you can see the result from above with >>>>>>> Float8Benchmarks). Here >>>>>>> > >> are >>>>>>> > >> > >>> new >>>>>>> > >> > >>> > numbers including with the flags enabled. It looks >>>>>>> like using >>>>>>> > >> longs >>>>>>> > >> > >>> might >>>>>>> > >> > >>> > be a little bit slower, I'll see what I can do to >>>>>>> mitigate this. >>>>>>> > >> > >>> > >>>>>>> > >> > >>> > Ravindra also volunteered to try to benchmark the >>>>>>> changes with >>>>>>> > >> > Dremio's >>>>>>> > >> > >>> > code on today's sync call. >>>>>>> > >> > >>> > >>>>>>> > >> > >>> > New >>>>>>> > >> > >>> > >>>>>>> > >> > >>> > Benchmark Mode >>>>>>> Cnt Score >>>>>>> > >> > >>> Error >>>>>>> > >> > >>> > Units >>>>>>> > >> > >>> > >>>>>>> > >> > >>> > BitVectorHelperBenchmarks.allBitsNullBenchmark avgt >>>>>>> 5 >>>>>>> > >> 4.176 ± >>>>>>> > >> > >>> 1.292 >>>>>>> > >> > >>> > ns/op >>>>>>> > >> > >>> > >>>>>>> > >> > >>> > BitVectorHelperBenchmarks.getNullCountBenchmark avgt >>>>>>> 5 >>>>>>> > >> 26.102 ± >>>>>>> > >> > >>> 0.700 >>>>>>> > >> > >>> > ns/op >>>>>>> > >> > >>> > >>>>>>> > >> > >>> > Float8Benchmarks.copyFromBenchmark avgt 5 7.398 ± >>>>>>> 0.084 >>>>>>> > >> us/op >>>>>>> > >> > >>> > >>>>>>> > >> > >>> > Float8Benchmarks.readWriteBenchmark avgt 5 2.711 ± >>>>>>> 0.057 >>>>>>> > >> us/op >>>>>>> > >> > >>> > >>>>>>> > >> > >>> > >>>>>>> > >> > >>> > >>>>>>> > >> > >>> > old >>>>>>> > >> > >>> > >>>>>>> > >> > >>> > BitVectorHelperBenchmarks.allBitsNullBenchmark avgt >>>>>>> 5 >>>>>>> > >> 3.828 ± >>>>>>> > >> > >>> 0.030 >>>>>>> > >> > >>> > ns/op >>>>>>> > >> > >>> > >>>>>>> > >> > >>> > BitVectorHelperBenchmarks.getNullCountBenchmark avgt >>>>>>> 5 >>>>>>> > >> 20.611 ± >>>>>>> > >> > >>> 0.188 >>>>>>> > >> > >>> > ns/op >>>>>>> > >> > >>> > >>>>>>> > >> > >>> > Float8Benchmarks.copyFromBenchmark avgt 5 6.597 ± >>>>>>> 0.462 >>>>>>> > >> us/op >>>>>>> > >> > >>> > >>>>>>> > >> > >>> > Float8Benchmarks.readWriteBenchmark avgt 5 2.615 ± >>>>>>> 0.027 >>>>>>> > >> us/op >>>>>>> > >> > >>> > >>>>>>> > >> > >>> > On Wed, Aug 7, 2019 at 7:13 PM Fan Liya < >>>>>>> liya.fa...@gmail.com> >>>>>>> > >> > wrote: >>>>>>> > >> > >>> > >>>>>>> > >> > >>> >> Hi Gonzalo, >>>>>>> > >> > >>> >> >>>>>>> > >> > >>> >> Thanks for sharing the performance results. >>>>>>> > >> > >>> >> I am wondering if you have turned off the flag >>>>>>> > >> > >>> >> BoundsChecking#BOUNDS_CHECKING_ENABLED. >>>>>>> > >> > >>> >> If not, the lower throughput should be expected. 
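For readers unfamiliar with that flag, the usual shape of such a switch is sketched below (an assumed pattern with an example property name, not the actual Arrow code): a static final boolean initialized once at class-load time, so that when checking is disabled the JIT can constant-fold the condition and drop the range check entirely.

class BoundsCheckSketch {
  // In Arrow this role is played by BoundsChecking#BOUNDS_CHECKING_ENABLED;
  // the property name below is only an example.
  static final boolean CHECKS_ENABLED =
      Boolean.parseBoolean(System.getProperty("example.bounds.checking", "true"));

  static void checkIndex(long index, long length) {
    if (CHECKS_ENABLED) {
      if (index < 0 || index >= length) {
        throw new IndexOutOfBoundsException("index: " + index + ", length: " + length);
      }
    }
  }
}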
>>>>>>> > >> > >>> >> >>>>>>> > >> > >>> >> Best, >>>>>>> > >> > >>> >> Liya Fan >>>>>>> > >> > >>> >> >>>>>>> > >> > >>> >> On Wed, Aug 7, 2019 at 10:23 PM Micah Kornfield < >>>>>>> > >> > >>> emkornfi...@gmail.com> >>>>>>> > >> > >>> >> wrote: >>>>>>> > >> > >>> >> >>>>>>> > >> > >>> >>> Hi Gonzalo, >>>>>>> > >> > >>> >>> Thank you for the feedback. I wasn't aware of the JIT >>>>>>> > >> > >>> implications. At >>>>>>> > >> > >>> >>> least on the benchmark run they don't seem to have an >>>>>>> impact. >>>>>>> > >> > >>> >>> >>>>>>> > >> > >>> >>> If there are other benchmarks that people have that >>>>>>> can >>>>>>> > >> validate if >>>>>>> > >> > >>> this >>>>>>> > >> > >>> >>> change will be problematic I would appreciate trying >>>>>>> to run them >>>>>>> > >> > >>> with the >>>>>>> > >> > >>> >>> PR. I will try to run the ones for zeroing/popcnt >>>>>>> tonight to >>>>>>> > >> see >>>>>>> > >> > if >>>>>>> > >> > >>> >>> there >>>>>>> > >> > >>> >>> is a change in those. >>>>>>> > >> > >>> >>> >>>>>>> > >> > >>> >>> -Micah >>>>>>> > >> > >>> >>> >>>>>>> > >> > >>> >>> >>>>>>> > >> > >>> >>> >>>>>>> > >> > >>> >>> On Wednesday, August 7, 2019, Gonzalo Ortiz >>>>>>> Jaureguizar < >>>>>>> > >> > >>> >>> golthir...@gmail.com> wrote: >>>>>>> > >> > >>> >>> >>>>>>> > >> > >>> >>> > I would recommend to take care with this kind of >>>>>>> changes. >>>>>>> > >> > >>> >>> > >>>>>>> > >> > >>> >>> > I didn't try Arrow in more than one year, but by >>>>>>> then the >>>>>>> > >> > >>> performance >>>>>>> > >> > >>> >>> was >>>>>>> > >> > >>> >>> > quite bad in comparison with plain byte buffer >>>>>>> access >>>>>>> > >> > >>> >>> > (see >>>>>>> http://git.net/apache-arrow-development/msg02353.html *) >>>>>>> > >> > and >>>>>>> > >> > >>> >>> > there are several optimizations that the JVM >>>>>>> (specifically, >>>>>>> > >> C2) >>>>>>> > >> > >>> does >>>>>>> > >> > >>> >>> not >>>>>>> > >> > >>> >>> > apply when dealing with int instead of longs. One >>>>>>> of the >>>>>>> > >> > >>> >>> > most commons is the loop unrolling and >>>>>>> vectorization. >>>>>>> > >> > >>> >>> > >>>>>>> > >> > >>> >>> > * It doesn't seem the best way to reference an old >>>>>>> email on >>>>>>> > >> the >>>>>>> > >> > >>> list, >>>>>>> > >> > >>> >>> but >>>>>>> > >> > >>> >>> > it is the only result shown by Google >>>>>>> > >> > >>> >>> > >>>>>>> > >> > >>> >>> > El mié., 7 ago. 2019 a las 11:42, Fan Liya (< >>>>>>> > >> > liya.fa...@gmail.com >>>>>>> > >> > >>> >) >>>>>>> > >> > >>> >>> > escribió: >>>>>>> > >> > >>> >>> > >>>>>>> > >> > >>> >>> >> Hi Micah, >>>>>>> > >> > >>> >>> >> >>>>>>> > >> > >>> >>> >> Thanks for your effort. The performance result >>>>>>> looks good. >>>>>>> > >> > >>> >>> >> >>>>>>> > >> > >>> >>> >> As you indicated, ArrowBuf will take additional 12 >>>>>>> bytes (4 >>>>>>> > >> > bytes >>>>>>> > >> > >>> for >>>>>>> > >> > >>> >>> each >>>>>>> > >> > >>> >>> >> of length, write index, and read index). >>>>>>> > >> > >>> >>> >> Similar overheads also exist for vectors like >>>>>>> > >> > >>> BaseFixedWidthVector, >>>>>>> > >> > >>> >>> >> BaseVariableWidthVector, etc. >>>>>>> > >> > >>> >>> >> >>>>>>> > >> > >>> >>> >> IMO, such overheads are small enough to justify >>>>>>> the change. >>>>>>> > >> > >>> >>> >> Let's check if there are other overheads. 
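For reference, the 12-byte figure follows directly from widening the three 4-byte int fields (length, write index, read index) to 8-byte longs: 3 x (8 - 4) = 12 additional bytes per ArrowBuf instance, before any JVM object-alignment effects.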
>>>>>>> > >> > >>> >>> >> >>>>>>> > >> > >>> >>> >> Best, >>>>>>> > >> > >>> >>> >> Liya Fan >>>>>>> > >> > >>> >>> >> >>>>>>> > >> > >>> >>> >> On Wed, Aug 7, 2019 at 3:30 PM Micah Kornfield < >>>>>>> > >> > >>> emkornfi...@gmail.com >>>>>>> > >> > >>> >>> > >>>>>>> > >> > >>> >>> >> wrote: >>>>>>> > >> > >>> >>> >> >>>>>>> > >> > >>> >>> >> > Hi Liya Fan, >>>>>>> > >> > >>> >>> >> > Based on the Float8Benchmark there does not seem >>>>>>> to be any >>>>>>> > >> > >>> >>> meaningful >>>>>>> > >> > >>> >>> >> > performance difference on my machine. At least >>>>>>> for me, the >>>>>>> > >> > >>> >>> benchmarks >>>>>>> > >> > >>> >>> >> are >>>>>>> > >> > >>> >>> >> > not stable enough to say one is faster than the >>>>>>> other (I've >>>>>>> > >> > >>> pasted >>>>>>> > >> > >>> >>> >> results >>>>>>> > >> > >>> >>> >> > below). That being said my machine isn't >>>>>>> necessarily the >>>>>>> > >> most >>>>>>> > >> > >>> >>> reliable >>>>>>> > >> > >>> >>> >> for >>>>>>> > >> > >>> >>> >> > benchmarking. >>>>>>> > >> > >>> >>> >> > >>>>>>> > >> > >>> >>> >> > On an intuitive level, this makes sense to me, >>>>>>> for the >>>>>>> > >> most >>>>>>> > >> > >>> part it >>>>>>> > >> > >>> >>> >> seems >>>>>>> > >> > >>> >>> >> > like the change just moves casting from "int" to >>>>>>> "long" >>>>>>> > >> > further >>>>>>> > >> > >>> up >>>>>>> > >> > >>> >>> the >>>>>>> > >> > >>> >>> >> > stack for "PlatformDepdendent" operations. If >>>>>>> there are >>>>>>> > >> > other >>>>>>> > >> > >>> >>> >> benchmarks >>>>>>> > >> > >>> >>> >> > that you think are worth running let me know. >>>>>>> > >> > >>> >>> >> > >>>>>>> > >> > >>> >>> >> > One downside performance wise I think for his >>>>>>> change is it >>>>>>> > >> > >>> >>> increases the >>>>>>> > >> > >>> >>> >> > size of ArrowBuf objects, which I suppose could >>>>>>> influence >>>>>>> > >> > cache >>>>>>> > >> > >>> >>> misses >>>>>>> > >> > >>> >>> >> at >>>>>>> > >> > >>> >>> >> > some level or increase the size of call-stacks, >>>>>>> but this >>>>>>> > >> > doesn't >>>>>>> > >> > >>> >>> seem to >>>>>>> > >> > >>> >>> >> > show up in the benchmark.. 
>>>>>>> > >> > >>> >>> >> > >>>>>>> > >> > >>> >>> >> > Thanks, >>>>>>> > >> > >>> >>> >> > Micah >>>>>>> > >> > >>> >>> >> > >>>>>>> > >> > >>> >>> >> > Sample benchmark numbers: >>>>>>> > >> > >>> >>> >> > >>>>>>> > >> > >>> >>> >> > [New Code] >>>>>>> > >> > >>> >>> >> > Benchmark Mode Cnt >>>>>>> Score >>>>>>> > >> Error >>>>>>> > >> > >>> >>> Units >>>>>>> > >> > >>> >>> >> > Float8Benchmarks.copyFromBenchmark avgt 5 >>>>>>> 15.441 ± >>>>>>> > >> 0.469 >>>>>>> > >> > >>> >>> us/op >>>>>>> > >> > >>> >>> >> > Float8Benchmarks.readWriteBenchmark avgt 5 >>>>>>> 14.057 ± >>>>>>> > >> 0.115 >>>>>>> > >> > >>> >>> us/op >>>>>>> > >> > >>> >>> >> > >>>>>>> > >> > >>> >>> >> > >>>>>>> > >> > >>> >>> >> > [Old code] >>>>>>> > >> > >>> >>> >> > Benchmark Mode Cnt >>>>>>> Score >>>>>>> > >> Error >>>>>>> > >> > >>> >>> Units >>>>>>> > >> > >>> >>> >> > Float8Benchmarks.copyFromBenchmark avgt 5 >>>>>>> 16.248 ± >>>>>>> > >> 1.409 >>>>>>> > >> > >>> >>> us/op >>>>>>> > >> > >>> >>> >> > Float8Benchmarks.readWriteBenchmark avgt 5 >>>>>>> 14.150 ± >>>>>>> > >> 0.084 >>>>>>> > >> > >>> >>> us/op >>>>>>> > >> > >>> >>> >> > >>>>>>> > >> > >>> >>> >> > On Tue, Aug 6, 2019 at 1:18 AM Fan Liya < >>>>>>> > >> liya.fa...@gmail.com >>>>>>> > >> > > >>>>>>> > >> > >>> >>> wrote: >>>>>>> > >> > >>> >>> >> > >>>>>>> > >> > >>> >>> >> >> Hi Micah, >>>>>>> > >> > >>> >>> >> >> >>>>>>> > >> > >>> >>> >> >> Thanks a lot for doing this. >>>>>>> > >> > >>> >>> >> >> >>>>>>> > >> > >>> >>> >> >> I am a little concerned about if there is any >>>>>>> negative >>>>>>> > >> > >>> performance >>>>>>> > >> > >>> >>> >> impact >>>>>>> > >> > >>> >>> >> >> on the current 32-bit-length based applications. >>>>>>> > >> > >>> >>> >> >> Can we do some performance comparison on our >>>>>>> existing >>>>>>> > >> > >>> benchmarks? >>>>>>> > >> > >>> >>> >> >> >>>>>>> > >> > >>> >>> >> >> Best, >>>>>>> > >> > >>> >>> >> >> Liya Fan >>>>>>> > >> > >>> >>> >> >> >>>>>>> > >> > >>> >>> >> >> >>>>>>> > >> > >>> >>> >> >> On Tue, Aug 6, 2019 at 3:35 PM Micah Kornfield < >>>>>>> > >> > >>> >>> emkornfi...@gmail.com> >>>>>>> > >> > >>> >>> >> >> wrote: >>>>>>> > >> > >>> >>> >> >> >>>>>>> > >> > >>> >>> >> >>> There have been some previous discussions on >>>>>>> the mailing >>>>>>> > >> > about >>>>>>> > >> > >>> >>> >> supporting >>>>>>> > >> > >>> >>> >> >>> 64-bit lengths for Java ValueVectors (this is >>>>>>> what the >>>>>>> > >> IPC >>>>>>> > >> > >>> >>> >> specification >>>>>>> > >> > >>> >>> >> >>> and C++ support). I created a PR [1] that >>>>>>> changes all >>>>>>> > >> APIs >>>>>>> > >> > >>> that I >>>>>>> > >> > >>> >>> >> could >>>>>>> > >> > >>> >>> >> >>> find that take an index to take an "long" >>>>>>> instead of an >>>>>>> > >> > "int" >>>>>>> > >> > >>> (and >>>>>>> > >> > >>> >>> >> >>> similarly change "size/rowcount" APIs). >>>>>>> > >> > >>> >>> >> >>> >>>>>>> > >> > >>> >>> >> >>> It is a big change, so I think it is worth >>>>>>> discussing if >>>>>>> > >> it >>>>>>> > >> > is >>>>>>> > >> > >>> >>> >> something >>>>>>> > >> > >>> >>> >> >>> we >>>>>>> > >> > >>> >>> >> >>> still want to move forward with. It would be >>>>>>> nice to >>>>>>> > >> come >>>>>>> > >> > to >>>>>>> > >> > >>> a >>>>>>> > >> > >>> >>> >> >>> conclusion >>>>>>> > >> > >>> >>> >> >>> quickly, ideally in the next few days, to >>>>>>> avoid a lot of >>>>>>> > >> > merge >>>>>>> > >> > >>> >>> >> conflicts. 
>>>>>>> > >> > >>> >>> >> >>> >>>>>>> > >> > >>> >>> >> >>> The reason I did this work now is the C++ >>>>>>> implementation >>>>>>> > >> has >>>>>>> > >> > >>> added >>>>>>> > >> > >>> >>> >> >>> support >>>>>>> > >> > >>> >>> >> >>> for LargeList, LargeBinary and LargeString >>>>>>> arrays and >>>>>>> > >> based >>>>>>> > >> > on >>>>>>> > >> > >>> >>> prior >>>>>>> > >> > >>> >>> >> >>> discussions we need to have similar support in >>>>>>> Java >>>>>>> > >> before >>>>>>> > >> > our >>>>>>> > >> > >>> >>> next >>>>>>> > >> > >>> >>> >> >>> release. Support 64-bit indexes means we can >>>>>>> have full >>>>>>> > >> > >>> >>> compatibility >>>>>>> > >> > >>> >>> >> and >>>>>>> > >> > >>> >>> >> >>> make the most use of the types in Java. >>>>>>> > >> > >>> >>> >> >>> >>>>>>> > >> > >>> >>> >> >>> Look forward to hearing feedback. >>>>>>> > >> > >>> >>> >> >>> >>>>>>> > >> > >>> >>> >> >>> Thanks, >>>>>>> > >> > >>> >>> >> >>> Micah >>>>>>> > >> > >>> >>> >> >>> >>>>>>> > >> > >>> >>> >> >>> [1] https://github.com/apache/arrow/pull/5020 >>>>>>> > >> > >>> >>> >> >>> >>>>>>> > >> > >>> >>> >> >> >>>>>>> > >> > >>> >>> >> >>>>>>> > >> > >>> >>> > >>>>>>> > >> > >>> >>> >>>>>>> > >> > >>> >> >>>>>>> > >> > >>> >>>>>>> > >> > >> >>>>>>> > >> > >>>>>>> > >> >>>>>>> > > >>>>>>> >>>>>>
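To summarize the shape of the API change at the center of this thread, here is an illustrative pair of interfaces (not copied from the PR) contrasting the current 32-bit get/set and count signatures with the proposed 64-bit ones:

class IndexWidthComparison {
  interface Float8Accessor32 {      // current 32-bit style
    int getValueCount();
    double get(int index);
    void set(int index, double value);
  }

  interface Float8Accessor64 {      // proposed 64-bit style
    long getValueCount();
    double get(long index);
    void set(long index, double value);
  }
}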