On Fri, Aug 23, 2019, 8:55 PM Micah Kornfield <emkornfi...@gmail.com> wrote:
> The vector indexes being limited to 32 bits doesn't limit the addressing >> to 32 bit chunks of memory. For example, you're prime example before was >> image data. Having 2 billion images of 1mb images would still be supported >> without changing the index addressing. > > This might be pre-coffee math, but I think we are limited to approximately > 2000 images because an ArrowBuf only holds up-to 2 billion bytes [1]. > While we have plenty of room for the offsets, we quickly run out of > contiguous data storage space. For LargeString and LargeBinary this could > be fixed by changing ArrowBuf. > > LargeArray faces the same problem only it applies to its child vectors. > Supporting LargeArray properly is really what drove the large-scale > interface change. > My expressed concern about these changes was specifically about the use of long for get/set in the vector interfaces. I'm not saying that we constrain memory/ArrowBufs to 32bits. > Rebase would help if possible. > > I'll try to get to this in the next few days. > > [1] > https://github.com/apache/arrow/blob/95175fe7cb8439eebe6d2f6e0495f551d6864380/java/memory/src/main/java/io/netty/buffer/ArrowBuf.java#L164 > > On Fri, Aug 23, 2019 at 4:55 AM Jacques Nadeau <jacq...@apache.org> wrote: > >> >> >> On Fri, Aug 23, 2019, 11:49 AM Micah Kornfield <emkornfi...@gmail.com> >> wrote: >> >>> I don't think we should couple this discussion with the implementation >>>> of large list, etc since I think those two concepts are independent. >>> >>> I'm still trying to balance in my mind which is a worse experience for >>> consumers of the libraries for these types. Claiming that Java supports >>> these types but throwing an exception when the Vectors exceed 32-bits or >>> just say they aren't supported until we have 64-bit support in Java. >>> >> >> The vector indexes being limited to 32 bits doesn't limit the addressing >> to 32 bit chunks of memory. For example, you're prime example before was >> image data. Having 2 billion images of 1mb images would still be supported >> without changing the index addressing. >> >> >> >>> >>>> I've asked some others on my team their opinions on the risk here. I >>>> think we should probably review some our more complex vector interactions >>>> and see how the jvm's assembly changes with this kind of change. Using >>>> microbenchmarking is good but I think we also need to see whether we're >>>> constantly inserting additional instructions or if in most cases, this >>>> actually doesn't impact instruction count. >>> >>> >>> Is this something that your team will take on? >>> >> >> >> Yeah, we need to look at this I think. >> >> Do you need a rebased version of the PR or is the existing one sufficient? >>> >> >> Rebase would help if possible. >> >> >> >>> Thanks, >>> Micah >>> >>> On Thu, Aug 22, 2019 at 8:56 PM Jacques Nadeau <jacq...@apache.org> >>> wrote: >>> >>>> I don't think we should couple this discussion with the implementation >>>> of large list, etc since I think those two concepts are independent. >>>> >>>> I've asked some others on my team their opinions on the risk here. I >>>> think we should probably review some our more complex vector interactions >>>> and see how the jvm's assembly changes with this kind of change. Using >>>> microbenchmarking is good but I think we also need to see whether we're >>>> constantly inserting additional instructions or if in most cases, this >>>> actually doesn't impact instruction count. >>>> >>>> >>>> >>>> On Wed, Aug 21, 2019 at 12:18 PM Micah Kornfield <emkornfi...@gmail.com> >>>> wrote: >>>> >>>>> >>>>>> With regards to the reference implementation point. It is a good >>>>>> point. I'm on vacation this week. Unless you're pushing hard on this, can >>>>>> we pick this up and discuss more next week? >>>>> >>>>> >>>>> Hi Jacques, I hope you had a good rest. Any more thoughts on the >>>>> reference implementation aspect of this? >>>>> >>>>> >>>>>> To copy the sentiments from the 0.15.0 release thread, I think it >>>>>> would be best to decouple this discussion from the release timeline >>>>>> given how many people we have relying on regular releases coming out. >>>>>> We can keep continue making major 0.x releases until we're ready to >>>>>> release 1.0.0. >>>>> >>>>> >>>>> I'm OK with it as long as other stakeholders are. Timed releases are >>>>> the way to go. As stated on the release thread [1] we need a better >>>>> mechanism to avoid this type of issue arising again. The release thread >>>>> also had some more discussion on compatibility. >>>>> >>>>> Thanks, >>>>> Micah >>>>> >>>>> [1] >>>>> https://lists.apache.org/thread.html/d70feeceaf2570906ade117030b29887af7c77ca5c4a976e6d555920@%3Cdev.arrow.apache.org%3E >>>>> >>>>> >>>>> On Wed, Aug 14, 2019 at 3:23 PM Wes McKinney <wesmck...@gmail.com> >>>>> wrote: >>>>> >>>>>> On Sun, Aug 11, 2019 at 9:40 PM Micah Kornfield < >>>>>> emkornfi...@gmail.com> wrote: >>>>>> > >>>>>> > Hi Wes and Jacques, >>>>>> > See responses below. >>>>>> > >>>>>> > With regards to the reference implementation point. It is a good >>>>>> point. I'm >>>>>> > > on vacation this week. Unless you're pushing hard on this, can we >>>>>> pick this >>>>>> > > up and discuss more next week? >>>>>> > >>>>>> > >>>>>> > Sure thing, enjoy your vacation. I think the only practical >>>>>> implications >>>>>> > are it delays choices around implementing LargeList, LargeBinary, >>>>>> > LargeString in Java, which in turn might push out the 0.15.0 >>>>>> release. >>>>>> > >>>>>> >>>>>> To copy the sentiments from the 0.15.0 release thread, I think it >>>>>> would be best to decouple this discussion from the release timeline >>>>>> given how many people we have relying on regular releases coming out. >>>>>> We can keep continue making major 0.x releases until we're ready to >>>>>> release 1.0.0. >>>>>> >>>>>> > My stance on this is that I don't know how important it is for Java >>>>>> to >>>>>> > > support vectors over INT32_MAX elements. The use cases enabled by >>>>>> > > having very large arrays seem to be concentrated in the native >>>>>> code >>>>>> > > world (e.g. C/C++/Rust) -- that could just be >>>>>> implementation-centrism >>>>>> > > on my part, though. >>>>>> > >>>>>> > >>>>>> > A data point against this view is Spark has done work to eliminate >>>>>> 2GB >>>>>> > memory limits on its block sizes [1]. I don't claim to understand >>>>>> the >>>>>> > implications of this. Bryan might you have any thoughts here? I'm >>>>>> OK with >>>>>> > INT32_MAX, as well, I think we should think about what this means >>>>>> for >>>>>> > adding Large types to Java and implications for reference >>>>>> implementations. >>>>>> > >>>>>> > Thanks, >>>>>> > Micah >>>>>> > >>>>>> > [1] https://issues.apache.org/jira/browse/SPARK-6235 >>>>>> > >>>>>> > On Sun, Aug 11, 2019 at 6:31 PM Jacques Nadeau <jacq...@apache.org> >>>>>> wrote: >>>>>> > >>>>>> > > Hey Micah, >>>>>> > > >>>>>> > > Appreciate the offer on the compiling. The reality is I'm more >>>>>> concerned >>>>>> > > about the unknowns than the compiling issue itself. Any time >>>>>> you've been >>>>>> > > tuning for a while, changing something like this could be totally >>>>>> fine or >>>>>> > > cause a couple of major issues. For example, we've done a very >>>>>> large amount >>>>>> > > of work reducing heap memory footprint of the vectors. Are target >>>>>> is to >>>>>> > > actually get it down to 24 bytes per ArrowBuf and 24 bytes heap >>>>>> per vector >>>>>> > > (not including arrow bufs). >>>>>> > > >>>>>> > > With regards to the reference implementation point. It is a good >>>>>> point. >>>>>> > > I'm on vacation this week. Unless you're pushing hard on this, >>>>>> can we pick >>>>>> > > this up and discuss more next week? >>>>>> > > >>>>>> > > thanks, >>>>>> > > Jacques >>>>>> > > >>>>>> > > On Sat, Aug 10, 2019 at 7:39 PM Micah Kornfield < >>>>>> emkornfi...@gmail.com> >>>>>> > > wrote: >>>>>> > > >>>>>> > >> Hi Jacques, >>>>>> > >> I definitely understand these concerns and this change is risky >>>>>> because it >>>>>> > >> is so large. Perhaps, creating a new hierarchy, might be the >>>>>> cleanest way >>>>>> > >> of dealing with this. This could have other benefits like >>>>>> cleaning up >>>>>> > >> some >>>>>> > >> cruft around dictionary encode and "orphaned" method. Per past >>>>>> e-mail >>>>>> > >> threads I agree it is beneficial to have 2 separate reference >>>>>> > >> implementations that can communicate fully, and my intent here >>>>>> was to >>>>>> > >> close >>>>>> > >> that gap. >>>>>> > >> >>>>>> > >> Trying to >>>>>> > >> > determine the ramifications of these changes would be >>>>>> challenging and >>>>>> > >> time >>>>>> > >> > consuming against all the different ways we interact with the >>>>>> Arrow Java >>>>>> > >> > library. >>>>>> > >> >>>>>> > >> >>>>>> > >> Understood. I took a quick look at Dremio-OSS it seems like it >>>>>> has a >>>>>> > >> simple java build system? If it is helpful, I can try to get a >>>>>> fork >>>>>> > >> running that at least compiles against this PR. My plan would >>>>>> be to cast >>>>>> > >> any place that was changed to return a long back to an int, so >>>>>> in essence >>>>>> > >> the Dremio algorithms would reman 32-bit implementations. >>>>>> > >> >>>>>> > >> I don't have the infrastructure to test this change properly >>>>>> from a >>>>>> > >> distributed systems perspective, so it would still take some >>>>>> time from >>>>>> > >> Dremio to validate for regressions. >>>>>> > >> >>>>>> > >> I'm not saying I'm against this but want to make sure we've >>>>>> > >> > explored all less disruptive options before considering >>>>>> changing >>>>>> > >> something >>>>>> > >> > this fundamental (especially when I generally hold the view >>>>>> that large >>>>>> > >> cell >>>>>> > >> > counts against massive contiguous memory is an anti pattern to >>>>>> scalable >>>>>> > >> > analytical processing--purely subjective of course). >>>>>> > >> >>>>>> > >> >>>>>> > >> I'm open to other ideas here, as well. I don't think it is out >>>>>> of the >>>>>> > >> question to leave the Java implementation as 32-bit, but if we >>>>>> do, then I >>>>>> > >> think we should consider a different strategy for reference >>>>>> > >> implementations. >>>>>> > >> >>>>>> > >> Thanks, >>>>>> > >> Micah >>>>>> > >> >>>>>> > >> On Sat, Aug 10, 2019 at 5:09 PM Jacques Nadeau < >>>>>> jacq...@apache.org> >>>>>> > >> wrote: >>>>>> > >> >>>>>> > >> > Hey Micah, I didn't have a particular path in mind. Was >>>>>> thinking more >>>>>> > >> along >>>>>> > >> > the lines of extra methods as opposed to separate classes. >>>>>> > >> > >>>>>> > >> > Arrow hasn't historically been a place where we're writing >>>>>> algorithms in >>>>>> > >> > Java so the fact that they aren't there doesn't mean they >>>>>> don't exist. >>>>>> > >> We >>>>>> > >> > have a large amount of code that depends on the current >>>>>> behavior that is >>>>>> > >> > deployed in hundreds of customer clusters (you can peruse our >>>>>> dremio >>>>>> > >> repo >>>>>> > >> > to see how extensively we leverage Arrow if interested). >>>>>> Trying to >>>>>> > >> > determine the ramifications of these changes would be >>>>>> challenging and >>>>>> > >> time >>>>>> > >> > consuming against all the different ways we interact with the >>>>>> Arrow Java >>>>>> > >> > library. I'm not saying I'm against this but want to make sure >>>>>> we've >>>>>> > >> > explored all less disruptive options before considering >>>>>> changing >>>>>> > >> something >>>>>> > >> > this fundamental (especially when I generally hold the view >>>>>> that large >>>>>> > >> cell >>>>>> > >> > counts against massive contiguous memory is an anti pattern to >>>>>> scalable >>>>>> > >> > analytical processing--purely subjective of course). >>>>>> > >> > >>>>>> > >> > On Sat, Aug 10, 2019, 4:17 PM Micah Kornfield < >>>>>> emkornfi...@gmail.com> >>>>>> > >> > wrote: >>>>>> > >> > >>>>>> > >> > > Hi Jacques, >>>>>> > >> > > What avenue were you thinking for supporting both paths? I >>>>>> didn't >>>>>> > >> want >>>>>> > >> > > to pursue a different class hierarchy, because I felt like >>>>>> that would >>>>>> > >> > > effectively fork the code base, but that is potentially an >>>>>> option that >>>>>> > >> > > would allow us to have a complete reference implementation >>>>>> in Java >>>>>> > >> that >>>>>> > >> > can >>>>>> > >> > > fully interact with C++, without major changes to this code. >>>>>> > >> > > >>>>>> > >> > > For supporting both APIs on the same classes/interfaces, I >>>>>> think they >>>>>> > >> > > roughly fall into three categories, changes to input >>>>>> parameters, >>>>>> > >> changes >>>>>> > >> > to >>>>>> > >> > > output parameters and algorithm changes. >>>>>> > >> > > >>>>>> > >> > > For inputs, changing from int to long is essentially a no-op >>>>>> from the >>>>>> > >> > > compiler perspective. From the limited micro-benchmarking >>>>>> this also >>>>>> > >> > > doesn't seem to have a performance impact. So we could keep >>>>>> two >>>>>> > >> versions >>>>>> > >> > > of the methods that only differ on inputs, but it is not >>>>>> clear what >>>>>> > >> the >>>>>> > >> > > value of that would be. >>>>>> > >> > > >>>>>> > >> > > For outputs, we can't support methods "long getLength()" and >>>>>> "int >>>>>> > >> > > getLength()" in the same class, so we would be forced into >>>>>> something >>>>>> > >> like >>>>>> > >> > > "long getLength(boolean dummy)" which I think is a less >>>>>> desirable. >>>>>> > >> > > >>>>>> > >> > > For algorithm changes, there did not appear to be too many >>>>>> places >>>>>> > >> where >>>>>> > >> > we >>>>>> > >> > > actually loop over all elements (it is quite possible I >>>>>> missed >>>>>> > >> something >>>>>> > >> > > here), the ones that I did find I was able to mitigate >>>>>> performance >>>>>> > >> > > penalties as noted above. Some of the current >>>>>> implementation will >>>>>> > >> get a >>>>>> > >> > > lot slower for "large arrays", but we can likely fix those >>>>>> later or in >>>>>> > >> > this >>>>>> > >> > > PR with a nested while loop instead of 2 for loops. >>>>>> > >> > > >>>>>> > >> > > Thanks, >>>>>> > >> > > Micah >>>>>> > >> > > >>>>>> > >> > > >>>>>> > >> > > On Saturday, August 10, 2019, Jacques Nadeau < >>>>>> jacq...@apache.org> >>>>>> > >> wrote: >>>>>> > >> > > >>>>>> > >> > >> This is a pretty massive change to the apis. I wonder how >>>>>> nasty it >>>>>> > >> would >>>>>> > >> > >> be to just support both paths. Have you evaluated how >>>>>> complex that >>>>>> > >> > would be? >>>>>> > >> > >> >>>>>> > >> > >> On Wed, Aug 7, 2019 at 11:08 PM Micah Kornfield < >>>>>> > >> emkornfi...@gmail.com> >>>>>> > >> > >> wrote: >>>>>> > >> > >> >>>>>> > >> > >>> After more investigation, it looks like Float8Benchmarks >>>>>> at least >>>>>> > >> on my >>>>>> > >> > >>> machine are within the range of noise. >>>>>> > >> > >>> >>>>>> > >> > >>> For BitVectorHelper I pushed a new commit [1], seems to >>>>>> bring the >>>>>> > >> > >>> BitVectorHelper benchmarks back inline (and even with some >>>>>> > >> improvement >>>>>> > >> > >>> for >>>>>> > >> > >>> getNullCountBenchmark). >>>>>> > >> > >>> >>>>>> > >> > >>> Benchmark Mode >>>>>> Cnt Score >>>>>> > >> > >>> Error >>>>>> > >> > >>> Units >>>>>> > >> > >>> BitVectorHelperBenchmarks.allBitsNullBenchmark avgt >>>>>> 5 3.821 ± >>>>>> > >> > >>> 0.031 >>>>>> > >> > >>> ns/op >>>>>> > >> > >>> BitVectorHelperBenchmarks.getNullCountBenchmark avgt >>>>>> 5 14.884 ± >>>>>> > >> > >>> 0.141 >>>>>> > >> > >>> ns/op >>>>>> > >> > >>> >>>>>> > >> > >>> I applied the same pattern to other loops that I could >>>>>> find, and for >>>>>> > >> > any >>>>>> > >> > >>> "for (long" loop on the critical path, I broke it up into >>>>>> two loops. >>>>>> > >> > the >>>>>> > >> > >>> first loop does iteration by integer, the second finishes >>>>>> off for >>>>>> > >> any >>>>>> > >> > >>> long >>>>>> > >> > >>> values. As a side note it seems like optimization for >>>>>> loops using >>>>>> > >> long >>>>>> > >> > >>> counters at least have a semi-recent open bug for the JVM >>>>>> [2] >>>>>> > >> > >>> >>>>>> > >> > >>> Thanks, >>>>>> > >> > >>> Micah >>>>>> > >> > >>> >>>>>> > >> > >>> [1] >>>>>> > >> > >>> >>>>>> > >> > >>> >>>>>> > >> > >>>>>> > >> >>>>>> https://github.com/apache/arrow/pull/5020/commits/2ea2c1ae83e3baa7b9a99a6d06276d968df41797 >>>>>> > >> > >>> [2] https://bugs.openjdk.java.net/browse/JDK-8223051 >>>>>> > >> > >>> >>>>>> > >> > >>> On Wed, Aug 7, 2019 at 8:11 PM Micah Kornfield < >>>>>> > >> emkornfi...@gmail.com> >>>>>> > >> > >>> wrote: >>>>>> > >> > >>> >>>>>> > >> > >>> > Indeed, the BoundChecking and CheckNullForGet variables >>>>>> can make a >>>>>> > >> > big >>>>>> > >> > >>> > difference. I didn't initially run the benchmarks with >>>>>> these >>>>>> > >> turned >>>>>> > >> > on >>>>>> > >> > >>> > (you can see the result from above with >>>>>> Float8Benchmarks). Here >>>>>> > >> are >>>>>> > >> > >>> new >>>>>> > >> > >>> > numbers including with the flags enabled. It looks like >>>>>> using >>>>>> > >> longs >>>>>> > >> > >>> might >>>>>> > >> > >>> > be a little bit slower, I'll see what I can do to >>>>>> mitigate this. >>>>>> > >> > >>> > >>>>>> > >> > >>> > Ravindra also volunteered to try to benchmark the >>>>>> changes with >>>>>> > >> > Dremio's >>>>>> > >> > >>> > code on today's sync call. >>>>>> > >> > >>> > >>>>>> > >> > >>> > New >>>>>> > >> > >>> > >>>>>> > >> > >>> > Benchmark Mode >>>>>> Cnt Score >>>>>> > >> > >>> Error >>>>>> > >> > >>> > Units >>>>>> > >> > >>> > >>>>>> > >> > >>> > BitVectorHelperBenchmarks.allBitsNullBenchmark avgt >>>>>> 5 >>>>>> > >> 4.176 ± >>>>>> > >> > >>> 1.292 >>>>>> > >> > >>> > ns/op >>>>>> > >> > >>> > >>>>>> > >> > >>> > BitVectorHelperBenchmarks.getNullCountBenchmark avgt >>>>>> 5 >>>>>> > >> 26.102 ± >>>>>> > >> > >>> 0.700 >>>>>> > >> > >>> > ns/op >>>>>> > >> > >>> > >>>>>> > >> > >>> > Float8Benchmarks.copyFromBenchmark avgt 5 7.398 ± >>>>>> 0.084 >>>>>> > >> us/op >>>>>> > >> > >>> > >>>>>> > >> > >>> > Float8Benchmarks.readWriteBenchmark avgt 5 2.711 ± >>>>>> 0.057 >>>>>> > >> us/op >>>>>> > >> > >>> > >>>>>> > >> > >>> > >>>>>> > >> > >>> > >>>>>> > >> > >>> > old >>>>>> > >> > >>> > >>>>>> > >> > >>> > BitVectorHelperBenchmarks.allBitsNullBenchmark avgt >>>>>> 5 >>>>>> > >> 3.828 ± >>>>>> > >> > >>> 0.030 >>>>>> > >> > >>> > ns/op >>>>>> > >> > >>> > >>>>>> > >> > >>> > BitVectorHelperBenchmarks.getNullCountBenchmark avgt >>>>>> 5 >>>>>> > >> 20.611 ± >>>>>> > >> > >>> 0.188 >>>>>> > >> > >>> > ns/op >>>>>> > >> > >>> > >>>>>> > >> > >>> > Float8Benchmarks.copyFromBenchmark avgt 5 6.597 ± >>>>>> 0.462 >>>>>> > >> us/op >>>>>> > >> > >>> > >>>>>> > >> > >>> > Float8Benchmarks.readWriteBenchmark avgt 5 2.615 ± >>>>>> 0.027 >>>>>> > >> us/op >>>>>> > >> > >>> > >>>>>> > >> > >>> > On Wed, Aug 7, 2019 at 7:13 PM Fan Liya < >>>>>> liya.fa...@gmail.com> >>>>>> > >> > wrote: >>>>>> > >> > >>> > >>>>>> > >> > >>> >> Hi Gonzalo, >>>>>> > >> > >>> >> >>>>>> > >> > >>> >> Thanks for sharing the performance results. >>>>>> > >> > >>> >> I am wondering if you have turned off the flag >>>>>> > >> > >>> >> BoundsChecking#BOUNDS_CHECKING_ENABLED. >>>>>> > >> > >>> >> If not, the lower throughput should be expected. >>>>>> > >> > >>> >> >>>>>> > >> > >>> >> Best, >>>>>> > >> > >>> >> Liya Fan >>>>>> > >> > >>> >> >>>>>> > >> > >>> >> On Wed, Aug 7, 2019 at 10:23 PM Micah Kornfield < >>>>>> > >> > >>> emkornfi...@gmail.com> >>>>>> > >> > >>> >> wrote: >>>>>> > >> > >>> >> >>>>>> > >> > >>> >>> Hi Gonzalo, >>>>>> > >> > >>> >>> Thank you for the feedback. I wasn't aware of the JIT >>>>>> > >> > >>> implications. At >>>>>> > >> > >>> >>> least on the benchmark run they don't seem to have an >>>>>> impact. >>>>>> > >> > >>> >>> >>>>>> > >> > >>> >>> If there are other benchmarks that people have that can >>>>>> > >> validate if >>>>>> > >> > >>> this >>>>>> > >> > >>> >>> change will be problematic I would appreciate trying >>>>>> to run them >>>>>> > >> > >>> with the >>>>>> > >> > >>> >>> PR. I will try to run the ones for zeroing/popcnt >>>>>> tonight to >>>>>> > >> see >>>>>> > >> > if >>>>>> > >> > >>> >>> there >>>>>> > >> > >>> >>> is a change in those. >>>>>> > >> > >>> >>> >>>>>> > >> > >>> >>> -Micah >>>>>> > >> > >>> >>> >>>>>> > >> > >>> >>> >>>>>> > >> > >>> >>> >>>>>> > >> > >>> >>> On Wednesday, August 7, 2019, Gonzalo Ortiz >>>>>> Jaureguizar < >>>>>> > >> > >>> >>> golthir...@gmail.com> wrote: >>>>>> > >> > >>> >>> >>>>>> > >> > >>> >>> > I would recommend to take care with this kind of >>>>>> changes. >>>>>> > >> > >>> >>> > >>>>>> > >> > >>> >>> > I didn't try Arrow in more than one year, but by >>>>>> then the >>>>>> > >> > >>> performance >>>>>> > >> > >>> >>> was >>>>>> > >> > >>> >>> > quite bad in comparison with plain byte buffer access >>>>>> > >> > >>> >>> > (see >>>>>> http://git.net/apache-arrow-development/msg02353.html *) >>>>>> > >> > and >>>>>> > >> > >>> >>> > there are several optimizations that the JVM >>>>>> (specifically, >>>>>> > >> C2) >>>>>> > >> > >>> does >>>>>> > >> > >>> >>> not >>>>>> > >> > >>> >>> > apply when dealing with int instead of longs. One of >>>>>> the >>>>>> > >> > >>> >>> > most commons is the loop unrolling and vectorization. >>>>>> > >> > >>> >>> > >>>>>> > >> > >>> >>> > * It doesn't seem the best way to reference an old >>>>>> email on >>>>>> > >> the >>>>>> > >> > >>> list, >>>>>> > >> > >>> >>> but >>>>>> > >> > >>> >>> > it is the only result shown by Google >>>>>> > >> > >>> >>> > >>>>>> > >> > >>> >>> > El mié., 7 ago. 2019 a las 11:42, Fan Liya (< >>>>>> > >> > liya.fa...@gmail.com >>>>>> > >> > >>> >) >>>>>> > >> > >>> >>> > escribió: >>>>>> > >> > >>> >>> > >>>>>> > >> > >>> >>> >> Hi Micah, >>>>>> > >> > >>> >>> >> >>>>>> > >> > >>> >>> >> Thanks for your effort. The performance result >>>>>> looks good. >>>>>> > >> > >>> >>> >> >>>>>> > >> > >>> >>> >> As you indicated, ArrowBuf will take additional 12 >>>>>> bytes (4 >>>>>> > >> > bytes >>>>>> > >> > >>> for >>>>>> > >> > >>> >>> each >>>>>> > >> > >>> >>> >> of length, write index, and read index). >>>>>> > >> > >>> >>> >> Similar overheads also exist for vectors like >>>>>> > >> > >>> BaseFixedWidthVector, >>>>>> > >> > >>> >>> >> BaseVariableWidthVector, etc. >>>>>> > >> > >>> >>> >> >>>>>> > >> > >>> >>> >> IMO, such overheads are small enough to justify the >>>>>> change. >>>>>> > >> > >>> >>> >> Let's check if there are other overheads. >>>>>> > >> > >>> >>> >> >>>>>> > >> > >>> >>> >> Best, >>>>>> > >> > >>> >>> >> Liya Fan >>>>>> > >> > >>> >>> >> >>>>>> > >> > >>> >>> >> On Wed, Aug 7, 2019 at 3:30 PM Micah Kornfield < >>>>>> > >> > >>> emkornfi...@gmail.com >>>>>> > >> > >>> >>> > >>>>>> > >> > >>> >>> >> wrote: >>>>>> > >> > >>> >>> >> >>>>>> > >> > >>> >>> >> > Hi Liya Fan, >>>>>> > >> > >>> >>> >> > Based on the Float8Benchmark there does not seem >>>>>> to be any >>>>>> > >> > >>> >>> meaningful >>>>>> > >> > >>> >>> >> > performance difference on my machine. At least >>>>>> for me, the >>>>>> > >> > >>> >>> benchmarks >>>>>> > >> > >>> >>> >> are >>>>>> > >> > >>> >>> >> > not stable enough to say one is faster than the >>>>>> other (I've >>>>>> > >> > >>> pasted >>>>>> > >> > >>> >>> >> results >>>>>> > >> > >>> >>> >> > below). That being said my machine isn't >>>>>> necessarily the >>>>>> > >> most >>>>>> > >> > >>> >>> reliable >>>>>> > >> > >>> >>> >> for >>>>>> > >> > >>> >>> >> > benchmarking. >>>>>> > >> > >>> >>> >> > >>>>>> > >> > >>> >>> >> > On an intuitive level, this makes sense to me, >>>>>> for the >>>>>> > >> most >>>>>> > >> > >>> part it >>>>>> > >> > >>> >>> >> seems >>>>>> > >> > >>> >>> >> > like the change just moves casting from "int" to >>>>>> "long" >>>>>> > >> > further >>>>>> > >> > >>> up >>>>>> > >> > >>> >>> the >>>>>> > >> > >>> >>> >> > stack for "PlatformDepdendent" operations. If >>>>>> there are >>>>>> > >> > other >>>>>> > >> > >>> >>> >> benchmarks >>>>>> > >> > >>> >>> >> > that you think are worth running let me know. >>>>>> > >> > >>> >>> >> > >>>>>> > >> > >>> >>> >> > One downside performance wise I think for his >>>>>> change is it >>>>>> > >> > >>> >>> increases the >>>>>> > >> > >>> >>> >> > size of ArrowBuf objects, which I suppose could >>>>>> influence >>>>>> > >> > cache >>>>>> > >> > >>> >>> misses >>>>>> > >> > >>> >>> >> at >>>>>> > >> > >>> >>> >> > some level or increase the size of call-stacks, >>>>>> but this >>>>>> > >> > doesn't >>>>>> > >> > >>> >>> seem to >>>>>> > >> > >>> >>> >> > show up in the benchmark.. >>>>>> > >> > >>> >>> >> > >>>>>> > >> > >>> >>> >> > Thanks, >>>>>> > >> > >>> >>> >> > Micah >>>>>> > >> > >>> >>> >> > >>>>>> > >> > >>> >>> >> > Sample benchmark numbers: >>>>>> > >> > >>> >>> >> > >>>>>> > >> > >>> >>> >> > [New Code] >>>>>> > >> > >>> >>> >> > Benchmark Mode Cnt >>>>>> Score >>>>>> > >> Error >>>>>> > >> > >>> >>> Units >>>>>> > >> > >>> >>> >> > Float8Benchmarks.copyFromBenchmark avgt 5 >>>>>> 15.441 ± >>>>>> > >> 0.469 >>>>>> > >> > >>> >>> us/op >>>>>> > >> > >>> >>> >> > Float8Benchmarks.readWriteBenchmark avgt 5 >>>>>> 14.057 ± >>>>>> > >> 0.115 >>>>>> > >> > >>> >>> us/op >>>>>> > >> > >>> >>> >> > >>>>>> > >> > >>> >>> >> > >>>>>> > >> > >>> >>> >> > [Old code] >>>>>> > >> > >>> >>> >> > Benchmark Mode Cnt >>>>>> Score >>>>>> > >> Error >>>>>> > >> > >>> >>> Units >>>>>> > >> > >>> >>> >> > Float8Benchmarks.copyFromBenchmark avgt 5 >>>>>> 16.248 ± >>>>>> > >> 1.409 >>>>>> > >> > >>> >>> us/op >>>>>> > >> > >>> >>> >> > Float8Benchmarks.readWriteBenchmark avgt 5 >>>>>> 14.150 ± >>>>>> > >> 0.084 >>>>>> > >> > >>> >>> us/op >>>>>> > >> > >>> >>> >> > >>>>>> > >> > >>> >>> >> > On Tue, Aug 6, 2019 at 1:18 AM Fan Liya < >>>>>> > >> liya.fa...@gmail.com >>>>>> > >> > > >>>>>> > >> > >>> >>> wrote: >>>>>> > >> > >>> >>> >> > >>>>>> > >> > >>> >>> >> >> Hi Micah, >>>>>> > >> > >>> >>> >> >> >>>>>> > >> > >>> >>> >> >> Thanks a lot for doing this. >>>>>> > >> > >>> >>> >> >> >>>>>> > >> > >>> >>> >> >> I am a little concerned about if there is any >>>>>> negative >>>>>> > >> > >>> performance >>>>>> > >> > >>> >>> >> impact >>>>>> > >> > >>> >>> >> >> on the current 32-bit-length based applications. >>>>>> > >> > >>> >>> >> >> Can we do some performance comparison on our >>>>>> existing >>>>>> > >> > >>> benchmarks? >>>>>> > >> > >>> >>> >> >> >>>>>> > >> > >>> >>> >> >> Best, >>>>>> > >> > >>> >>> >> >> Liya Fan >>>>>> > >> > >>> >>> >> >> >>>>>> > >> > >>> >>> >> >> >>>>>> > >> > >>> >>> >> >> On Tue, Aug 6, 2019 at 3:35 PM Micah Kornfield < >>>>>> > >> > >>> >>> emkornfi...@gmail.com> >>>>>> > >> > >>> >>> >> >> wrote: >>>>>> > >> > >>> >>> >> >> >>>>>> > >> > >>> >>> >> >>> There have been some previous discussions on >>>>>> the mailing >>>>>> > >> > about >>>>>> > >> > >>> >>> >> supporting >>>>>> > >> > >>> >>> >> >>> 64-bit lengths for Java ValueVectors (this is >>>>>> what the >>>>>> > >> IPC >>>>>> > >> > >>> >>> >> specification >>>>>> > >> > >>> >>> >> >>> and C++ support). I created a PR [1] that >>>>>> changes all >>>>>> > >> APIs >>>>>> > >> > >>> that I >>>>>> > >> > >>> >>> >> could >>>>>> > >> > >>> >>> >> >>> find that take an index to take an "long" >>>>>> instead of an >>>>>> > >> > "int" >>>>>> > >> > >>> (and >>>>>> > >> > >>> >>> >> >>> similarly change "size/rowcount" APIs). >>>>>> > >> > >>> >>> >> >>> >>>>>> > >> > >>> >>> >> >>> It is a big change, so I think it is worth >>>>>> discussing if >>>>>> > >> it >>>>>> > >> > is >>>>>> > >> > >>> >>> >> something >>>>>> > >> > >>> >>> >> >>> we >>>>>> > >> > >>> >>> >> >>> still want to move forward with. It would be >>>>>> nice to >>>>>> > >> come >>>>>> > >> > to >>>>>> > >> > >>> a >>>>>> > >> > >>> >>> >> >>> conclusion >>>>>> > >> > >>> >>> >> >>> quickly, ideally in the next few days, to avoid >>>>>> a lot of >>>>>> > >> > merge >>>>>> > >> > >>> >>> >> conflicts. >>>>>> > >> > >>> >>> >> >>> >>>>>> > >> > >>> >>> >> >>> The reason I did this work now is the C++ >>>>>> implementation >>>>>> > >> has >>>>>> > >> > >>> added >>>>>> > >> > >>> >>> >> >>> support >>>>>> > >> > >>> >>> >> >>> for LargeList, LargeBinary and LargeString >>>>>> arrays and >>>>>> > >> based >>>>>> > >> > on >>>>>> > >> > >>> >>> prior >>>>>> > >> > >>> >>> >> >>> discussions we need to have similar support in >>>>>> Java >>>>>> > >> before >>>>>> > >> > our >>>>>> > >> > >>> >>> next >>>>>> > >> > >>> >>> >> >>> release. Support 64-bit indexes means we can >>>>>> have full >>>>>> > >> > >>> >>> compatibility >>>>>> > >> > >>> >>> >> and >>>>>> > >> > >>> >>> >> >>> make the most use of the types in Java. >>>>>> > >> > >>> >>> >> >>> >>>>>> > >> > >>> >>> >> >>> Look forward to hearing feedback. >>>>>> > >> > >>> >>> >> >>> >>>>>> > >> > >>> >>> >> >>> Thanks, >>>>>> > >> > >>> >>> >> >>> Micah >>>>>> > >> > >>> >>> >> >>> >>>>>> > >> > >>> >>> >> >>> [1] https://github.com/apache/arrow/pull/5020 >>>>>> > >> > >>> >>> >> >>> >>>>>> > >> > >>> >>> >> >> >>>>>> > >> > >>> >>> >> >>>>>> > >> > >>> >>> > >>>>>> > >> > >>> >>> >>>>>> > >> > >>> >> >>>>>> > >> > >>> >>>>>> > >> > >> >>>>>> > >> > >>>>>> > >> >>>>>> > > >>>>>> >>>>>