[GitHub] [arrow] josiahyan edited a comment on pull request #8214: ARROW-9965: [Java] Improve performance of BaseFixedWidthVector.setSafe by optimizing capacity calculations
josiahyan edited a comment on pull request #8214: URL: https://github.com/apache/arrow/pull/8214#issuecomment-696381588 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] josiahyan edited a comment on pull request #8214: ARROW-9965: [Java] Improve performance of BaseFixedWidthVector.setSafe by optimizing capacity calculations
josiahyan edited a comment on pull request #8214: URL: https://github.com/apache/arrow/pull/8214#issuecomment-696381588 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] josiahyan edited a comment on pull request #8214: ARROW-9965: [Java] Improve performance of BaseFixedWidthVector.setSafe by optimizing capacity calculations
josiahyan edited a comment on pull request #8214: URL: https://github.com/apache/arrow/pull/8214#issuecomment-696452697 @lidavidm @liyafan82 @jacques-n Interpreting the results: This patch could be improved (performance wise) by more aggressive caching (option 3), at the potential expense of additional state, for an additional 35% speedup above this submitted patch. @liyafan82 @lidavidm I was wrong about the performance gain - I guess this speedup might be worth having? Please let me know if you prefer this variant, subject to the given tradeoff. I'll clean it up and resubmit if so. An append-only interface, trivially implemented, but assuming buffer ownership (Option 2), could get (up to) an additional 54% above the improved version of this patch. A type specialized variant, presumably created by templating out the various types (Option 4), could get (up to, these are very very rough results, it hasn't been verified where this speedup is coming from, it might just be the bit setting code) 18% above the specialized append-only interface. It might be possible to dive into the ASM to figure out what's actually being generated, but that might be a tad extreme for now (and unstable). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] josiahyan edited a comment on pull request #8214: ARROW-9965: [Java] Improve performance of BaseFixedWidthVector.setSafe by optimizing capacity calculations
josiahyan edited a comment on pull request #8214: URL: https://github.com/apache/arrow/pull/8214#issuecomment-696381588 @jacques-n I haven't done very much investigation on other speedups - I just happened to notice performance irregularities as compared to our other (legacy) codepaths, and realized that we were spending a significant number of cycles dividing integers. If you're referring to the linked code, I have nothing to do with the Iceberg project. I just happened to chance upon this code. I haven't investigated further patterns, although I'd want to do that in the future, time permitting. I think it would be possible to measure the upper bound of performance, and address some of the questions (including that) in this discussion, if we did the following benchmarks side-by-side over the given variants: * the original IntBenchmarks * save the value/validity buffer capacities outside arrow code, and then manually resize, calling .set * save the overall value capacity in BaseFixedWidthVector.getValueCapacity, and force recomputation based on valueBuffer.capacity()/validityBuffer.capacity() (what I understand of @liyafan82 's suggestion that avoids invalidation issues around the underlying buffers) * Call the specialized putLong directly (The code that was linked in Iceberg) The second option should highlight the upper bound of performance reachable by aggressively caching. The last option should highlight the upper bound of performance reachable by de-specialization (and therefore removing the virtual call chain). I'll write and run these later today. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] josiahyan edited a comment on pull request #8214: ARROW-9965: [Java] Improve performance of BaseFixedWidthVector.setSafe by optimizing capacity calculations
josiahyan edited a comment on pull request #8214: URL: https://github.com/apache/arrow/pull/8214#issuecomment-695114615 @liyafan82 Thanks for taking a look! > I am a little surprised that JIT failed to transform the integer divistion to a right shift, as it can be easily deduced that the type width for an IntVector is always 4. I was pretty surprised too - it turned out that a huge chunk of cycles that we spent serializing data was devoted to this (BaseFixedWidthVector capacity bounds checking) + ArrowBuf bounds checking. This held true across the two/three JDK versions I used, across servers. It was not immediately evident where all the cycles are going on a Java-level profiler, but if you pull out the assembly, I think its quite clear (see the linked JIRA). > I am thinking about another way: can we simply save the value/validity buffer capacities? so we can further improve the performance by avoiding the if branch in getValueBufferValueCapacity and the if branch in capAtMaxInt I did this for a prototype internally - I think this is faster, but if done before even calling setSafe, implicitly assumes that no-one else is writing to the ArrowBuf. If I remember correctly, if we read the capacities directly and do these checks in getValueBufferValueCapacity, deeper in the setSafe call, performance was marginally comparable (at least within <20%?). However, this needs to be checked again. I wasn't too concerned about this, as we run with arrow.enable_unsafe_memory_access=false` set. We might have been able to make it faster on this front of checking the capacity, but its wy into diminishing returns (for us at least). We also lost a lot of performance over keeping `arrow.enable_unsafe_memory_access=false`, and I was wondering if there was a better way of eliminating all bounds checks, both on the BaseFixedWidthVector, and in ArrowBuf.chk. When we run our production code with `arrow.enable_unsafe_memory_access=false`, the breakdowns of the setSafe CPU usage (before this patch) according to an async sampling profiler are as following: 53% for handleSafe (which this patch is expected to remove the majority of) ~30% for the ArrowBuf.chk checks in the `set(index, value)` code. I was thinking that if we had a specialized vector append API, that guaranteed ownership of the buffer, we could throw out all of these bounds checks safely (in handleSafe, and ArrowBuf.chk), as well as have a restricted API to optimize building buffers out of. Something like the original [IntWriter](https://arrow.apache.org/docs/java/reference/org/apache/arrow/vector/complex/writer/BaseWriter.ScalarWriter.html) interface, but with stronger ownership semantics around the ArrowBuf? Is that something you think is possible/desirable? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] josiahyan edited a comment on pull request #8214: ARROW-9965: [Java] Improve performance of BaseFixedWidthVector.setSafe by optimizing capacity calculations
josiahyan edited a comment on pull request #8214: URL: https://github.com/apache/arrow/pull/8214#issuecomment-695114615 @liyafan82 Thanks for taking a look! > I am a little surprised that JIT failed to transform the integer divistion to a right shift, as it can be easily deduced that the type width for an IntVector is always 4. I was pretty surprised too - it turned out that a huge chunk of cycles that we spent serializing data was devoted to this (BaseFixedWidthVector capacity bounds checking) + ArrowBuf bounds checking. This held true across the two/three JDK versions I used, across servers. It was not immediately evident where all the cycles are going on a Java-level profiler, but if you pull out the assembly, I think its quite clear (see the linked JIRA). > I am thinking about another way: can we simply save the value/validity buffer capacities? so we can further improve the performance by avoiding the if branch in getValueBufferValueCapacity and the if branch in capAtMaxInt I did this for a prototype internally - I think this is faster, but if done before even calling setSafe, implicitly assumes that no-one else is writing to the ArrowBuf. If I remember correctly, if we read the capacities directly and do these checks in getValueBufferValueCapacity, deeper in the setSafe call, performance was marginally comparable (at least within <20%?). However, this needs to be checked again. I wasn't too concerned about this, as we run with arrow.enable_unsafe_memory_access=false` set. We might have been able to make it faster on this front of checking the capacity, but its wy into diminishing returns (for us at least). We also lost a lot of performance over keeping `arrow.enable_unsafe_memory_access=false`, and I was wondering if there was a better way of eliminating all bounds checks, both on the BaseFixedWidthVector, and in ArrowBuf.chk. When we run our production code with `arrow.enable_unsafe_memory_access=false`, the breakdowns of the setSafe CPU usage (before this patch) according to an async sampling profiler are as following: 53% for handleSafe (which this patch is expected to remove the majority of) ~30% for the ArrowBuf.chk checks in the `set(index, value)` code. I was thinking that if we had a specialized vector append API, that guaranteed ownership of the buffer, we could throw out all of these bounds checks safely (in handleSafe, and ArrowBuf.chk), as well as have a restricted API to optimize building buffers out of. Something like the original [IntWriter](https://arrow.apache.org/docs/java/reference/org/apache/arrow/vector/complex/writer/BaseWriter.ScalarWriter.html) interface, but with stronger ownership semantics around the ArrowBuf? Is that something you think is possible/desirable? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org