[GitHub] [arrow] josiahyan edited a comment on pull request #8214: ARROW-9965: [Java] Improve performance of BaseFixedWidthVector.setSafe by optimizing capacity calculations

2020-09-22 Thread GitBox


josiahyan edited a comment on pull request #8214:
URL: https://github.com/apache/arrow/pull/8214#issuecomment-696381588







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] josiahyan edited a comment on pull request #8214: ARROW-9965: [Java] Improve performance of BaseFixedWidthVector.setSafe by optimizing capacity calculations

2020-09-21 Thread GitBox


josiahyan edited a comment on pull request #8214:
URL: https://github.com/apache/arrow/pull/8214#issuecomment-696381588







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] josiahyan edited a comment on pull request #8214: ARROW-9965: [Java] Improve performance of BaseFixedWidthVector.setSafe by optimizing capacity calculations

2020-09-21 Thread GitBox


josiahyan edited a comment on pull request #8214:
URL: https://github.com/apache/arrow/pull/8214#issuecomment-696452697


   @lidavidm @liyafan82 @jacques-n 
   Interpreting the results:
   This patch could be improved (performance wise) by more aggressive caching 
(option 3), at the potential expense of additional state, for an additional 35% 
speedup above this submitted patch. @liyafan82 @lidavidm I was wrong about the 
performance gain - I guess this speedup might be worth having? Please let me 
know if you prefer this variant, subject to the given tradeoff. I'll clean it 
up and resubmit if so.
   
   An append-only interface, trivially implemented, but assuming buffer 
ownership (Option 2), could get (up to) an additional 54% above the improved 
version of this patch.
   
   A type specialized variant, presumably created by templating out the various 
types (Option 4), could get (up to, these are very very rough results, it 
hasn't been verified where this speedup is coming from, it might just be the 
bit setting code) 18% above the specialized append-only interface.
   
   It might be possible to dive into the ASM to figure out what's actually 
being generated, but that might be a tad extreme for now (and unstable).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] josiahyan edited a comment on pull request #8214: ARROW-9965: [Java] Improve performance of BaseFixedWidthVector.setSafe by optimizing capacity calculations

2020-09-21 Thread GitBox


josiahyan edited a comment on pull request #8214:
URL: https://github.com/apache/arrow/pull/8214#issuecomment-696381588


   @jacques-n I haven't done very much investigation on other speedups - I just 
happened to notice performance irregularities as compared to our other (legacy) 
codepaths, and realized that we were spending a significant number of cycles 
dividing integers. If you're referring to the linked code, I have nothing to do 
with the Iceberg project. I just happened to chance upon this code.
   
   I haven't investigated further patterns, although I'd want to do that in the 
future, time permitting.
   
   I think it would be possible to measure the upper bound of performance, and 
address some of the questions (including that) in this discussion, if we did 
the following benchmarks side-by-side over the given variants:
   * the original IntBenchmarks
   * save the value/validity buffer capacities outside arrow code, and then 
manually resize, calling .set
   * save the overall value capacity in BaseFixedWidthVector.getValueCapacity, 
and force recomputation based on 
valueBuffer.capacity()/validityBuffer.capacity() (what I understand of 
@liyafan82 's suggestion that avoids invalidation issues around the underlying 
buffers)
   * Call the specialized putLong directly (The code that was linked in 
Iceberg) 
   
   The second option should highlight the upper bound of performance reachable 
by aggressively caching.
   The last option should highlight the upper bound of performance reachable by 
de-specialization (and therefore removing the virtual call chain).
   
   I'll write and run these later today.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] josiahyan edited a comment on pull request #8214: ARROW-9965: [Java] Improve performance of BaseFixedWidthVector.setSafe by optimizing capacity calculations

2020-09-18 Thread GitBox


josiahyan edited a comment on pull request #8214:
URL: https://github.com/apache/arrow/pull/8214#issuecomment-695114615


   @liyafan82 Thanks for taking a look!
   
   > I am a little surprised that JIT failed to transform the integer divistion 
to a right shift, as it can be easily deduced that the type width for an 
IntVector is always 4.
   
   I was pretty surprised too - it turned out that a huge chunk of cycles that 
we spent serializing data was devoted to this (BaseFixedWidthVector capacity 
bounds checking) + ArrowBuf bounds checking. This held true across the 
two/three JDK versions I used, across servers. It was not immediately evident 
where all the cycles are going on a Java-level profiler, but if you pull out 
the assembly, I think its quite clear (see the linked JIRA).
   
   > I am thinking about another way: can we simply save the value/validity 
buffer capacities? so we can further improve the performance by avoiding the if 
branch in getValueBufferValueCapacity and the if branch in capAtMaxInt
   
   I did this for a prototype internally - I think this is faster, but if done 
before even calling setSafe, implicitly assumes that no-one else is writing to 
the ArrowBuf.
   
   If I remember correctly, if we read the capacities directly and do these 
checks in getValueBufferValueCapacity, deeper in the setSafe call, performance 
was marginally comparable (at least within <20%?). However, this needs to be 
checked again. I wasn't too concerned about this, as we run with 
arrow.enable_unsafe_memory_access=false` set. We might have been able to make 
it faster on this front of checking the capacity, but its wy into 
diminishing returns (for us at least).
   
   We also lost a lot of performance over keeping 
`arrow.enable_unsafe_memory_access=false`, and I was wondering if there was a 
better way of eliminating all bounds checks, both on the BaseFixedWidthVector, 
and in ArrowBuf.chk.
   
   When we run our production code with 
`arrow.enable_unsafe_memory_access=false`, the breakdowns of the setSafe CPU 
usage (before this patch) according to an async sampling profiler are as 
following:
   
   53% for handleSafe (which this patch is expected to remove the majority of)
   ~30% for the ArrowBuf.chk checks in the `set(index, value)` code.
   
   I was thinking that if we had a specialized vector append API, that 
guaranteed ownership of the buffer, we could throw out all of these bounds 
checks safely (in handleSafe, and ArrowBuf.chk), as well as have a restricted 
API to optimize building buffers out of. Something like the original 
[IntWriter](https://arrow.apache.org/docs/java/reference/org/apache/arrow/vector/complex/writer/BaseWriter.ScalarWriter.html)
 interface, but with stronger ownership semantics around the ArrowBuf? Is that 
something you think is possible/desirable?
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] josiahyan edited a comment on pull request #8214: ARROW-9965: [Java] Improve performance of BaseFixedWidthVector.setSafe by optimizing capacity calculations

2020-09-18 Thread GitBox


josiahyan edited a comment on pull request #8214:
URL: https://github.com/apache/arrow/pull/8214#issuecomment-695114615


   @liyafan82 Thanks for taking a look!
   
   > I am a little surprised that JIT failed to transform the integer divistion 
to a right shift, as it can be easily deduced that the type width for an 
IntVector is always 4.
   
   I was pretty surprised too - it turned out that a huge chunk of cycles that 
we spent serializing data was devoted to this (BaseFixedWidthVector capacity 
bounds checking) + ArrowBuf bounds checking. This held true across the 
two/three JDK versions I used, across servers. It was not immediately evident 
where all the cycles are going on a Java-level profiler, but if you pull out 
the assembly, I think its quite clear (see the linked JIRA).
   
   > I am thinking about another way: can we simply save the value/validity 
buffer capacities? so we can further improve the performance by avoiding the if 
branch in getValueBufferValueCapacity and the if branch in capAtMaxInt
   I did this for a prototype internally - I think this is faster, but if done 
before even calling setSafe, implicitly assumes that no-one else is writing to 
the ArrowBuf.
   
   If I remember correctly, if we read the capacities directly and do these 
checks in getValueBufferValueCapacity, deeper in the setSafe call, performance 
was marginally comparable (at least within <20%?). However, this needs to be 
checked again. I wasn't too concerned about this, as we run with 
arrow.enable_unsafe_memory_access=false` set. We might have been able to make 
it faster on this front of checking the capacity, but its wy into 
diminishing returns (for us at least).
   
   We also lost a lot of performance over keeping 
`arrow.enable_unsafe_memory_access=false`, and I was wondering if there was a 
better way of eliminating all bounds checks, both on the BaseFixedWidthVector, 
and in ArrowBuf.chk.
   
   When we run our production code with 
`arrow.enable_unsafe_memory_access=false`, the breakdowns of the setSafe CPU 
usage (before this patch) according to an async sampling profiler are as 
following:
   
   53% for handleSafe (which this patch is expected to remove the majority of)
   ~30% for the ArrowBuf.chk checks in the `set(index, value)` code.
   
   I was thinking that if we had a specialized vector append API, that 
guaranteed ownership of the buffer, we could throw out all of these bounds 
checks safely (in handleSafe, and ArrowBuf.chk), as well as have a restricted 
API to optimize building buffers out of. Something like the original 
[IntWriter](https://arrow.apache.org/docs/java/reference/org/apache/arrow/vector/complex/writer/BaseWriter.ScalarWriter.html)
 interface, but with stronger ownership semantics around the ArrowBuf? Is that 
something you think is possible/desirable?
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org