Re: Possible Fields.indexOf Optimization

JC Mann Tue, 13 Nov 2018 07:23:19 -0800

Lukas,

Your analysis is rock-solid. Thank you.


On Tuesday, November 13, 2018 at 4:17:22 AM UTC-5, Lukas Eder wrote:
>
> Hi JC,
>
> Thank you very much for your message. That's a very interesting question, 
> and I am very happy to review the existing implementation. In fact, 
> incidentally, I have recently noticed indexOf() to be a "hot spot" for some 
> queries at a customer site (where they were projecting huge results with 
> 100s of columns and fetched 10000s of rows). The solution there was to move 
> more logic into SQL and reduce the projection. Not all columns were needed, 
> and by using aggregation and window functions, not all rows were needed 
> either.
>
> Caching is very hard. While a HashMap provides O(1) access complexity 
> (O(N) in rare cases, when hash codes are ill chosen), it still has quite 
> some high "setup" cost:
>
> - hash code calculation
> - equals calculation
> - several heap jumps following the internal data structures to the Entry
>
> In fact, I frequently encounter ill chosen HashMaps as the main bottleneck 
> of some algorithm, such as [1] and [2]
>
> Iterating an array is O(N), but individual array access is much cheaper.
>
> Also, HashMaps consume quite a bit more memory than arrays. Both are O(N), 
> but a HashMap stores several auxiliary values for each key / value. Notice, 
> we don't gain anything by using HashSet, because it's just delegating 
> everything to a HashMap
>
> So, by hypothesis, arrays tend to be better for small results (few 
> columns) whereas HashMaps tend to be better for larger results (many 
> columns).
>
> One thing that's definitely not optimal is how field equality is currently 
> resolved through the various methods. You may have noticed that 
> Fields.indexOf(Field) [3] first delegates to Fields.fields(Field) [4], 
> which does more linear searching. Both are optimised for identity 
> comparisons (which usually applies when using generated code), but we 
> should definitely be able to avoid a few loops by providing more 
> specialised implementations on a per-method case. I have created an issue 
> on GitHub for this [5]
>
> Notice that we cannot use JDK's HashMap for such a cache, because while ...
>
> (field(name("a", "b", "c")) == field(name("c"))) == false
>
> ... we can definitely retrieve a field(name("a", "b", "c)) from a record 
> by querying for field(name("c")) for a variety of convenience reasons. So, 
> what we would need is a HashMap with an external hashCode() and equals() 
> provider, such as the commons collections AbstractHashedMap:
> https://stackoverflow.com/a/20030782/521799
>
> I must admit, I have never benchmarked that AbstractHashedMap option, as 
> it hasn't occurred to me as an option until right now. I will definitely 
> benchmark this and post results to GitHub [5] and this list.
>
> Regarding your questions and assuming a cache would actually provide 
> significant help:
>
> 1. Should it be used for every type of Result? I don't think so. For 
>> example, I don't think it makes sense to apply an optimization of a Result 
>> which contains only 1 column and 1 row.
>
>
> Exactly. We would have to find the sweet spot. Let's say (random guess) 
> with 16 columns or more. There's no distinction between 1 row results and 
> 100000 row results. The number of columns is what makes a difference 
> because...
>
> 2. Where should this cache live? It would make sense to store it on the 
>> Result so it is optimized for each row, but is there a better place? What 
>> if it were stored with the query?
>
>
> It would live in org.jooq.impl.Fields, which is already a shared instance 
> for queries, results, and records. Its main purpose is to abstract over any 
> kind of org.jooq.RecordType access in all of jOOQ's internals.
>
> Notice that in the first versions of jOOQ, a Record was really a Map (i.e. 
> it contained a HashMap). That was a major source of allocation trouble, 
> which is why we now have that array inside of the record.
>
> Thanks again for bringing this up. I will definitely follow up in this 
> area. If you find anything additional that is noteworthy, please do post it 
> here. I'm very happy to discuss this.
>
> [1]: https://bugs.openjdk.java.net/browse/JDK-8213243
> [2]: https://bugs.openjdk.java.net/browse/JDK-8213280
> [3]: 
> https://github.com/jOOQ/jOOQ/blob/version-3.11.7/jOOQ/src/main/java/org/jooq/impl/Fields.java#L267
> [4]: 
> https://github.com/jOOQ/jOOQ/blob/version-3.11.7/jOOQ/src/main/java/org/jooq/impl/Fields.java#L86
> [5]: https://github.com/jOOQ/jOOQ/issues/8040
>
>  
>

-- 
You received this message because you are subscribed to the Google Groups "jOOQ 
User Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
For more options, visit https://groups.google.com/d/optout.

Re: Possible Fields.indexOf Optimization

Reply via email to