[ https://issues.apache.org/jira/browse/SOLR-16515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Torsten Bøgh Köster updated SOLR-16515:
---------------------------------------
Description:

The {{SlowCompositeReaderWrapper}} synchronizes both read and write access to its internal {{cachedOrdMaps}}. By using a {{ConcurrentHashMap}} instead of a {{LinkedHashMap}} as the underlying {{cachedOrdMaps}} implementation, and the {{ConcurrentHashMap#computeIfAbsent}} method to compute cache values, we were able to reduce lock contention significantly.

h3. Background

Under heavy load, we found that application halts inside Solr become a serious problem in high-traffic environments. Using Java Flight Recorder (JFR) recordings, we found high accumulated application halts on the {{cachedOrdMaps}} in {{SlowCompositeReaderWrapper}}. Without this fix, we were able to utilize our machines only up to 25% CPU usage. With the fix applied, utilization of up to 80% is achievable.

h3. Description

Our Solr instances use the {{collapse}} component heavily. The instances run with 32 cores and 32 GB of Java heap on a rather small index (4 GB). The instances scale out at 50% CPU load. We take Java Flight Recorder snapshots of 60 seconds as soon as CPU usage exceeds 50%.

!slow-composite-reader-wrapper-before.jpg|height=1024px!

During our 60 s Java Flight Recorder snapshot, the ~2k Jetty threads accumulated more than 16 h of locking time inside the {{SlowCompositeReaderWrapper}} (see screenshot). With this fix applied, locking is reduced to cache write accesses only. We validated this using another JFR snapshot:

!slow-composite-reader-wrapper-after.jpg|height=1024px!

h3. Solution

We propose the following improvement inside the {{SlowCompositeReaderWrapper}}, which removes blocking {{synchronized}} access to the internal {{cachedOrdMaps}}. The implementation keeps the semantics of the {{getSortedDocValues}} and {{getSortedSetDocValues}} methods but moves the expensive part of {{OrdinalMap#build}} into a producer.
We use the producer to access the {{ConcurrentHashMap}} through the {{ConcurrentHashMap#computeIfAbsent}} method only.

The current implementation uses the {{synchronized}} block not only to lock access to the {{cachedOrdMaps}} but also to protect the critical section between getting, building and putting the {{OrdinalMap}} into the cache. Inside the critical section, the decision is made whether a cacheable value should be built and added to the cache.

To support non-blocking read access to the cache, we move the building part of the critical section into a producer {{Function}}. The check whether we have a cacheable value is made up front. To make that decision properly, we had to take logic from {{MultiDocValues#getSortedSetValues}} and {{MultiDocValues#getSortedValues}} (the {{SlowCompositeReaderWrapper}} already contained duplicated code from those methods).

h3. Summary

This change removes most blocking access inside the {{SlowCompositeReaderWrapper}} and, despite its name, it is now capable of a much higher request throughput.

This change was developed jointly by Dennis Berger, Torsten Bøgh Köster and Marco Petris.

was:

The {{SlowCompositeReaderWrapper}} synchronizes both read and write access to its internal {{cachedOrdMaps}}. By using a {{ConcurrentHashMap}} instead of a {{LinkedHashMap}} as the underlying {{cachedOrdMaps}} implementation, and the {{ConcurrentHashMap#computeIfAbsent}} method to compute cache values, we were able to reduce lock contention significantly.

h3. Background

Under heavy load, we found that application halts inside Solr become a serious problem in high-traffic environments. Using Java Flight Recorder (JFR) recordings, we found high accumulated application halts on the {{cachedOrdMaps}} in {{SlowCompositeReaderWrapper}}. Without this fix, we were able to utilize our machines only up to 25% CPU usage. With the fix applied, utilization of up to 80% is achievable.

h3. Description

Our Solr instances use the {{collapse}} component heavily. The instances run with 32 cores and 32 GB of Java heap on a rather small index (4 GB). The instances scale out at 50% CPU load. We take Java Flight Recorder snapshots of 60 seconds as soon as CPU usage exceeds 50%.

!slow-composite-reader-wrapper-before.jpg!

During our 60 s Java Flight Recorder snapshot, the ~2k Jetty threads accumulated more than 16 h of locking time inside the {{SlowCompositeReaderWrapper}} (see screenshot). With this fix applied, locking is reduced to cache write accesses only. We validated this using another JFR snapshot:

!slow-composite-reader-wrapper-after.jpg!

h3. Solution

We propose the following improvement inside the {{SlowCompositeReaderWrapper}}, which removes blocking {{synchronized}} access to the internal {{cachedOrdMaps}}. The implementation keeps the semantics of the {{getSortedDocValues}} and {{getSortedSetDocValues}} methods but moves the expensive part of {{OrdinalMap#build}} into a producer. We use the producer to access the {{ConcurrentHashMap}} through the {{ConcurrentHashMap#computeIfAbsent}} method only.

The current implementation uses the {{synchronized}} block not only to lock access to the {{cachedOrdMaps}} but also to protect the critical section between getting, building and putting the {{OrdinalMap}} into the cache. Inside the critical section, the decision is made whether a cacheable value should be built and added to the cache.

To support non-blocking read access to the cache, we move the building part of the critical section into a producer {{Function}}. The check whether we have a cacheable value is made up front. To make that decision properly, we had to take logic from {{MultiDocValues#getSortedSetValues}} and {{MultiDocValues#getSortedValues}} (the {{SlowCompositeReaderWrapper}} already contained duplicated code from those methods).

h3. Summary

This change removes most blocking access inside the {{SlowCompositeReaderWrapper}} and, despite its name, it is now capable of a much higher request throughput.

This change was developed jointly by Dennis Berger, Torsten Bøgh Köster and Marco Petris.

> Remove synchronized access to cachedOrdMaps in SlowCompositeReaderWrapper
> -------------------------------------------------------------------------
>
>                 Key: SOLR-16515
>                 URL: https://issues.apache.org/jira/browse/SOLR-16515
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public)
>          Components: search
>    Affects Versions: 9.0, 8.11.2
>            Reporter: Torsten Bøgh Köster
>            Priority: Major
>         Attachments: slow-composite-reader-wrapper-after.jpg,
>                      slow-composite-reader-wrapper-before.jpg
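
As a rough illustration of the approach described in the Solution section, here is a minimal Java sketch. It is not the actual patch. {{OrdMapCacheSketch}}, {{FakeOrdinalMap}}, {{isCacheable}} and the "more than one segment" rule are hypothetical stand-ins introduced only for this example; the only real API used is {{ConcurrentHashMap#computeIfAbsent}}, which applies the mapping function at most once per key. The sketch makes the cacheability check up front and leaves the expensive build to a producer {{Function}}.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

public class OrdMapCacheSketch {

  /** Hypothetical stand-in for Lucene's OrdinalMap; it only remembers its field. */
  public static final class FakeOrdinalMap {
    public final String field;

    FakeOrdinalMap(String field) {
      this.field = field;
    }
  }

  // ConcurrentHashMap instead of a LinkedHashMap guarded by synchronized.
  private final Map<String, FakeOrdinalMap> cachedOrdMaps = new ConcurrentHashMap<>();

  // Counts how often the (pretend) expensive build actually runs.
  public final AtomicInteger builds = new AtomicInteger();

  // Producer holding the expensive part; it stands in for OrdinalMap#build.
  private final Function<String, FakeOrdinalMap> producer = field -> {
    builds.incrementAndGet();
    return new FakeOrdinalMap(field);
  };

  /** Hypothetical placeholder rule: only multi-segment readers get a cached map. */
  public static boolean isCacheable(int segmentCount) {
    return segmentCount > 1;
  }

  /**
   * Returns the cached map for the field, building it at most once.
   * Returns null when the value is not cacheable; a real caller would
   * then take its uncached path instead.
   */
  public FakeOrdinalMap getOrdinalMap(String field, int segmentCount) {
    if (!isCacheable(segmentCount)) {
      return null; // decision made up front, before touching the cache
    }
    return cachedOrdMaps.computeIfAbsent(field, producer);
  }

  public static void main(String[] args) throws InterruptedException {
    OrdMapCacheSketch cache = new OrdMapCacheSketch();
    Thread[] threads = new Thread[8];
    for (int i = 0; i < threads.length; i++) {
      threads[i] = new Thread(() -> {
        for (int n = 0; n < 1000; n++) {
          cache.getOrdinalMap("category", 4);
        }
      });
      threads[i].start();
    }
    for (Thread t : threads) {
      t.join();
    }
    // All threads share one key, so the producer runs exactly once.
    System.out.println("builds = " + cache.builds.get()); // prints "builds = 1"
  }
}
```

In this sketch, eight threads request the same field 8,000 times in total, yet the producer runs once; every other call returns the already cached instance without entering a {{synchronized}} block.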
--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org