[ https://issues.apache.org/jira/browse/SOLR-6816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14234145#comment-14234145 ]
Per Steffensen edited comment on SOLR-6816 at 12/4/14 12:11 PM:
----------------------------------------------------------------

Just want to add my 5 cents on this one. It only concerns indexing when you do version-check/optimistic-locking (SOLR-3178). We have a very different implementation of SOLR-3178, but the performance problems will be the same for "your" implementation.

Doing optimistic-locking you typically do a lot of this:
* 1) real-time-get document D from Solr
* 2) update D to D' locally on the client
* 3) try to replace D with D' in Solr. In case of a version-conflict-error, go to 1)

In step 1) you get-by-id document D, and in step 3) you do UpdateLog.lookupVersion on the same id. In our system it is most likely that two processes, both wanting to update document D, run at the same time or fairly shortly after each other. It is rare that the same document gets updated a long time apart. To speed up these cases, we have introduced a "recently looked-up or updated" cache, where we store documents that have recently been fetched by real-time-get or updated. It has improved our indexing speed significantly. This is a mature solution that is running in production.

In the scenarios above you most often discover that the document you try to real-time-get or lookup-version for does NOT exist, but it is relatively time-consuming to find that out (by looking in the index). We have a PoC that introduces a bloom-filter, which can give one of two answers: "document definitely does not exist" (you do not have to search the index) or "document may exist" (you have to search the index to see if it exists). Our PoC shows that this will speed up our indexing tremendously (something like a 60-80% reduction), but we haven't prioritized maturing it and putting it into production yet.
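To illustrate the bloom-filter pre-check described above, here is a minimal sketch in Python. It is not our PoC code and not Solr or Guava code: BloomFilter, VersionedStore, VersionConflict and all of their methods are hypothetical names made up for this example, and a plain dict stands in for the index. It only shows the idea that a "definitely does not exist" answer lets the version lookup skip the expensive index search.

```python
import hashlib


class VersionConflict(Exception):
    """Raised when the stored version differs from the expected one."""


class BloomFilter:
    """Tiny in-memory Bloom filter over string ids (illustrative only)."""

    def __init__(self, num_bits=1 << 16, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, key):
        # Derive the bit positions from one SHA-256 digest (double hashing).
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        # False means "definitely absent"; True only means "may exist".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


class VersionedStore:
    """Hypothetical stand-in for the index; counts the expensive lookups."""

    def __init__(self):
        self._docs = {}            # id -> (version, document)
        self._filter = BloomFilter()
        self.index_lookups = 0     # how often the "index" was searched

    def lookup_version(self, doc_id):
        if not self._filter.might_contain(doc_id):
            return None            # definitely absent: skip the index search
        self.index_lookups += 1    # may exist: pay for the real lookup
        entry = self._docs.get(doc_id)
        return entry[0] if entry else None

    def put(self, doc_id, doc, expected_version):
        # expected_version=None means "the document must not exist yet".
        if self.lookup_version(doc_id) != expected_version:
            raise VersionConflict(doc_id)
        new_version = (expected_version or 0) + 1
        self._docs[doc_id] = (new_version, doc)
        self._filter.add(doc_id)
        return new_version
```

With this sketch, creating a document that does not exist yet never touches the "index": the filter answers the version check on its own. Note that a plain bloom-filter cannot forget ids, so deleted documents would keep answering "may exist" until the filter is rebuilt.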
The PoC used a modified version of the Guava bloom-filter, modified to be able to work in a memory-mapped file, so that we do not lose bloom-filter information when shutting down Solr (it would take some time to build it from scratch every time you start Solr). The Guava bloom-filter is currently memory-only: you can save it to a file and load it again, but the file is not kept up to date continuously, and it is not efficient to store the whole filter to disk at every update :-) Hence the "work in memory-mapped file" modification.

Of course, let me know if any of this sounds interesting to you.

> Review SolrCloud Indexing Performance.
> --------------------------------------
>
> Key: SOLR-6816
> URL: https://issues.apache.org/jira/browse/SOLR-6816
> Project: Solr
> Issue Type: Task
> Components: SolrCloud
> Reporter: Mark Miller
> Priority: Critical
> Attachments: SolrBench.pdf
>
>
> We have never really focused on indexing performance, just correctness and
> low hanging fruit. We need to vet the performance and try to address any
> holes.
> Note: A common report is that adding any replication is very slow.