Query on Solr Caches, OOM Errors, and how q and fq affect Solr Cache memory consumption
This question is related to Out of Memory errors that I am seeing on my SolrCloud setup. I am running Solr 4.5.1.

Here is how my setup looks:
1. 6 Solr Tomcat nodes distributed across 3 servers, i.e. 2 nodes per server
2. Each Tomcat node has been allocated 2 GB RAM (-Xmx setting)

I have two collections:
1. UserGroupMappings (2 shards x 3 replicas = 6 cores)
2. GroupCustomerMappings (2 shards x 3 replicas = 6 cores)

Each Solr Tomcat node contains 2 cores, one from each of the 2 collections. Every Solr Tomcat node manages a combined index size (from 2 cores) of approximately 6 GB.

Here is the approximate structure of my collections (I have eliminated some fields for ease of representation):
1. UserGroupCollection - UserId, CompanyName, GroupName
   Each user is affiliated with exactly one company, but could service multiple groups.
2. GroupCustomerMappings - CustomerId, FirstName, LastName, Address, CustomerNameAndAddress (copy field), GroupName (multi-valued field), CompanyName
   A customer could be affiliated with multiple companies, and within a company could be affiliated with one or more groups.

From my front end, when a user performs a search, he / she is looking for customers by entering their name or address. My application performs two queries:

Query 1: Fetch all groups for the current user from UserGroupCollection
q = *:*
fq = UserId:current user id AND CompanyName:current user's company
fl = GroupName

Query 2: Search for customers within the groups returned from Query 1
q = CustomerNameAndAddress:user entered search term
fq = CompanyName:current user's company AND (GroupName:first groupName returned by Query 1 OR GroupName:second groupName returned by Query 1 OR ...)
fl = CustomerId, FirstName, LastName, Address

I have been load testing the application with 100 unique users, and have been noticing that every now and then some of my nodes run out of memory.
I did some research on the net and came across this article, http://teaspoon-consulting.com/articles/solr-cache-tuning.html, which explains the structure of the query cache, filter cache and document cache. Specifically, it states that:
1. Each entry in the query cache holds an array of integers containing the DocIds that are returned as part of the search results.
2. Each entry in the filter cache holds an array of bits, and the size of the array equals the total number of documents in the current core (this was a real eye opener: it means the memory requirements of this cache increase for larger indices).

Based on the above information, should I consider changing Query 1 to something like the following?

Query 1 (Modified)
q = UserId:current user id AND CompanyName:current user's company
fl = GroupName

Basically, I have gotten rid of the fq completely and put the query parameters in the q. I am doing this because I know that each user in my system manages a max of 50 groups, which means a max of 50 DocIds (integers) in each query cache entry, as compared to 500,000 bits per filter cache entry, which is a lot more memory. I get the feeling that the original Query 1 is probably the semantically right solution (also, I don't care about scoring for Query 1, since it's a data fetch rather than a search), but the modified Query 1 will be much more performant. Is this a correct decision that I am making on Query 1?

Also, on Query 2, I am looking to modify it as follows:

Query 2 (Modified)
q = CustomerNameAndAddress:user entered search term AND CompanyName:current user's company
fq = CompanyName:current user's company AND (GroupName:first groupName returned by Query 1 OR GroupName:second groupName returned by Query 1 OR ...)
fl = CustomerId, FirstName, LastName, Address

The only change I am making here is to add CompanyName:current user's company, which was originally only in the fq, to the q as well (please note that I am NOT removing this parameter from the fq).
The reason I am doing this is: if a user on the system searches for a person named john, I don't want the query cache to hold DocIds for Johns across all companies. I just want the query cache to hold DocIds for all Johns in the context of the current company. Is this a correct decision on Query 2? The one thing I am NOT very sure about is whether it's appropriate / justifiable in my use case to have the same query parameter, CompanyName, in both the q and the fq.

Also, I need to mention that I fell into the trap of setting extremely large sizes for the Solr caches. I had set the size of each of the 3 caches (query, filter and document caches) to 500,000 entries, which is way too extreme. I am going to reduce that number to something between 500 and 1000 per cache, based on query patterns that I see in the live system.

I did a memory dump on one of my Solr Tomcat nodes when it was on the verge of running out of memory (with my extreme cache settings), and the biggest consumers of memory happened to be: class [Ljava.lang.Object; - 9,692 instances, Size -
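As a rough sanity check on the cache arithmetic in this post, here is a minimal Python sketch. It assumes the simplified model from the article cited in the post (one bit per document per filter cache entry, one 4-byte integer per DocId in a query cache entry) and ignores object overhead. The function names are made up for illustration, and Solr may store small result sets more compactly, so the figures are upper bounds under these assumptions, not measurements.

```python
def filter_cache_entry_bytes(docs_in_core):
    """Bytes for one filter cache entry modelled as a bitset (1 bit per doc)."""
    return (docs_in_core + 7) // 8  # round up to whole bytes


def query_cache_entry_bytes(matching_docs):
    """Bytes for one query cache entry modelled as an array of 4-byte ints."""
    return 4 * matching_docs


def worst_case_cache_bytes(entry_bytes, max_entries):
    """Upper bound if the cache fills up with entries of this size."""
    return entry_bytes * max_entries


if __name__ == "__main__":
    fq_entry = filter_cache_entry_bytes(500_000)  # ~500,000 docs per core (from the post)
    q_entry = query_cache_entry_bytes(50)         # at most 50 groups per user (from the post)
    print("filter entry bytes:", fq_entry, "query entry bytes:", q_entry)
    for size in (500_000, 1_000):                 # cache sizes mentioned in the post
        total = worst_case_cache_bytes(fq_entry, size)
        print(f"filterCache size={size}: up to {total / 2**30:.2f} GiB")
```

Under this model the configured cache size matters far more than the choice between q and fq: a filter cache allowed to grow to 500,000 bitset entries could never fit in a 2 GB heap, whereas 1,000 entries stays in the tens of megabytes.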
Re: solr always loading and not any response
I always get the Loading message on the Solr Admin Console if I use IE. However, the page loads perfectly fine when I use Google Chrome or Mozilla Firefox. Could you check whether your problem resolves itself if you use a different browser?
--
View this message in context: http://lucene.472066.n3.nabble.com/solr-always-loading-and-not-any-response-tp4148960p4149341.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: shards as subset of All Shards
Here is one potential design approach:
1. Create a single collection (instead of two collections). Let your schema have a RecordType field which can take the value of either initial or follow-up for documents that are indexed into this collection.
2. Let there be 30 shards, just like you have now. However, implement a document co-location strategy in your indexing, so that a single customer's records (both initial and follow-up) always get indexed into the same single shard. Read this link on Document Routing to learn more about how to implement this: https://cwiki.apache.org/confluence/display/solr/Shards+and+Indexing+Data+in+SolrCloud#ShardsandIndexingDatainSolrCloud-DocumentRouting
3. When your search app queries the collection, use the _route_=customer Id / Name parameter to force searches onto the correct shard.

Such a design ensures that your queries don't get distributed across all nodes / shards on your system, which could cause latency issues of its own.
--
View this message in context: http://lucene.472066.n3.nabble.com/shards-as-subset-of-All-Shards-tp4147998p4148038.html
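The co-location idea in steps 2 and 3 can be sketched as follows. This is a minimal illustration that assumes the collection uses the compositeId router, where the part of the document id before "!" selects the shard; the host, collection name, ids and function names are hypothetical, and the code only builds the strings, it does not contact Solr.

```python
from urllib.parse import urlencode


def routed_doc_id(customer_id, record_id):
    """compositeId-style key: the part before '!' decides the shard."""
    return f"{customer_id}!{record_id}"


def routed_select_url(base_url, collection, customer_id, query):
    """Build a /select URL that restricts the search to one customer's shard."""
    params = {"q": query, "_route_": f"{customer_id}!"}
    return f"{base_url}/{collection}/select?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    print(routed_doc_id("cust42", "rec7"))
    print(routed_select_url("http://localhost:8983/solr", "records",
                            "cust42", "RecordType:initial"))
```

Because both the initial and follow-up records of a customer share the same prefix, they land on the same shard and a routed query touches only that shard.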
Re: Solr irregularly having QTime 50000ms, stracing solr cures the problem
I know you mentioned you have a single machine at play, but do you have multiple nodes on the machine that talk to one another? Does your problem recur when the load on the system is low?

I also faced a similar problem, wherein the 5 second delay (described in detail in my other post) kept happening after a 1.5 minute inactivity interval. This was explained as Solr keeping the HTTP connection for inter-node communication alive for around 1.5 minutes before disconnecting. If a new request arrives after those 1.5 minutes, a new connection is created, which probably suffers latency due to a DNS name lookup delay.
--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-irregularly-having-QTime-5ms-stracing-solr-cures-the-problem-tp4146047p4147512.html
Re: clearing fieldValueCache in solr 4.6
One thing you could do is:
1. If your current index is called A1, create a new index called A2 with the correct schema.xml / solrconfig.xml.
2. Index your 18,000 documents into A2 afresh.
3. Then delete A1 (the bad index).
4. Then quickly create an alias with the name A1 pointing to A2.

This way your consumers will still think they are talking to A1, but in fact they will be querying against the new index.
--
View this message in context: http://lucene.472066.n3.nabble.com/clearing-fieldValueCache-in-solr-4-6-tp4147509p4147514.html
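Steps 3 and 4 might look roughly like this, assuming A1 and A2 are SolrCloud collections managed through the Collections API (aliases are a SolrCloud feature). The base URL and function names are hypothetical, and the sketch only constructs the request URLs without sending them.

```python
from urllib.parse import urlencode


def collections_api_url(base_url, **params):
    """Build a Collections API URL from keyword parameters."""
    return f"{base_url}/admin/collections?{urlencode(params)}"


def swap_index_urls(base_url, old_name, new_name):
    """URLs for steps 3 and 4: drop the bad collection, then alias its name."""
    return [
        collections_api_url(base_url, action="DELETE", name=old_name),
        collections_api_url(base_url, action="CREATEALIAS",
                            name=old_name, collections=new_name),
    ]


if __name__ == "__main__":
    for url in swap_index_urls("http://localhost:8983/solr", "A1", "A2"):
        print(url)
```

Issuing the two requests back to back keeps the window in which A1 does not resolve as short as possible.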
Re: Solr irregularly having QTime 50000ms, stracing solr cures the problem
I guess I had the same issue as you (see http://lucene.472066.n3.nabble.com/Slow-QTimes-5-seconds-for-Small-sized-Collections-td4143681.html). It was resolved by adding an explicit host mapping entry in /etc/hosts for inter-node Solr communication, thereby bypassing DNS lookups.
--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-irregularly-having-QTime-5ms-stracing-solr-cures-the-problem-tp4146047p4146858.html
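One way to check whether name resolution is the slow step is to time a lookup through the operating system's resolver, which on a typical Linux setup consults /etc/hosts before DNS. The sketch below is a diagnostic under that assumption, not something taken from the thread; the default port and the host name are placeholders to be replaced with the name the Solr nodes use for each other.

```python
import socket
import time


def time_name_lookup(host, port=8983):
    """Return the seconds spent resolving host through the system resolver."""
    start = time.perf_counter()
    socket.getaddrinfo(host, port)  # raises socket.gaierror if the name is unknown
    return time.perf_counter() - start


if __name__ == "__main__":
    # "localhost" is a placeholder; use the peer node's host name instead.
    for attempt in range(3):
        print(f"lookup {attempt}: {time_name_lookup('localhost'):.3f} s")
```

Running it before and after adding the /etc/hosts entry would show whether the lookup time changes.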
Re: Does Solr move documents between shards when the value of the shard key is updated ?
So we do end up with two copies / versions of the same document (same unique id), one in each of the two shards. Is this a BUG or a FEATURE in Solr?

I have a follow-up question: in case one were to attempt to delete the document, let's say using the CloudSolrServer deleteById() API, would that attempt to delete the document in both (or all) shards? How would Solr determine which shard / shards to run the delete against?
--
View this message in context: http://lucene.472066.n3.nabble.com/Does-Solr-move-documents-between-shards-when-the-value-of-the-shard-key-is-updated-tp4145043p4145237.html
Re: Slow QTimes - 5 seconds for Small sized Collections
This issue was finally resolved. Adding an explicit host-to-IP address mapping in the /etc/hosts file seemed to do the trick. The one strange thing is that before the hosts file entry was made, we were unable to reproduce the 5 second delay from the Linux shell by performing a simple nslookup on the host name. In any case, the issue now stands resolved. Thanks to all.

On the other discussion item, about the QTime in the SolrQueryResponse NOT matching the QTime in the Solr.log, here is what I found:
1. If the query from CloudSolrServer hits the right node (i.e. the node that contains the shard with the desired dataset), then the QTimes match.
2. If the query from CloudSolrServer hits a node (NodeX) that does NOT contain our data, then Solr routes the request to the right node (NodeY) to fetch the data. In such situations, QTime is logged on both nodes that the query passes through, albeit with different values. The QTime logged on NodeX matches what we see in the SolrQueryResponse, and this time includes the time for inter-node communication between NodeX and NodeY.

In essence, this means that the QTime in the SolrQueryResponse is NOT always a representation of the query time alone; it could include time spent on inter-node communication.

P.S. All of the above statements were made in the context of a sharding strategy that co-locates a single customer's documents in a single shard.

Here is a short wishlist based on the experience of debugging this issue:
1. I wish SolrQueryResponse could contain a list of the node names / shard-replica names that a request passed through while processing the query (when debug is turned ON).
2. I wish SolrQueryResponse could provide a breakdown of QTime for each of the individual nodes / shard-replicas, instead of returning a single QTime value.
--
View this message in context: http://lucene.472066.n3.nabble.com/Slow-QTimes-5-seconds-for-Small-sized-Collections-tp4143681p4145251.html
Re: Out of Memory when i downdload 5 Million records from sqlserver to solr
We faced similar problems on our side. We found it more reliable to have a mechanism that extracts all the data from the database into a flat file, and then to use a Java program to bulk index into Solr from the file via the SolrJ API.
--
View this message in context: http://lucene.472066.n3.nabble.com/Out-of-Memory-when-i-downdload-5-Million-records-from-sqlserver-to-solr-tp4144949p4145041.html
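The key to the flat-file approach is that the file is streamed in fixed-size batches, so memory use does not grow with the number of records. The post describes a Java program using SolrJ; the sketch below is a Python stand-in that shows only the batching step, assuming a CSV export with a header row. The function name, the batch size and the sample data are hypothetical, and sending each batch to Solr is left out.

```python
import csv
import io


def read_batches(lines, batch_size=1000):
    """Yield lists of row dicts from a CSV export, batch_size rows at a time."""
    batch = []
    for row in csv.DictReader(lines):
        batch.append(row)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # final, possibly smaller batch
        yield batch


if __name__ == "__main__":
    # In practice `lines` would be an open file; a small in-memory sample is used here.
    sample = io.StringIO("id,name\n1,a\n2,b\n3,c\n")
    for batch in read_batches(sample, batch_size=2):
        print(len(batch), "documents would be sent to Solr in one request")
```

Only one batch is held in memory at a time, regardless of how many million rows the export contains.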
Does Solr move documents between shards when the value of the shard key is updated ?
Let's say I create a Solr collection with multiple shards (say 2 shards) and set the value of router.field to a field called CompanyName. Now, we all know that during indexing Solr computes a hash on the value indexed into CompanyName and routes the document to the appropriate shard.

Let's say I index a document into this collection, and Solr routes the document to Shard 1 (based on the computed hash). Now, let's say I re-index the same document (same unique key), but with a different value of CompanyName, and Solr now determines that the document should route to Shard 2. In such a situation, would Solr delete the older version of the document from Shard 1? Or would I end up with two versions of the same document (same unique key), one in each shard?

My system allows updates to the fields that I choose as the shard key. I definitely want the document to be moved from Shard 1 to Shard 2 when I perform the re-indexing. Would this work as expected? Or should I be doing an explicit delete prior to re-indexing such documents?
--
View this message in context: http://lucene.472066.n3.nabble.com/Does-Solr-move-documents-between-shards-when-the-value-of-the-shard-key-is-updated-tp4145043.html
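The scenario in this question can be illustrated with a toy model. The sketch below is not Solr: it uses a CRC32 hash as a stand-in for Solr's real routing hash and plain dictionaries as shards, and every name in it is made up. It only shows why routing purely on the new key value leaves a stale copy behind unless the old copy is deleted explicitly.

```python
import zlib


def shard_for(key, num_shards=2):
    """Toy router: hash the shard key and pick a shard (not Solr's real hash)."""
    return zlib.crc32(key.encode("utf-8")) % num_shards


def index(shards, doc, delete_first=False):
    """Add doc to the shard chosen by CompanyName; optionally purge old copies."""
    if delete_first:
        for shard in shards:  # explicit delete across all shards
            shard.pop(doc["id"], None)
    shards[shard_for(doc["CompanyName"], len(shards))][doc["id"]] = doc


def copies(shards, doc_id):
    """Count how many shards hold a document with this id."""
    return sum(doc_id in shard for shard in shards)


if __name__ == "__main__":
    names = [f"Company{i}" for i in range(50)]  # hypothetical company names
    first = names[0]
    # Pick a second name that the toy router sends to a different shard.
    second = next(n for n in names if shard_for(n) != shard_for(first))
    shards = [{}, {}]
    index(shards, {"id": "doc1", "CompanyName": first})
    index(shards, {"id": "doc1", "CompanyName": second})
    print("copies without delete:", copies(shards, "doc1"))
    index(shards, {"id": "doc1", "CompanyName": second}, delete_first=True)
    print("copies with explicit delete:", copies(shards, "doc1"))
```

In the toy model the delete sweeps every shard; whether a real delete by id reaches the shard that holds the stale copy depends on how Solr routes the delete, which is exactly the follow-up question raised in this thread.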
Re: While creating collection in SolrCloud can we manually select machines(nodes)
Yes, the Solr Collections API allows you to pass in a set of explicit nodes (a subset of the complete list of nodes in your cluster) on which to set up your collection. This is the createNodeSet input parameter of the CREATE collection API, described as follows in the documentation:

Allows defining the nodes to spread the new collection across. If not provided, the CREATE operation will create shard-replica spread across all live Solr nodes. The format is a comma-separated list of node_names, such as localhost:8983_solr,localhost:8984_solr,localhost:8985_solr.
--
View this message in context: http://lucene.472066.n3.nabble.com/While-creating-collection-in-SolrCloud-can-we-manually-select-machines-nodes-tp4144593p4144614.html
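A CREATE request with createNodeSet could be assembled like this. The sketch only builds the URL; the base URL, collection name, shard and replica counts and the function name are hypothetical, while the node names follow the format quoted from the documentation.

```python
from urllib.parse import urlencode


def create_collection_url(base_url, name, num_shards, replication_factor, nodes):
    """Build a Collections API CREATE URL restricted to the given nodes."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
        "createNodeSet": ",".join(nodes),  # comma-separated node_names
    }
    return f"{base_url}/admin/collections?{urlencode(params)}"


if __name__ == "__main__":
    print(create_collection_url(
        "http://localhost:8983/solr", "demo", 2, 1,
        ["localhost:8983_solr", "localhost:8984_solr"]))
```

The colons and commas in the node list are percent-encoded by urlencode, which Solr decodes back into the original list.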
Re: How to sort value that numeric mix alpha
Try indexing your data as follows: C01,C02,C03,C04,C09,C12,C23,C50 instead of C1,C2,C3,C4,C9,C12,C23,C50, and the sort order will work correctly. BTW, what you are describing as an issue is NOT unique to Solr. The same happens in regular databases as well. Look up how database-type systems perform alphanumeric sorts and you'll see why: values are compared character by character, so C12 sorts before C2.
--
View this message in context: http://lucene.472066.n3.nabble.com/How-to-sort-value-that-numeric-mix-alpha-tp4144615p4144616.html
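The effect is easy to reproduce outside Solr. The sketch below sorts the values from this post as plain strings, then again after zero-padding the numeric part at "index time"; the helper name and the fixed width of 2 are assumptions made for illustration.

```python
import re


def pad_numeric_suffix(value, width=2):
    """Zero-pad the trailing digits of a code, e.g. 'C1' -> 'C01'."""
    return re.sub(r"\d+$", lambda m: m.group().zfill(width), value)


if __name__ == "__main__":
    codes = ["C1", "C2", "C3", "C4", "C9", "C12", "C23", "C50"]
    print(sorted(codes))                                   # character-by-character order
    print(sorted(pad_numeric_suffix(c) for c in codes))    # padded values sort numerically
```

The width has to cover the largest number that will ever occur; with values up to C999, for example, a width of 3 would be needed.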
Re: Slow QTimes - 5 seconds for Small sized Collections
I am a colleague of the person who posted the original question. We have done some more analysis and have more information to provide.

Here are the responses to Toke's questions:
* Do they (slow performing queries) occur under heavy network load?
  No, they don't. This happens even when there is only a single user on the system. It doesn't appear to be a capacity issue.
* Do they occur after specific queries?
  No, even the simplest of queries run slow, and when things are slow, the QTimes always hover at something greater than 5000 ms.
* Do they occur at specific times (e.g. each whole hour)?
  They don't occur at specific times. However, there is indeed a timing aspect behind this issue, which I shall explain below.

Here is what I did: I fired a single query again and again on all nodes in my cluster, and observed the following:
1. Slowness happens only if the client app sends the request to a node (let's call this NodeX) that does NOT host the shard containing the data we are looking for (we use a document co-location strategy to index related documents into a single shard).
2. Slowness never happens when the client app sends the request to a node (let's call this NodeY) that hosts the correct shard (i.e. the data we are looking for).
3. Slowness does NOT happen if the client app sends the request to NodeX and the previous query to NodeX was executed within the last 1.5 minutes.
4. Slowness happens if the client app sends the request to NodeX and the previous query to NodeX was executed more than 1.5 minutes ago.

These observations lead me to believe the following (still a theory):
1. There is something that's breaking / disrupting inter-node communication between NodeX and NodeY. Could this be a firewall or something similar?
2. Whenever NodeX remains idle for more than 1.5 minutes, its connection to NodeY is dropped (I can't see anything in the logs to that effect though), and when the next request comes in, it takes 5 seconds to recreate the connection. This 1.5 minute window and the 5 second delay are pretty consistent.

I checked with my network folks, and they say that all network interfaces are UP and there are NO packet losses between the servers in question. What else should I be asking my friends in the networking group to look at? They did ask me what protocol Solr uses for inter-node communication, and I answered HTTP.

Thanks, and I appreciate your inputs.
--
View this message in context: http://lucene.472066.n3.nabble.com/Slow-QTimes-5-seconds-for-Small-sized-Collections-tp4143681p4144493.html