Query on Solr Caches, OOM Errors and - how q and fq affect Solr Cache memory consumption

2014-10-11 Thread IJ
This question is related to Out of Memory Errors that I am seeing on my Solr
Cloud Setup - I am running Solr 4.5.1.

Here is how my setup looks:
1. Have 6 Solr Tomcat Nodes distributed across 3 Servers - i.e. 2 nodes per
server
2. Each Tomcat node has been allocated 2 GB RAM (the -Xmx setting)

Have two Collections:
1. UserGroupMappings (2 shards x 3 replicas = 6 Cores)
2. GroupCustomerMappings (2 shards x 3 replicas = 6 Cores)

Each Solr Tomcat Node contains 2 cores - one from each of the 2 collections.
Every Solr Tomcat Node manages a combined index size (from 2 cores) of
approximately 6 GB.

Here is the approximate structure of my Collections (have eliminated some
fields for ease of representation):
1. UserGroupCollection - UserId, CompanyName, GroupName
Each User is affiliated with exactly one company - but could service
multiple Groups

2. GroupCustomerMappings - CustomerId, FirstName, LastName, Address,
CustomerNameAndAddress (Copy Field), GroupName (multi-valued field),
CompanyName
A Customer could be affiliated with multiple companies - and within a
company could be affiliated with one or more groups

From my FE - when a user performs a search - he / she is looking for
customers by entering their name or address.
My application performs two queries:

Query 1: Fetch all Groups for current user from UserGroupCollection
q= *:*
fq= UserId:current user id and CompanyName:current user's company
fl = GroupName

Query 2: Search for Customers within groups returned from Query 1
q= CustomerNameAndAddress: user entered search term
fq= CompanyName:current user's company and (GroupName: first groupName
returned by Query 1 OR GroupName: second groupName returned by Query 1 OR
 )
fl = CustomerId, FirstName, LastName, Address
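For illustration, here is a minimal Python sketch of how the application might assemble the fq for Query 2 from the group names returned by Query 1. The function name and the quoting / escaping approach are assumptions, not something taken from the actual application. Note that it emits uppercase AND / OR, since the standard Lucene query parser only treats those words as operators when they are uppercase.

```python
def build_group_filter(company: str, groups: list[str]) -> str:
    """Build an fq string restricting results to one company and a set of groups.

    Hypothetical helper: field names follow the post, everything else is assumed.
    """
    if not groups:
        raise ValueError("at least one group name is required")

    def quote(value: str) -> str:
        # Escape backslashes and double quotes so the value is safe inside a phrase.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'

    group_clause = " OR ".join(f"GroupName:{quote(g)}" for g in groups)
    return f"CompanyName:{quote(company)} AND ({group_clause})"


print(build_group_filter("Acme", ["G1", "G2"]))
```

Every distinct combination of company and group list produces a distinct fq string, and therefore potentially a distinct filter cache entry.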

Have been load testing the application with 100 unique users. Have been
noticing that every now and then some of my nodes run out of Memory.

Did some research on the net and came across this article -
http://teaspoon-consulting.com/articles/solr-cache-tuning.html - that
explains the structure of the query cache, filter cache and document cache.
In specific it states that:
1. Each entry in the Query Cache holds an array of integers containing
DocIds that are returned as part of the search results.
2. Each entry in the Filter Cache holds an array of bits, and the size of
the array equals the total number of documents in the current core (this was
a real eye opener - points at the fact that the memory requirements of this
cache increases for larger indices)

Based on the above information, should I consider changing Query 1 to
something as follows:
Query 1 (Modified)
q= UserId:current user id and CompanyName:current user's company
fl = GroupName

Basically, I have gotten rid of the fq completely and put the query
parameters in the q. I am doing this because I know that Each User in my
system manages a max of 50 Groups - which means a max of 50 Doc Ids
(integers) in each query cache entry - as compared to 500,000 bits per
filter cache entry - which is a lot more memory.
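To put rough numbers on that comparison, here is a back-of-the-envelope sketch in Python. It takes the article's description at face value and assumes 4 bytes per docId in a query cache entry and 1 bit per document in a filter cache entry; object headers and cache keys are ignored, so real figures will differ.

```python
def query_entry_bytes(matching_docs: int) -> int:
    # Assumption: one 4-byte integer per docId held in the entry.
    return matching_docs * 4


def filter_entry_bytes(docs_in_core: int) -> int:
    # Assumption: one bit per document in the core, rounded up to whole bytes.
    return (docs_in_core + 7) // 8


q_bytes = query_entry_bytes(50)        # at most 50 groups per user
f_bytes = filter_entry_bytes(500_000)  # 500,000 documents in the core
print(q_bytes, f_bytes, f_bytes / q_bytes)
```

Under these assumptions a filter cache entry is a few hundred times larger than the corresponding query cache entry.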

I get the feeling that the original Query 1 is probably the semantically
right solution (also, I don't care about scoring for Query 1, since it's a
data fetch rather than a search) - but the modified Query 1 will be much more
performant. Is this the correct decision for Query 1?

Also on Query 2 - I am looking to modify it as follows:
Query 2 (Modified)
q= CustomerNameAndAddress: user entered search term and
CompanyName:current user's company
fq= CompanyName:current user's company and (GroupName: first groupName
returned by Query 1 OR GroupName: second groupName returned by Query 1 OR
 )
fl = CustomerId, FirstName, LastName, Address

The only change I am making here is to add the CompanyName:current user's
company which was originally only in the fq to the q as well (please
note that I am NOT removing this parameter from the fq).
The reason I am doing this is - if a user on the system searches for a
person named john - I don't want the Query Cache to hold docIds for Johns
across all companies. I just want the query cache to hold DocIds for all
Johns in the context of the current company.

Is this a correct decision on Query 2? The one thing I am NOT very sure
about is whether it's appropriate / justifiable in my Use Case to have the
same query parameter CompanyName in both the q and fq.

Also, need to mention that I fell into the trap of setting extremely huge
cache sizes for the Solr Caches - Had set the sizes of each of the 3 caches
(query, filter and document caches) to 500,000 entries - which is way too
extreme. I am going to reduce that number to something between 500 - 1000
per cache based on query patterns that I see in the live system.
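A rough sketch of why 500,000 entries is so extreme for the filter cache in particular. It assumes a core of 500,000 documents (taken from the "500,000 bits" figure above) and that every entry is stored as a full bitset; both are simplifying assumptions, so treat the output as an upper bound, not a measurement.

```python
HEAP_BYTES = 2 * 1024**3  # the 2 GB heap allocated to each Tomcat node


def worst_case_filter_cache_bytes(max_entries: int, docs_in_core: int) -> int:
    # Assumption: each entry is a bitset with one bit per document in the core.
    bytes_per_entry = (docs_in_core + 7) // 8
    return max_entries * bytes_per_entry


for max_entries in (500_000, 1_000, 500):
    total = worst_case_filter_cache_bytes(max_entries, 500_000)
    print(max_entries, total, round(total / HEAP_BYTES, 3))
```

With 500,000 entries the worst case is several times the entire heap, whereas at 500 - 1000 entries it stays at a few percent of it.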

I did a memory dump on one of my solr tomcat nodes - when it was on the
verge of running out of memory (with my extreme cache settings) - and the
biggest consumers of memory happened to be:

class [Ljava.lang.Object;   - 9,692
instances, Size - 

Re: solr always loading and not any response

2014-07-27 Thread IJ
I always get the Loading message on the Solr Admin Console if I use IE.
However - the page loads perfectly fine when I use Google Chrome or Mozilla
Firefox.
Could you check whether your problem goes away if you use a different
browser?




--
View this message in context: 
http://lucene.472066.n3.nabble.com/solr-always-loading-and-not-any-response-tp4148960p4149341.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: shards as subset of All Shards

2014-07-19 Thread IJ
Here is one potential design approach:

1. Create a single collection (instead of two collections).
Let your schema have a RecordType field which can take the values of
either initial or follow-up for documents that are indexed into this
collection.

2. Let there be 30 shards - just like you have it. However - implement a
document co-location strategy in your indexing - so that a single customer's
records (both initial and follow-up) always get indexed into the same
single shard.

Read up this link on Document Routing to learn more on how to implement
this -
https://cwiki.apache.org/confluence/display/solr/Shards+and+Indexing+Data+in+SolrCloud#ShardsandIndexingDatainSolrCloud-DocumentRouting

3. When your search App queries the Collection - use the _route_=customer
Id / Name parameter to force searches on the correct shard.

Such a design ensures that your queries don't get distributed across all
nodes / shards on your system - which could cause latency issues of its own.
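A minimal Python sketch of what steps 2 and 3 could look like on the client side. It assumes the collection uses the compositeId router, where a document id of the form prefix!id co-locates all documents sharing the prefix; the helper names and the sample customer key are made up for illustration, so check the Document Routing page linked above for the exact behaviour of your Solr version.

```python
def routed_id(customer_key: str, doc_id: str) -> str:
    # Assumption: compositeId routing, where "prefix!id" hashes on the prefix.
    if "!" in customer_key:
        raise ValueError("customer key must not contain '!'")
    return f"{customer_key}!{doc_id}"


def routed_query_params(customer_key: str, query: str) -> dict[str, str]:
    # The _route_ parameter limits the search to the shard owning this prefix.
    return {"q": query, "_route_": f"{customer_key}!"}


print(routed_id("cust42", "initial-001"))
print(routed_query_params("cust42", "RecordType:follow-up"))
```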





--
View this message in context: 
http://lucene.472066.n3.nabble.com/shards-as-subset-of-All-Shards-tp4147998p4148038.html


Re: Solr irregularly having QTime 50000ms, stracing solr cures the problem

2014-07-16 Thread IJ
I know you mentioned you have a single machine at play - but do you have
multiple nodes on the machine that talk to one another?

Does your problem recur when the load on the system is low ?

I also faced a similar problem, wherein a 5 second delay (described in
detail in my other post) kept happening after 1.5 minutes of inactivity.
The explanation was that Solr keeps the HTTP connection used for inter-node
communication alive for around 1.5 minutes before disconnecting - and if a
new request arrives after that, a new connection is created - which probably
suffers latency due to a DNS name lookup delay.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-irregularly-having-QTime-5ms-stracing-solr-cures-the-problem-tp4146047p4147512.html


Re: clearing fieldValueCache in solr 4.6

2014-07-16 Thread IJ
One thing you could do is:
1. If your current index is called A1, then you can create a new index called
A2 with the correct schema.xml / solrconfig.xml
2. Index your 18,000 documents into A2 afresh
3. Then delete A1 (the bad index)
4. Then quickly create an Alias with the name of A1 pointing to A2 - This way
your consumers will still think they are talking to A1 - but in fact they
would be querying against the new index.
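For step 4, the alias can be created through the Collections API CREATEALIAS action. Here is a small Python sketch that only builds the request URL; the base URL is a placeholder for your own Solr host, and A1 / A2 are the example names used above.

```python
from urllib.parse import urlencode


def create_alias_url(base_url: str, alias: str, collection: str) -> str:
    # CREATEALIAS takes the alias name and the collection(s) it should point to.
    params = {"action": "CREATEALIAS", "name": alias, "collections": collection}
    return f"{base_url.rstrip('/')}/admin/collections?{urlencode(params)}"


# Placeholder base URL - substitute the address of one of your Solr nodes.
print(create_alias_url("http://localhost:8983/solr", "A1", "A2"))
```

Issuing an HTTP GET against the printed URL (from a browser or curl) would create the alias.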




--
View this message in context: 
http://lucene.472066.n3.nabble.com/clearing-fieldValueCache-in-solr-4-6-tp4147509p4147514.html


Re: Solr irregularly having QTime 50000ms, stracing solr cures the problem

2014-07-12 Thread IJ
I guess I had the same issue as you:
http://lucene.472066.n3.nabble.com/Slow-QTimes-5-seconds-for-Small-sized-Collections-td4143681.html

It was resolved by adding an explicit host mapping entry in /etc/hosts for
inter-node Solr communication - thereby bypassing DNS lookups.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-irregularly-having-QTime-5ms-stracing-solr-cures-the-problem-tp4146047p4146858.html


Re: Does Solr move documents between shards when the value of the shard key is updated ?

2014-07-02 Thread IJ
So - we do end up with two copies / versions of the same document (uniqueid)
- one in each of the two shards - Is this a BUG or a FEATURE in Solr ?

I have a follow-up question - in case one were to attempt to delete the
document - let's say using the CloudSolrServer deleteById() API - would that
attempt to delete the document in both (or all) shards? How would Solr
determine which shard / shards to run the delete against?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Does-Solr-move-documents-between-shards-when-the-value-of-the-shard-key-is-updated-tp4145043p4145237.html


Re: Slow QTimes - 5 seconds for Small sized Collections

2014-07-02 Thread IJ
This issue was finally resolved. Adding an explicit Host - IP address mapping
to the /etc/hosts file seemed to do the trick. The one strange thing is -
before the hosts file entry was made - we were unable to reproduce the 5
second delay from the Linux shell by performing a simple nslookup on the host
name. In any case - the issue now stands resolved - Thanks to all.
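One possible reason nslookup showed nothing (a guess, not a confirmed diagnosis): nslookup queries the DNS servers directly, whereas application code normally resolves names through the system resolver, which also consults /etc/hosts. A small Python sketch that times a lookup through the system resolver instead; the function name and the host name in the example are hypothetical.

```python
import socket
import time


def time_lookup(host: str, resolve=socket.getaddrinfo) -> float:
    """Return the seconds taken to resolve host via the given resolver."""
    start = time.perf_counter()
    resolve(host, None)  # system resolver by default; raises if the name is unknown
    return time.perf_counter() - start


if __name__ == "__main__":
    # Replace "localhost" with the host name your Solr nodes use for each other.
    print(f"lookup took {time_lookup('localhost'):.3f}s")
```

Running this before and after adding the hosts file entry would show whether name resolution accounts for the delay.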

On the other discussion item about the QTime in the SolrQueryResponse NOT
matching the QTime in the Solr.log, here is what I found:

1. If the Query from CloudSolrServer hit the right node (i.e. contains the
shard with the desired dataset), then the QTimes match

2. If the Query from CloudSolrServer hits a node (NodeX) that does NOT
contain our data - then Solr routes the request to the right node (NodeY) to
fetch the data. In such situations - QTime is logged on both nodes that the
query passes through - albeit with different values. The QTime logged on
NodeX matches what we see on SolrQueryResponse - and this time includes the
time for inter-node communication between NodeX and NodeY.

In essence this means that the QTime in SolrQueryResponse is NOT always a
representation of the query time - but could include time spent for
inter-node communication.

P.S. All of the above statements were made in context of a sharding strategy
to co-locate a single customer's document into a single shard.

Here is a short wishlist based on the experience in debugging this issue:
1. Wish SolrQueryResponse could contain a list of node names / shard-replica
names  that a request passed through for processing the query (when debug is
turned ON)
2. Wish SolrQueryResponse could provide a breakup of QTime on each of the
individual nodes / shard-replicas - instead of returning a single value of
QTime



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Slow-QTimes-5-seconds-for-Small-sized-Collections-tp4143681p4145251.html


Re: Out of Memory when i downdload 5 Million records from sqlserver to solr

2014-07-01 Thread IJ
We faced similar problems on our side. We found it more reliable to have a
mechanism to extract all data from the Database into a flat file - and then
use a Java program to bulk index into Solr from the file via the SolrJ API.
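The batching idea is sketched below in Python for brevity (our actual program was Java with SolrJ, and this is not that code). It assumes a CSV flat file with a header row; the send callback is a hypothetical hook standing in for whatever client call posts a batch of documents to Solr.

```python
import csv
from typing import Callable, Iterable, Iterator


def batches(rows: Iterable[dict], size: int) -> Iterator[list[dict]]:
    # Group rows into lists of at most `size`, so memory use stays bounded.
    batch: list[dict] = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def bulk_index(lines: Iterable[str], send: Callable[[list[dict]], None],
               batch_size: int = 1000) -> int:
    """Stream CSV lines to `send` in batches; return the number of rows sent."""
    total = 0
    for batch in batches(csv.DictReader(lines), batch_size):
        send(batch)  # hypothetical hook: post this batch to Solr
        total += len(batch)
    return total
```

Called as bulk_index(open("export.csv", newline=""), send), this reads the file one line at a time, so the whole data set never has to fit in memory at once.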



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Out-of-Memory-when-i-downdload-5-Million-records-from-sqlserver-to-solr-tp4144949p4145041.html


Does Solr move documents between shards when the value of the shard key is updated ?

2014-07-01 Thread IJ
Let's say I create a Solr Collection with multiple shards (say 2 shards) and
set the value of router.field to a field called CompanyName. Now - we
all know that during indexing Solr would compute a hash on the value indexed
into CompanyName and route the document to an appropriate shard.

Let's say I index a document into this Collection - and Solr routes the
document into Shard 1 (based on the computed hash). Now, let's say I
re-index the same document (same unique key) - but with a different value of
CompanyName - and let's say Solr now determines that the document
should route to Shard 2. In such a situation - would Solr delete the older
version of the document from Shard 1? OR would I end up with two versions
of the same Document (same unique key), one in each shard?

My system allows updates to fields that I choose as the shard key. I
definitely want the document to be moved from Shard 1 into Shard 2 when I
perform the re-indexing. Would this work as expected? OR should I be doing
an explicit delete prior to re-indexing such documents?
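To make the scenario concrete, here is a toy Python model of it. This is NOT Solr code: the hash is a trivial stand-in for Solr's real hash function, the class and company names are invented, and the model simply assumes that indexing only touches the shard the routing value hashes to, which is the behaviour being asked about.

```python
def shard_for(route_value: str, num_shards: int) -> int:
    # Toy stand-in hash (sum of bytes), NOT the hash Solr actually uses.
    return sum(route_value.encode("utf-8")) % num_shards


class ToyCluster:
    def __init__(self, num_shards: int) -> None:
        self.shards: list[dict[str, dict]] = [{} for _ in range(num_shards)]

    def index(self, doc: dict) -> None:
        # Only the shard selected by the routing field is written to.
        shard = shard_for(doc["CompanyName"], len(self.shards))
        self.shards[shard][doc["id"]] = doc

    def delete_everywhere(self, doc_id: str) -> None:
        # Explicit delete against every shard before re-indexing.
        for shard in self.shards:
            shard.pop(doc_id, None)

    def copies(self, doc_id: str) -> int:
        return sum(doc_id in shard for shard in self.shards)


cluster = ToyCluster(2)
cluster.index({"id": "1", "CompanyName": "Acme"})
cluster.index({"id": "1", "CompanyName": "Globex"})  # routing value changed
print(cluster.copies("1"))  # stale copy remains on the old shard in this model

cluster.delete_everywhere("1")
cluster.index({"id": "1", "CompanyName": "Globex"})
print(cluster.copies("1"))
```

In this model, deleting by id on all shards before re-indexing is what keeps a single copy.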



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Does-Solr-move-documents-between-shards-when-the-value-of-the-shard-key-is-updated-tp4145043.html


Re: While creating collection in SolrCloud can we manually select machines(nodes)

2014-06-29 Thread IJ
Yes, the Solr Collections API allows you to pass in a set of explicit nodes
(subset of the complete list of nodes in your cluster) to setup your
Collection.

This is the createNodeSet input parameter in the CREATE COLLECTION API -
described as follows in the documentation:
Allows defining the nodes to spread the new collection across. If not
provided, the CREATE operation will create shard-replica spread across all
live Solr nodes. The format is a comma-separated list of node_names, such as
localhost:8983_solr,localhost:8984_solr,localhost:8985_solr.
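As an illustration, a small Python sketch that builds such a CREATE request URL. The base URL, collection name, and shard / replica counts are placeholders chosen for the example; only the parameter names come from the Collections API.

```python
from urllib.parse import urlencode


def create_collection_url(base_url: str, name: str, num_shards: int,
                          replication_factor: int, nodes: list[str]) -> str:
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
        # Comma-separated node_names, e.g. localhost:8983_solr
        "createNodeSet": ",".join(nodes),
    }
    return f"{base_url.rstrip('/')}/admin/collections?{urlencode(params)}"


# Placeholder values - adjust to your own cluster.
print(create_collection_url(
    "http://localhost:8983/solr", "mycollection", 2, 1,
    ["localhost:8983_solr", "localhost:8984_solr"],
))
```

The colons and commas in createNodeSet come out percent-encoded in the URL, which Solr decodes as usual.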



--
View this message in context: 
http://lucene.472066.n3.nabble.com/While-creating-collection-in-SolrCloud-can-we-manually-select-machines-nodes-tp4144593p4144614.html


Re: How to sort value that numeric mix alpha

2014-06-29 Thread IJ
Try indexing your data as follows:

C01,C02,C03,C04,C09,C12,C23,C50
 instead of 

C1,C2,C3,C4,C9,C12,C23,C50

and the sort order would work correctly.
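A quick Python sketch of why the padding helps: string sorting compares character by character, so C12 sorts before C2 unless the numeric part has a fixed width. The pad_code helper is made up for this example and assumes values of the form letters followed by digits.

```python
import re


def pad_code(code: str, width: int = 2) -> str:
    # Split "C9" into prefix "C" and number 9, then zero-pad the number.
    match = re.fullmatch(r"([A-Za-z]+)(\d+)", code)
    if match is None:
        raise ValueError(f"unexpected format: {code!r}")
    prefix, digits = match.groups()
    return f"{prefix}{int(digits):0{width}d}"


raw = ["C1", "C2", "C3", "C4", "C9", "C12", "C23", "C50"]
print(sorted(raw))                        # plain string sort, numerically wrong
print(sorted(pad_code(c) for c in raw))   # padded values sort as intended
```

The width has to be large enough for the biggest number you expect (for example 3 if values can reach C100).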

BTW, what you are describing as an issue is NOT unique to Solr. The same
happens on regular Databases as well. Google up how database type systems
perform alphanumeric sorts - and you'll know why.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-sort-value-that-numeric-mix-alpha-tp4144615p4144616.html


Re: Slow QTimes - 5 seconds for Small sized Collections

2014-06-27 Thread IJ
I am a colleague of the person who posted the original question. We have done
some more analysis and have more information to provide.

Here are the responses to Toke's questions:

 * Do they (slow performing queries) occur under heavy network load? 

No, they don't. This happens even when there is only a single user on the
system. It doesn't appear to be a capacity issue.

 * Do they occur after specific queries? 
No, even the simplest of queries run slow - and when things are slow - the
QTimes always hover around something greater than 5000 ms.

 * Do they occur at specific times (e.g. each whole hour)? 
They don't occur at specific times - However there is indeed a timing aspect
behind this issue - which I shall explain below.

Here is what I did - I fired a single query multiple times again and again
on all nodes in my cluster - and observed the following:

1. Slowness happens only if the Client App sends the request to a node (let's
call this NodeX) that does NOT host the shard containing the data we are
looking for (we use a document co-location strategy to index related
documents into a single shard).

2. Slowness never happens when the Client App sends the request to a node
(let's call this NodeY) that hosts the correct shard (i.e. the data we are
looking for).

3. Slowness does NOT happen if the Client App sends the request to NodeX -
and the previous query to NodeX was executed within the last 1.5 minutes.

4. Slowness happens if the Client App sends the request to NodeX - and the
previous query to NodeX was executed prior to the last 1.5 minutes.

These observations lead me to believe the following (still a theory):
1. There is something that's breaking / disrupting inter-node communication
between NodeX and NodeY
Could this be a firewall or something similar?

2. Whenever NodeX remains idle for more than 1.5 minutes - its connection to
NodeY is dropped (I can't see anything in the logs to that effect though),
and when the next request comes in - it takes 5 seconds to recreate the
connection
This 1.5 minute window and the 5 second delay are pretty consistent

I checked with my network folks - and they say that all network interfaces
are UP - and there are NO packet losses between the servers in question.

What else should I be asking my friends in the networking group to look at ?
They did ask me what protocol Solr uses for inter-node communication - and I
answered HTTP.

Thanks and appreciate your inputs.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Slow-QTimes-5-seconds-for-Small-sized-Collections-tp4143681p4144493.html