Re: How to change stateFormat to 2

2017-04-18 Thread Manohar Sripada
Thanks Erick!
state.json exists for each collection in the "tree" view of admin UI. So,
that format is set to 2. I will call the CLUSTERPROP collections API too
and set legacyCloud=false whenever I create a collection.
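
A minimal sketch of that CLUSTERPROP call from Java, assuming a node reachable
at localhost:8983 (the host/port and the plain-HTTP approach are illustrative,
not taken from this thread):

import java.io.InputStream;
import java.net.URL;

public class SetLegacyCloudFalse {
    public static void main(String[] args) throws Exception {
        // Collections API CLUSTERPROP action: sets the cluster-wide legacyCloud property.
        String url = "http://localhost:8983/solr/admin/collections"
                   + "?action=CLUSTERPROP&name=legacyCloud&val=false";
        try (InputStream in = new URL(url).openStream()) {
            while (in.read() != -1) { /* drain the response; the status comes back as XML */ }
        }
    }
}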

Thanks

On Tue, Apr 18, 2017 at 8:50 PM, Erick Erickson <erickerick...@gmail.com>
wrote:

> clusterstate.json will exist, it just should be empty if you're using
> state format 2.
>
> Note: if you have "state.json" files under each collection in ZK (see
> the "tree" view in the admin UI), then you _are_ in the format 2
> world. However, for Solr 5.x, there's an obscure property
> "legacyCloud" that, if true, will allow orphan replicas to reconstruct
> themselves in clusterstate.json even if the format is 2. The condition
> is that you have orphan replicas out there (where you've deleted the
> collection but for some reason were unable to delete the replica, say
> the Solr node hosting some replicas was down and you restarted it).
> When Solr starts up, this orphan reconstructs itself in
> clusterstate.json, where it's ignored.
>
> So you should set legacyCloud=false using the CLUSTERPROP (IIRC)
> collections API call. You can also just delete the _data_ from
> clusterstate.json. ASSUMING you're in format 2.
>
> If you're really in format 1, then see MIGRATESTATEFORMAT here:
> https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-MIGRATESTATEFORMAT:MigrateClusterState
>
> Best,
> Erick
>
> On Tue, Apr 18, 2017 at 8:03 AM, Manohar Sripada <manohar...@gmail.com>
> wrote:
> > After deleting a collection through Collection API, the data is not
> getting
> > deleted from clusterstate.json. Based on this discussion
> > <http://lucene.472066.n3.nabble.com/create-collection-gets-stuck-on-node-restart-td4311994.html>,
> > it seems clusterstate.json shouldn't be there for Solr 5.x (I am using
> > 5.2.1). It also mentions that stateFormat should be set to 2.
> >
> > How to set stateFormat to 2 while calling the Collection API? Can I
> default
> > it to 2 during the setup itself so that I don't need to set it up for each
> > and every collection creation?
> >
> > Thanks in Advance!
>


How to change stateFormat to 2

2017-04-18 Thread Manohar Sripada
After deleting a collection through Collection API, the data is not getting
deleted from clusterstate.json. Based on this discussion
<http://lucene.472066.n3.nabble.com/create-collection-gets-stuck-on-node-restart-td4311994.html>,
it seems clusterstate.json shouldn't be there for Solr 5.x (I am using
5.2.1). It also mentions that stateFormat should be set to 2.

How to set stateFormat to 2 while calling the Collection API? Can I default
it to 2 during the setup itself so that I don't need to set it up for each
and every collection creation?

Thanks in Advance!
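
For a collection that really is still in state format 1, a hedged sketch of the
MIGRATESTATEFORMAT call Erick points to above, again as a plain HTTP request
from Java (host, port, and collection name are placeholders):

import java.io.InputStream;
import java.net.URL;

public class MigrateStateFormat {
    public static void main(String[] args) throws Exception {
        // Collections API MIGRATESTATEFORMAT action: moves one collection from the
        // shared clusterstate.json (format 1) to its own state.json (format 2).
        String url = "http://localhost:8983/solr/admin/collections"
                   + "?action=MIGRATESTATEFORMAT&collection=my_collection";
        try (InputStream in = new URL(url).openStream()) {
            while (in.read() != -1) { /* drain */ }
        }
    }
}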


Re: Solr node not found in ZK live_nodes

2016-12-06 Thread Manohar Sripada
Thanks Erick! Should I create a JIRA issue for the same?

Regarding the logs, I have changed the log level to WARN. That may be the
reason I couldn't get anything from them.

Thanks,
Manohar

On Tue, Dec 6, 2016 at 9:58 PM, Erick Erickson <erickerick...@gmail.com>
wrote:

> Most likely reason is that the Solr node in question
> was not reachable, thus it was removed from
> live_nodes. Perhaps due to a temporary network
> glitch, long GC pause or the like. If you're rolling
> your logs over, it's quite possible that any illuminating
> messages were lost. The default 4M size for each
> log is quite low at INFO level...
>
> It does seem possible for a Solr node to periodically
> check its status and re-insert itself into live_nodes,
> go through recovery and all that. So far most of that
> registration logic is baked into startup code. What
> do others think? Worth a JIRA?
>
> Erick
>
> On Tue, Dec 6, 2016 at 3:53 AM, Manohar Sripada <manohar...@gmail.com>
> wrote:
> > We have a 16 node cluster of Solr (5.2.1) and 5 node Zookeeper (3.4.6).
> >
> > All the Solr nodes were registered to Zookeeper (ls /live_nodes) when
> setup
> > was done 3 months back. Suddenly, a few days back, our search started
> > failing because one of the Solr nodes (consider s16) was not seen in
> > Zookeeper, i.e., when we checked *"ls /live_nodes"*, the *s16* node was
> > not found.
> > However, the corresponding Solr process was up and running.
> >
> > To my surprise, I couldn't find any errors or warnings in the Solr or
> > Zookeeper logs related to this. I have a few questions -
> >
> > 1. Is there any reason why this registration to ZK was lost? I know the logs
> > should provide some information, but they didn't. Did anyone encounter a
> > similar issue? If so, what could be the root cause?
> > 2. Shouldn't Solr be clever enough to detect that the registration to ZK
> > was lost (for some reason) and try to re-register?
> >
> > PS: The issue is resolved by restarting the Solr node. However, I am
> > curious to know why it happened in the first place.
> >
> > Thanks
>


Solr node not found in ZK live_nodes

2016-12-06 Thread Manohar Sripada
We have a 16 node cluster of Solr (5.2.1) and 5 node Zookeeper (3.4.6).

All the Solr nodes were registered to Zookeeper (ls /live_nodes) when setup
was done 3 months back. Suddenly, a few days back, our search started failing
because one of the Solr nodes (consider s16) was not seen in Zookeeper, i.e.,
when we checked *"ls /live_nodes"*, the *s16* node was not found.
However, the corresponding Solr process was up and running.

To my surprise, I couldn't find any errors or warnings in the Solr or Zookeeper
logs related to this. I have a few questions -

1. Is there any reason why this registration to ZK was lost? I know the logs
should provide some information, but they didn't. Did anyone encounter a
similar issue? If so, what could be the root cause?
2. Shouldn't Solr be clever enough to detect that the registration to ZK
was lost (for some reason) and try to re-register?

PS: The issue is resolved by restarting the Solr node. However, I am
curious to know why it happened in the first place.

Thanks


Re: Solr Suggester (AnalyzingInfix n BlendedInfix)

2016-09-28 Thread Manohar Sripada
Sure Erick! I will try applying the patch.

Thanks

On Wednesday, September 28, 2016, Erick Erickson <erickerick...@gmail.com>
wrote:

> AnalyzingInfixSuggester is a mini Solr index, it's working
> as designed by returning the choices you see. I don't think
> you can persuade it to do what you want OOB.
>
> I took a quick look at SOLR-7865 and it's a very simple fix, just
> 3 lines of code change and the rest is test code. Could you
> consider applying that patch to the 5.2.1 code base and using that
> rather than fully upgrading?
>
> Best,
> Erick
>
> On Wed, Sep 28, 2016 at 4:21 AM, Manohar Sripada <manohar...@gmail.com
> <javascript:;>> wrote:
> > I am implementing auto suggestion on Business Name. I
> > used BlendedInfixLookupFactory, which worked in all my use cases until I
> > ran into this bug (https://issues.apache.org/jira/browse/SOLR-7865),
> > where suggest.count doesn't work in Solr 5.2.1. But, I can't upgrade
> > anytime soon. :(
> >
> > I tried using AnalyzingInfixSuggester, but I encountered a couple of
> > issues with this. Can someone help me with these?
> >
> >1. This lookupImpl is returning duplicate business names (
> >https://issues.apache.org/jira/browse/LUCENE-6336) in results (the
> data
> >has duplicate business names) which isn't happening
> >with BlendedInfixLookupFactory. I don't want duplicate values.
> >2. Second one is, AnalyzingInfixSuggester is searching on all input
> >keywords - For example, if I am looking for "Apple Corporation", it is
> >returning, "Apple Inc", "Apple Corporation", "Oracle Corporation",
> >"Microsoft Corporation". I need the data with only "Apple
> Corporation".
> >Again, this is working fine in BlendedInfixLookupFactory.
> >
> > I don't want fuzzy searches, so, I am not using it.
> >
> > Below are the respective configurations. The field type uses the Standard
> > Tokenizer.
> >
> >   
> > businessName_BIF
> > BlendedInfixLookupFactory
> > DocumentDictionaryFactory
> > business_name
> > true
> > true
> > true
> > 
> > text_standard
> > true
> > true
> > true
> > suggest_test_business_name_bif
> > 0
> > false
> > linear
> > 10
> > 
> > revenues
> > id
> > 
> >
> >   
> > businessName_AIF
> > AnalyzingInfixLookupFactory
> > DocumentDictionaryFactory
> > business_name
> > true
> > true
> > true
> > 
> > text_standard
> > true
> > true
> > true
> > suggest_test_business_name_aif
> > 0
> > true
> > false
> > 
> > revenues
> > id
> > 
>


Solr Suggester (AnalyzingInfix n BlendedInfix)

2016-09-28 Thread Manohar Sripada
I am implementing auto suggestion on Business Name. I
used BlendedInfixLookupFactory, which worked in all my use cases until I
ran into this bug (https://issues.apache.org/jira/browse/SOLR-7865),
where suggest.count doesn't work in Solr 5.2.1. But, I can't upgrade
anytime soon. :(

I tried using AnalyzingInfixSuggester, but I encountered a couple of issues
with this. Can someone help me with these?

   1. This lookupImpl is returning duplicate business names (
   https://issues.apache.org/jira/browse/LUCENE-6336) in results (the data
   has duplicate business names) which isn't happening
   with BlendedInfixLookupFactory. I don't want duplicate values.
   2. Second one is, AnalyzingInfixSuggester is searching on all input
   keywords - For example, if I am looking for "Apple Corporation", it is
   returning, "Apple Inc", "Apple Corporation", "Oracle Corporation",
   "Microsoft Corporation". I need the data with only "Apple Corporation".
   Again, this is working fine in BlendedInfixLookupFactory.

I don't want fuzzy searches, so, I am not using it.

Below are the respective configurations. The field type uses the Standard
Tokenizer.

  
businessName_BIF
BlendedInfixLookupFactory
DocumentDictionaryFactory
business_name
true
true
true

text_standard
true
true
true
suggest_test_business_name_bif
0
false
linear
10

revenues
id


  
businessName_AIF
AnalyzingInfixLookupFactory
DocumentDictionaryFactory
business_name
true
true
true

text_standard
true
true
true
suggest_test_business_name_aif
0
true
false

revenues
id
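
A hedged SolrJ sketch of how a suggest request against the businessName_BIF
suggester above might be issued; the /suggest handler path, ZK quorum, and
collection name are assumptions, and suggest.count is the parameter broken by
SOLR-7865 on 5.2.1:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class SuggestExample {
    public static void main(String[] args) throws Exception {
        CloudSolrServer server = new CloudSolrServer("zk1:2181,zk2:2181"); // placeholder ZK quorum
        server.setDefaultCollection("businesses");                         // placeholder collection
        SolrQuery q = new SolrQuery();
        q.setRequestHandler("/suggest");                   // assumed request handler path
        q.set("suggest", true);
        q.set("suggest.dictionary", "businessName_BIF");   // suggester name from the config above
        q.set("suggest.q", "Apple Corporation");
        q.set("suggest.count", 10);
        QueryResponse rsp = server.query(q);
        System.out.println(rsp.getResponse().get("suggest")); // raw suggester section of the response
        server.shutdown();
    }
}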



Zookeeper overseer queue clogging

2016-07-12 Thread Manohar Sripada
There are 16 Solr Nodes (Solr 5.2.1) & 5 Zookeeper Nodes (Zookeeper 3.4.6)
in our production cluster. We had to restart Solr nodes for some reason and
we are doing it after 3 months. To our surprise, none of the solr nodes
came up. We can see the Solr process running on the machine, but the Solr
Admin console is not reachable. We even tried restarting Zookeeper cluster
and Solr node cluster. Still, the issue remained.

On debugging I have found out -
1. Below exception in solr.log :


>
>
> *ERROR - 2016-07-12 07:43:48.988;
> org.apache.solr.servlet.SolrDispatchFilter; Could not start Solr. Check
> solr/home property and the logs
> ERROR - 2016-07-12 07:43:49.012;
> org.apache.solr.common.SolrException;
> null:org.apache.solr.common.SolrException: Could not find collection :
> cont_coll_2_frat
> org.apache.solr.common.cloud.ClusterState.getCollection(ClusterState.java:164)*


2. Connected to the zookeeper quorum using Zookeeper's zkCli.sh and found out
that a few collections (which were deleted using the Solr Collections
Delete API) still exist in zookeeper (ls /collections). The same
collections don't exist on the Solr nodes' disks.

3. There are entries related to these deleted collections in Zookeeper's
clusterstate.json file as well.

4. There are many entries in overseer queue (/overseer/queue) & queue-work
(/overseer/queue-work).

I have tried the below things based on some existing suggestions on the net -
1. Stopped all the Solr nodes and removed the unwanted collections (which were
deleted using the Solr Collections Delete API) using the *rmr* command from
Zookeeper (/collections).

2. Removed all the entries from overseer queue (/overseer/queue) &
queue-work (/overseer/queue-work) as well.

3. Restarted Zookeeper and then Solr.

Even after doing this, the issue still remains. Can someone help me on how
to resolve this?

- Thanks


Re: Multi-selected Faceting

2016-05-16 Thread Manohar Sripada
Thanks Erik! That worked.

On Mon, May 16, 2016 at 3:11 PM, Erik Hatcher <erik.hatc...@gmail.com>
wrote:

> Quick reply: Use a different tag/ex value for each field.
>
> > On May 16, 2016, at 04:42, Manohar Sripada <manohar...@gmail.com> wrote:
> >
> > We have a similar requirement to the one mentioned in the Solr wiki -
> >
> http://wiki.apache.org/solr/SimpleFacetParameters#Multi-Select_Faceting_and_LocalParams
> >
> > Here, the example given works when a single facet is selected, where I can
> > get the counts of the other facet's values.
> >
> >
> > *q=*:*&start=0&rows=10&facet=true&facet.limit=5&facet.mincount=1&*
> >> *facet.field=country&*
> >> *facet.field={!ex=dt}industries&fq={!tag=dt}industries:("services")*
> >
> >
> >> * Facet Response: *
> >
> > =*industries*=
> >
> > [x] Services (1010)
> >
> > [  ] Manufacturing (1002)
> >
> > [  ] Finance (221)
> >
> > [  ] Wholesale Trade (101)
> >
> > [  ] Transportation (50)
> >
> >
> >
> > =*country*=
> >
> > [  ] USA (450)
> >
> > [  ] Mexico (135)
> >
> > [  ] Germany (122)
> >
> > [  ] India (101)
> >
> > [  ] Australia (54)
> >
> >
> >
> > In the above query, I was trying to retrieve the facet counts for
> > industries facet & country facet, but with "services" selected. This
> > scenario works well. Now, when I select "USA", the results are narrowed
> > down to 450. However, the country facet counts aren't considering the
> > industries filter selected & industries facet count isn't considering
> > country filter selected. Below is the solr query and corresponding facet
> > counts.
> >
> >
> >
> *q=*:*&start=0&rows=10&facet=true&facet.limit=5&facet.mincount=1&facet.field={!ex=dt}country&facet.field={!ex=dt}industries&fq={!tag=dt}industries:("services")&fq={tag=dt}country:("USA")*
> >
> >
> >> * Facet Response: *
> >
> > =*industries*=
> >
> > [x] Services (1010)
> >
> > [  ] Manufacturing (1002)
> >
> > [  ] Finance (221)
> >
> > [  ] Wholesale Trade (101)
> >
> > [  ] Transportation (50)
> >
> >
> >
> > =*country*=
> >
> > [x] USA (1312)
> >
> > [  ] Mexico (1290)
> >
> > [  ] Canada (1192)
> >
> > [  ] China (900)
> >
> > [  ] Japan (450)
> >
> >
> >
> > I was expecting *"country" *facet count to be as it was before (i.e. to
> > consider industries filter) and *"industries" *to consider the country
> > filter while calculating facet counts. Can someone help me here?
>


Multi-selected Faceting

2016-05-16 Thread Manohar Sripada
We have a similar requirement to the one mentioned in the Solr wiki -
http://wiki.apache.org/solr/SimpleFacetParameters#Multi-Select_Faceting_and_LocalParams

Here, the example given works when a single facet is selected, where I can
get the counts of the other facet's values.


*q=*:*&start=0&rows=10&facet=true&facet.limit=5&facet.mincount=1&*
> *facet.field=country&*
> *facet.field={!ex=dt}industries&fq={!tag=dt}industries:("services")*
>


> * Facet Response: *

 =*industries*=

 [x] Services (1010)

 [  ] Manufacturing (1002)

 [  ] Finance (221)

 [  ] Wholesale Trade (101)

 [  ] Transportation (50)



=*country*=

 [  ] USA (450)

 [  ] Mexico (135)

 [  ] Germany (122)

 [  ] India (101)

 [  ] Australia (54)



In the above query, I was trying to retrieve the facet counts for
industries facet & country facet, but with "services" selected. This
scenario works well. Now, when I select "USA", the results are narrowed
down to 450. However, the country facet counts aren't considering the
industries filter selected & industries facet count isn't considering
country filter selected. Below is the solr query and corresponding facet
counts.


*q=*:*&start=0&rows=10&facet=true&facet.limit=5&facet.mincount=1&facet.field={!ex=dt}country&facet.field={!ex=dt}industries&fq={!tag=dt}industries:("services")&fq={tag=dt}country:("USA")*
>


> * Facet Response: *

 =*industries*=

 [x] Services (1010)

 [  ] Manufacturing (1002)

 [  ] Finance (221)

 [  ] Wholesale Trade (101)

 [  ] Transportation (50)



=*country*=

 [x] USA (1312)

 [  ] Mexico (1290)

 [  ] Canada (1192)

 [  ] China (900)

 [  ] Japan (450)



I was expecting *"country" *facet count to be as it was before (i.e. to
consider industries filter) and *"industries" *to consider the country
filter while calculating facet counts. Can someone help me here?
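
Following Erik's suggestion in the reply above (a different tag/ex value per
field), a hedged SolrJ sketch of what the corrected request might look like;
the field names come from this thread, the tag names ind/ctry are arbitrary:

import org.apache.solr.client.solrj.SolrQuery;

public class MultiSelectFacets {
    public static SolrQuery build() {
        SolrQuery q = new SolrQuery("*:*");
        q.setRows(10);
        q.setFacet(true);
        q.setFacetLimit(5);
        q.setFacetMinCount(1);
        // Each facet field excludes only the filter tagged for its own field, so the
        // country counts still honor the industries filter and vice versa.
        q.addFacetField("{!ex=ind}industries", "{!ex=ctry}country");
        q.addFilterQuery("{!tag=ind}industries:(\"services\")");
        q.addFilterQuery("{!tag=ctry}country:(\"USA\")");
        return q;
    }
}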


Re: Index BackUp using JDK 8 & Restore using JDK 7. Does this work?

2016-04-18 Thread Manohar Sripada
Thanks Shawn! :-)

On Mon, Apr 18, 2016 at 6:42 PM, Shawn Heisey <apa...@elyograg.org> wrote:

> On 4/18/2016 12:49 AM, Manohar Sripada wrote:
> > We are using Solr 5.2.1 and JDK 7. We do create a static index in one
> > cluster (solr cluster 1) and ship that index to another cluster (Solr
> > cluster 2).  Solr Cluster 2 is the one where queries will be fired.
> >
> > Due to some unavoidable reasons, we want to upgrade Solr Cluster 1 to JDK
> > 8. But, we can't upgrade Solr cluster 2 to JDK 8 in the near future. Does the
> > backed-up index from Solr cluster 1, which uses JDK 8, work when restored
> > in Solr cluster 2, which uses JDK 7?
>
> The Lucene index format (Solr is a Lucene application) is the same
> regardless of Java version or hardware platform.  Some programs (rrdtool
> being the prominent example) have a different file format on 32-bit CPUs
> compared to 64-bit CPUs ... but Lucene/Solr is not one of those programs.
>
> Some info you might already know: Solr 4.8.x through 5.5.x require Java
> 7.  Solr 6.0.0 requires Java 8.
>
> Thanks,
> Shawn
>
>


Index BackUp using JDK 8 & Restore using JDK 7. Does this work?

2016-04-18 Thread Manohar Sripada
We are using Solr 5.2.1 and JDK 7. We do create a static index in one
cluster (solr cluster 1) and ship that index to another cluster (Solr
cluster 2).  Solr Cluster 2 is the one where queries will be fired.

Due to some unavoidable reasons, we want to upgrade Solr Cluster 1 to JDK
8. But, we can't upgrade Solr cluster 2 to JDK 8 in the near future. Does the
backed-up index from Solr cluster 1, which uses JDK 8, work when restored in
Solr cluster 2, which uses JDK 7?


Re: Issue with Auto Suggester Component

2016-03-16 Thread Manohar Sripada
Thanks for the response.
If you see the first 5 results - "*ABC* Corporation", "*ABC*D Corporation",
"*Abc* Tech", "*AbC*orporation", "*ABC*D company" - the keyword "*abc*" that I
am trying to search is part of the prefix of all these strings. Sorry, it's
not the entire keyword that needs to be of higher importance, as in #1, #3 and #6.
In the 2nd set of results, "The *ABC* Company", "The *ABC*DEF", the keyword
"*abc*" is not part of the prefix of the 1st string, but it is part of some
other string in each result.

Thanks,
Manohar

On Tue, Mar 15, 2016 at 3:03 PM, Alessandro Benedetti <abenede...@apache.org
> wrote:

> Hi Manohar,
> I have not clear what should be your ideal ranking of suggestions.
>
> "I want prefix search of
> entire keyword to be of high preference (#1 to #5 in the below example)
> followed by prefix part of any other string (the last 2 in the below
> example). I am not bothered about ordering within 1st and 2nd set.
>
> ABC Corporation
> ABCD Corporation
> Abc Tech
> AbCorporation
> ABCD company
> The ABC Company
> The ABCDEF"
>
> Could you take the example you posted, show an example of query and the
> expected sort order ?
> According to your description of the problem
> Query : abc
> 1 Criteria : entire keyword to be of high preference
> I can't understand why you didn't count #3, #6 but you did #5 .
>
> 2 Criteria : followed by prefix part of any other string
> It is not that clear, probably you mean all the rest.
> Anyway an infix lookup algorithm with a boost for exact search should do
> the trick.
>
> Please give us some more details !
>
> Cheers
>
> On Tue, Mar 15, 2016 at 8:19 AM, Manohar Sripada <manohar...@gmail.com>
> wrote:
>
> > Consider the below company names indexed. I want the below auto
> suggestions
> > to be listed when searched for "abc". Basically, I want prefix search of
> > entire keyword to be of high preference (#1 to #5 in the below example)
> > followed by prefix part of any other string (the last 2 in the below
> > example). I am not bothered about ordering within 1st and 2nd set.
> >
> > ABC Corporation
> > ABCD Corporation
> > Abc Tech
> > AbCorporation
> > ABCD company
> > The ABC Company
> > The ABCDEF
> >
> > I am using Suggest feature of solr as mentioned in the wiki
> > <https://cwiki.apache.org/confluence/display/solr/Suggester>. I used
> > different Lookup implementations available, but, I couldn't get the
> result
> > as above. Here's one sample config I used with
> BlendedInfixLookupFactory
> >
> >
> >  **
> > * businessNameBlendedInfixSuggester1*
> > * BlendedInfixLookupFactory*
> > * DocumentDictionaryFactory*
> > * business_name_suggest*
> > * id*
> > *text_suggest*
> > * business_name*
> > * linear*
> > * true*
> > * /app/solrnode/suggest_test_1_blendedinfix1*
> > * 0*
> > * true*
> > * true*
> > * false*
> > * *
> >
> > Can someone please suggest how I can achieve this?
> >
> > Thanks,
> > Manohar
> >
>
>
>
> --
> --
>
> Benedetti Alessandro
> Visiting card : http://about.me/alessandro_benedetti
>
> "Tyger, tyger burning bright
> In the forests of the night,
> What immortal hand or eye
> Could frame thy fearful symmetry?"
>
> William Blake - Songs of Experience -1794 England
>


Issue with Auto Suggester Component

2016-03-15 Thread Manohar Sripada
Consider the below company names indexed. I want the below auto suggestions
to be listed when searched for "abc". Basically, I want a prefix match at the
start of the name to have high preference (#1 to #5 in the below example),
followed by a prefix match on any other word in the name (the last 2 in the below
example). I am not bothered about the ordering within the 1st and 2nd sets.

ABC Corporation
ABCD Corporation
Abc Tech
AbCorporation
ABCD company
The ABC Company
The ABCDEF

I am using the Suggest feature of Solr as mentioned in the wiki
<https://cwiki.apache.org/confluence/display/solr/Suggester>. I used
different Lookup implementations available, but I couldn't get the result
as above. Here's one sample config I used with BlendedInfixLookupFactory


 **
* businessNameBlendedInfixSuggester1*
* BlendedInfixLookupFactory*
* DocumentDictionaryFactory*
* business_name_suggest*
* id*
*text_suggest*
* business_name*
* linear*
* true*
* /app/solrnode/suggest_test_1_blendedinfix1*
* 0*
* true*
* true*
* false*
* *

Can someone please suggest how I can achieve this?

Thanks,
Manohar


Re: Spatial Search on Postal Code

2016-03-07 Thread Manohar Sripada
Thanks Again Emir! I will try this way.

Thanks David! It looks like building the polygons at index time is a better
option than at query time.

Thanks,
Manohar

On Sat, Mar 5, 2016 at 7:54 PM, david.w.smi...@gmail.com <
david.w.smi...@gmail.com> wrote:

> Another path to consider is doing this point-in-zipcode-poly lookup at
> index time and enriching the document with a zipcode field (possibly
> multi-valued if there is doubt).
>
> On Sat, Mar 5, 2016 at 4:05 AM steve shepard <sc_shep...@hotmail.com>
> wrote:
>
> > re: Postal Codes and polygons. I've heard of basic techniques that use
> > Commerce Department (or was it Census within Commerce??) that give the
> > basic points, but the real run is deciding what the "center" of that
> > polygon is. There is likely a commercial solution available, and
> certainly
> > you can buy a spreadsheet with the zipcodes and their guestimated center.
> > Fun project!
> >
> > > Subject: Re: Spatial Search on Postal Code
> > > To: solr-user@lucene.apache.org
> > > From: emir.arnauto...@sematext.com
> > > Date: Fri, 4 Mar 2016 21:18:10 +0100
> > >
> > > Hi Manohar,
> > > I don't think there is such functionality in Solr - you need to do it
> on
> > > client side:
> > > 1. find some postal code polygons (you can use open street map -
> > > http://wiki.openstreetmap.org/wiki/Key:postal_code)
> > > 2. create zip to polygon lookup
> > > 3. create code that will expand zip code polygon by some distance (you
> > > can use JTS buffer api)
> > >
> > > On query time you get zip code and distance:
> > > 1. find polygon for zip
> > > 2. expand polygon
> > > 3. send resulting polygon to Solr and use Intersects function to filter
> > > results
> > >
> > > Regards,
> > > Emir
> > >
> > > On 04.03.2016 19:49, Manohar Sripada wrote:
> > > > Thanks Emir,
> > > >
> > > > Obviously #2 approach is much better. I know its not straight
> forward.
> > But,
> > > > is it really acheivable in Solr? Like building a polygon for a postal
> > code.
> > > > If so, can you throw some light how to do?
> > > >
> > > > Thanks,
> > > > Manohar
> > > >
> > > > On Friday, March 4, 2016, Emir Arnautovic <
> > emir.arnauto...@sematext.com>
> > > > wrote:
> > > >
> > > >> Hi Manohar,
> > > >> This depends on your requirements/usecase. If postal code is
> > interpreted
> > > >> as point than it is expected to have radius that is significantly
> > larger
> > > >> than postal code diameter. In such case you can go with first
> > approach. In
> > > >> order to avoid missing results from postal code in case of small
> > search
> > > >> radius and large postal code, you can reverse geocode records and
> > store
> > > >> postal code with each document.
> > > >> If you need to handle distance from postal code precisely - distance
> > from
> > > >> its border, you have to get postal code polygon, expand it by search
> > > >> distance and use resulting polygon to find matches.
> > > >>
> > > >> HTH,
> > > >> Emir
> > > >>
> > > >> On 04.03.2016 13:09, Manohar Sripada wrote:
> > > >>
> > > >>> Here's my requirement -  User enters postal code and provides the
> > radius.
> > > >>> I
> > > >>> need to find the records with in the radius from the provided
> postal
> > code.
> > > >>>
> > > >>> There are few ways I thought through after going through the
> "Spatial
> > > >>> Search" Solr wiki
> > > >>>
> > > >>> 1. As Latitude and Longitude positions are required for spatial
> > search.
> > > >>> Get
> > > >>> Latitude Longitude position (may be using GeoCoding API) of a
> postal
> > code
> > > >>> and use "LatLonType" field type and query accordingly. As the
> > GeoCoding
> > > >>> API
> > > >>> returns one point and if the postal code area is too big, then I
> may
> > end
> > > >>> up
> > > >>> not getting any results (apart from the records from the same
> postal
> > code)
> > > >>> if the radius provided is small.
> > > >>>
> > > >>> 2. Get the latitude longitude points of the postal code which
> forms a
> > > >>> border (not sure yet on how to get) and build a polygon (using
> RPT).
> > While
> > > >>> querying use this polygon and provide the distance. Can this be
> > achieved?
> > > >>> Or Am I ruminating too much? :(
> > > >>>
> > > >>> Appreciate any help on this.
> > > >>>
> > > >>> Thanks
> > > >>>
> > > >>>
> > > >> --
> > > >> Monitoring * Alerting * Anomaly Detection * Centralized Log
> Management
> > > >> Solr & Elasticsearch Support * http://sematext.com/
> > > >>
> > > >>
> > >
> > > --
> > > Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> > > Solr & Elasticsearch Support * http://sematext.com/
> > >
> >
>
> --
> Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
> LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
> http://www.solrenterprisesearchserver.com
>


Re: Spatial Search on Postal Code

2016-03-04 Thread Manohar Sripada
Thanks Erik,
Is this zip codes index that you created a one-to-many
mapping from zip code to lat/lon points? If so, where did you get this
mapping CSV file?

Thanks,
Manohar

On Friday, March 4, 2016, Erik Hatcher <erik.hatc...@gmail.com> wrote:

> This is just like an implementation I recently worked on with a customer.
> It’s very much like this sort of thing ;) -
>
> <
> http://nyulangone.org/doctors?query===10019==distance=
> <
> http://nyulangone.org/doctors?query===10019==distance=
> >>
>
> It’s implemented with Solr, leveraging the Lucidworks Fusion query
> pipelines to do these steps:
>
>- =N comes in to the query pipeline
>- a sub-query is made to a separate zipcodes index (built from a CSV
> file of zipcodes and their corresponding representative lat/lon)
>- the matching lat/lon is used to build the appropriate geo-filtering
> and sorting parameters to pass on to the main collection
>
> Straightforward, clean, and effective for the needs.
>
> —
> Erik Hatcher, Senior Solutions Architect
> http://www.lucidworks.com <http://www.lucidworks.com/>
>
>
>
> > On Mar 4, 2016, at 7:09 AM, Manohar Sripada <manohar...@gmail.com
> <javascript:;>> wrote:
> >
> > Here's my requirement -  User enters postal code and provides the
> radius. I
> > need to find the records with in the radius from the provided postal
> code.
> >
> > There are few ways I thought through after going through the "Spatial
> > Search" Solr wiki
> >
> > 1. As Latitude and Longitude positions are required for spatial search.
> Get
> > Latitude Longitude position (may be using GeoCoding API) of a postal code
> > and use "LatLonType" field type and query accordingly. As the GeoCoding
> API
> > returns one point and if the postal code area is too big, then I may end
> up
> > not getting any results (apart from the records from the same postal
> code)
> > if the radius provided is small.
> >
> > 2. Get the latitude longitude points of the postal code which forms a
> > border (not sure yet on how to get) and build a polygon (using RPT).
> While
> > querying use this polygon and provide the distance. Can this be achieved?
> > Or Am I ruminating too much? :(
> >
> > Appreciate any help on this.
> >
> > Thanks
>
>


Re: Spatial Search on Postal Code

2016-03-04 Thread Manohar Sripada
Thanks Emir,

Obviously the #2 approach is much better. I know it's not straightforward. But,
is it really achievable in Solr? Like building a polygon for a postal code.
If so, can you throw some light on how to do it?

Thanks,
Manohar

On Friday, March 4, 2016, Emir Arnautovic <emir.arnauto...@sematext.com>
wrote:

> Hi Manohar,
> This depends on your requirements/usecase. If the postal code is interpreted
> as a point, then it is expected to have a radius that is significantly larger
> than the postal code diameter. In such a case you can go with the first approach. In
> order to avoid missing results from postal code in case of small search
> radius and large postal code, you can reverse geocode records and store
> postal code with each document.
> If you need to handle distance from postal code precisely - distance from
> its border, you have to get postal code polygon, expand it by search
> distance and use resulting polygon to find matches.
>
> HTH,
> Emir
>
> On 04.03.2016 13:09, Manohar Sripada wrote:
>
>> Here's my requirement -  User enters postal code and provides the radius.
>> I
>> need to find the records with in the radius from the provided postal code.
>>
>> There are few ways I thought through after going through the "Spatial
>> Search" Solr wiki
>>
>> 1. As Latitude and Longitude positions are required for spatial search.
>> Get
>> Latitude Longitude position (may be using GeoCoding API) of a postal code
>> and use "LatLonType" field type and query accordingly. As the GeoCoding
>> API
>> returns one point and if the postal code area is too big, then I may end
>> up
>> not getting any results (apart from the records from the same postal code)
>> if the radius provided is small.
>>
>> 2. Get the latitude longitude points of the postal code which forms a
>> border (not sure yet on how to get) and build a polygon (using RPT). While
>> querying use this polygon and provide the distance. Can this be achieved?
>> Or Am I ruminating too much? :(
>>
>> Appreciate any help on this.
>>
>> Thanks
>>
>>
> --
> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> Solr & Elasticsearch Support * http://sematext.com/
>
>
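
Emir's query-time approach (look up the postal-code polygon, expand it with a
JTS buffer, filter with Intersects) might look roughly like the sketch below,
assuming the polygon is available as WKT, an RPT spatial field named "geo", and
JTS on the classpath; note that buffer() works in the units of the coordinates,
so the search distance must first be converted to degrees:

import com.vividsolutions.jts.geom.Geometry;
import com.vividsolutions.jts.io.WKTReader;
import org.apache.solr.client.solrj.SolrQuery;

public class PostalCodePolygonFilter {
    public static SolrQuery build(String zipPolygonWkt, double radiusInDegrees) throws Exception {
        Geometry zipPolygon = new WKTReader().read(zipPolygonWkt); // polygon for the postal code
        Geometry expanded = zipPolygon.buffer(radiusInDegrees);    // grow it by the search distance
        SolrQuery q = new SolrQuery("*:*");
        // "geo" is an assumed RPT (recursive prefix tree) field holding each record's point.
        q.addFilterQuery("{!field f=geo}Intersects(" + expanded.toText() + ")");
        return q;
    }
}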


Spatial Search on Postal Code

2016-03-04 Thread Manohar Sripada
Here's my requirement - the user enters a postal code and provides the radius. I
need to find the records within the radius from the provided postal code.

There are a few ways I thought through after going through the "Spatial
Search" Solr wiki

1. As latitude and longitude positions are required for spatial search, get
the latitude/longitude position (maybe using a GeoCoding API) of the postal code
and use the "LatLonType" field type and query accordingly. As the GeoCoding API
returns one point, if the postal code area is too big then I may end up
not getting any results (apart from the records from the same postal code)
if the radius provided is small.

2. Get the latitude/longitude points of the postal code which form its
border (not sure yet how to get them) and build a polygon (using RPT). While
querying, use this polygon and provide the distance. Can this be achieved?
Or am I ruminating too much? :(

Appreciate any help on this.

Thanks
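
Erik Hatcher's approach in the replies above (resolve the zip code to a
representative lat/lon first, then geo-filter and sort) is the simpler option;
a hedged SolrJ sketch, assuming a LatLonType field named "latlon" and a point
already looked up for the zip code (field name and units are placeholders):

import org.apache.solr.client.solrj.SolrQuery;

public class ZipRadiusQuery {
    public static SolrQuery build(double lat, double lon, double radiusKm) {
        SolrQuery q = new SolrQuery("*:*");
        q.set("sfield", "latlon");             // assumed LatLonType field on each record
        q.set("pt", lat + "," + lon);          // center point resolved from the zip code
        q.set("d", String.valueOf(radiusKm));  // radius in kilometers
        q.addFilterQuery("{!geofilt}");        // keep only records inside the radius
        q.addSort("geodist()", SolrQuery.ORDER.asc); // nearest records first
        return q;
    }
}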


Range Query on a language specific field

2015-11-24 Thread Manohar Sripada
I have a requirement where I need to be able to query on a field (say
"salary"). This field contains data in Chinese.

Is it possible in Solr to do a range query on this field? Is there any
language specific Analyzer that I could use on this field to achieve range
search?


Solr Query taking 50 sec

2015-07-30 Thread Manohar Sripada
Hi,

We have Solr Cloud (version 4.7.2) setup on 64 shards spread across VMs. I
see my queries to Solr taking exactly 50 sec intermittently (as someone
said so :P). This happens once in 10 queries.
I have set the log level to TRACE on all the Solr nodes. I didn't find any
issue with the query time on any given shard (max QTime observed on a shard
is 10 ms).  We ran all the tests related to network and everything looks
fine there.

Whenever the query took 50 sec, I am seeing the below log statements
for the org.eclipse.jetty component. Is this some issue with Jetty? I could
see these logs being printed every 11 seconds (*2015-07-24 07:06:00*, *2015-07-24
07:06:11*, ...) 4 times. I have attached the complete logs for that
duration. Can someone please help me here?


*DEBUG - 2015-07-24 07:06:00.126; org.eclipse.jetty.http.HttpParser; filled
707/707*
*DEBUG - 2015-07-24 07:06:00.127; org.eclipse.jetty.server.Server; REQUEST
/solr/admin/info/logging on
BlockingHttpConnection@7a5f39b0,g=HttpGenerator{s=0,h=-1,b=-1,c=-1},p=HttpParser{s=-5,l=209,c=0},r=43*
*DEBUG - 2015-07-24 07:06:00.127;
org.eclipse.jetty.server.handler.ContextHandler; scope
null||/solr/admin/info/logging @
o.e.j.w.WebAppContext{/solr,file:/u01/work/app/install_solr/daas_node/solr-webapp/webapp/},/u01/work/app/install_solr/daas_node/webapps/solr.war*
*DEBUG - 2015-07-24 07:06:00.127;
org.eclipse.jetty.server.handler.ContextHandler;
context=/solr||/admin/info/logging @
o.e.j.w.WebAppContext{/solr,file:/u01/work/app/install_solr/daas_node/solr-webapp/webapp/},/u01/work/app/install_solr/daas_node/webapps/solr.war*
*DEBUG - 2015-07-24 07:06:00.127;
org.eclipse.jetty.server.session.SessionHandler; Got Session ID
vZScVxfQ528bXYGHJw16N3vTLJ4t3L41bSkHNmyTywQKGGzZFC8p!-348395136!NONE from
cookie*
*DEBUG - 2015-07-24 07:06:00.127;
org.eclipse.jetty.server.session.SessionHandler;
sessionManager=org.eclipse.jetty.server.session.HashSessionManager@1c49094*
*DEBUG - 2015-07-24 07:06:00.127;
org.eclipse.jetty.server.session.SessionHandler; session=null*
*DEBUG - 2015-07-24 07:06:00.128; org.eclipse.jetty.servlet.ServletHandler;
servlet /solr|/admin/info/logging|null - default*
*DEBUG - 2015-07-24 07:06:00.128; org.eclipse.jetty.servlet.ServletHandler;
chain=SolrRequestFilter-default*
*DEBUG - 2015-07-24 07:06:00.128;
org.eclipse.jetty.servlet.ServletHandler$CachedChain; call filter
SolrRequestFilter*
*INFO  - 2015-07-24 07:06:00.128;
org.apache.solr.servlet.SolrDispatchFilter; [admin] webapp=null
path=/admin/info/logging
params={_=1437736005493&since=1437734905469&wt=json} status=0 QTime=0 *
*DEBUG - 2015-07-24 07:06:00.128;
org.apache.solr.servlet.SolrDispatchFilter; Closing out SolrRequest:
{_=1437736005493&since=1437734905469&wt=json}*
*DEBUG - 2015-07-24 07:06:00.129; org.eclipse.jetty.server.Server; RESPONSE
/solr/admin/info/logging  200 handled=true*
*DEBUG - 2015-07-24 07:06:06.327;
org.apache.zookeeper.ClientCnxn$SendThread; Got ping response for
sessionid: 0x14eaf8f79530460 after 0ms*
*DEBUG - 2015-07-24 07:06:11.118; org.eclipse.jetty.http.HttpParser; filled
707/707*
*DEBUG - 2015-07-24 07:06:11.119; org.eclipse.jetty.server.Server; REQUEST
/solr/admin/info/logging on
BlockingHttpConnection@7a5f39b0,g=HttpGenerator{s=0,h=-1,b=-1,c=-1},p=HttpParser{s=-5,l=209,c=0},r=44*
*DEBUG - 2015-07-24 07:06:11.119;
org.eclipse.jetty.server.handler.ContextHandler; scope
null||/solr/admin/info/logging @
o.e.j.w.WebAppContext{/solr,file:/u01/work/app/install_solr/daas_node/solr-webapp/webapp/},/u01/work/app/install_solr/daas_node/webapps/solr.war*
*DEBUG - 2015-07-24 07:06:11.119;
org.eclipse.jetty.server.handler.ContextHandler;
context=/solr||/admin/info/logging @
o.e.j.w.WebAppContext{/solr,file:/u01/work/app/install_solr/daas_node/solr-webapp/webapp/},/u01/work/app/install_solr/daas_node/webapps/solr.war*
*DEBUG - 2015-07-24 07:06:11.119;
org.eclipse.jetty.server.session.SessionHandler; Got Session ID
vZScVxfQ528bXYGHJw16N3vTLJ4t3L41bSkHNmyTywQKGGzZFC8p!-348395136!NONE from
cookie*
*DEBUG - 2015-07-24 07:06:11.119;
org.eclipse.jetty.server.session.SessionHandler;
sessionManager=org.eclipse.jetty.server.session.HashSessionManager@1c49094*
*DEBUG - 2015-07-24 07:06:11.119;
org.eclipse.jetty.server.session.SessionHandler; session=null*
*DEBUG - 2015-07-24 07:06:11.120; org.eclipse.jetty.servlet.ServletHandler;
servlet /solr|/admin/info/logging|null - default*
*DEBUG - 2015-07-24 07:06:11.120; org.eclipse.jetty.servlet.ServletHandler;
chain=SolrRequestFilter-default*
*DEBUG - 2015-07-24 07:06:11.120;
org.eclipse.jetty.servlet.ServletHandler$CachedChain; call filter
SolrRequestFilter*
*INFO  - 2015-07-24 07:06:11.120;
org.apache.solr.servlet.SolrDispatchFilter; [admin] webapp=null
path=/admin/info/logging
params={_=1437736016484&since=1437734905469&wt=json} status=0 QTime=0 *
*DEBUG - 2015-07-24 07:06:11.120;
org.apache.solr.servlet.SolrDispatchFilter; Closing out SolrRequest:
{_=1437736016484&since=1437734905469&wt=json}*
*DEBUG - 2015-07-24 07:06:11.121; 

How http connections are handled in Solr?

2015-06-03 Thread Manohar Sripada
Hi,

I wanted to know in detail how HTTP connections are handled in
Solr.

1. From my code, I am using CloudSolrServer of solrj client library to get
the connection. From one of my previous discussions in this forum, I
understood that Solr uses Apache's HttpClient for connections and the
default maxConnections per host is 32 and default max connections is 128.


*CloudSolrServer cloudSolrServer = new CloudSolrServer(zookeeper_quorum);*

*cloudSolrServer.connect();*
My first question here is: what do maxConnectionsPerHost and
maxConnections imply? Are these the connections from the solrj client to the
Zookeeper quorum OR from the solrj client to the Solr nodes?

2. CloudSolrServer uses LBHttpSolrServer, which sends requests in round
robin fashion, i.e., first request to node1, 2nd request to node2, etc. If
the answer to the above question is "from the solrj client to the Solr nodes",
then will the HTTP connection pool from the solrj client to a particular Solr
node be created on the first request to that node during the round robin?

3. Consider in my Solr cloud I have one collection with 8 shards spread on
4 Solr nodes. My understanding is that the solrj client will send a query to
one of the Solr cores (e.g., core1) residing on one of the Solr nodes (e.g.,
node1). Core1 is responsible for sending queries to all the 8 Solr
cores of that collection. Once it gets the responses from all the Solr
cores, it merges the data and returns it to the client. In this process, how
are the HTTP connections between one Solr node and the rest of the Solr nodes
handled?

Does Solr maintain a connection pool here between Solr nodes? If so, when
is the connection pool created between the Solr nodes?

Thanks,
Manohar


Copying index from one Solr cloud to other Solr cloud

2015-05-27 Thread Manohar Sripada
I am using Solr cloud 4.7.2. We have around 100 collections spread across
16 Solr nodes. Also, there are 5 dedicated servers for running Zookeeper.

I want to move all these collections' data (or the collections) to a completely
different Solr cloud. How do I achieve this? The Zookeeper servers for this
Solr cloud are also completely different.


Thanks,
Manohar


Error while creating collection

2015-05-18 Thread Manohar Sripada
I am using 4.7.2 version of Solr. Cluster contains 16 VMs for Solr and 3
Zookeeper VMs. I am creating a bunch of collections (total 54)
sequentially. I am getting the below error during one of the collection
creation. This error is intermittent.

* Could not fully createcollection: collection_name*

Here's the stack trace in solr log related to this.

*ERROR - 2015-05-17 03:44:23.511; org.apache.solr.common.SolrException;
Collection createcollection of createcollection
failed:org.apache.solr.common.SolrException: Could not fully
createcollection: collection_name*
*at
org.apache.solr.cloud.OverseerCollectionProcessor.createCollection(OverseerCollectionProcessor.java:1637)*
*at
org.apache.solr.cloud.OverseerCollectionProcessor.processMessage(OverseerCollectionProcessor.java:387)*
*at
org.apache.solr.cloud.OverseerCollectionProcessor.run(OverseerCollectionProcessor.java:200)*
*at java.lang.Thread.run(Thread.java:662)*

When I tried to delete this collection immediately, I got the below
error

* Could not find collection: collection_name*

But, I am able to see this collection (without any shards) in Solr Admin
UI. And when I re-tried collection deletion after some time, I was able to
delete this collection. Can someone please help me here?

Thanks,
Manohar


Re: QQ on segments during indexing.

2015-05-13 Thread Manohar Sripada
Thanks Shawn. In my case, the document size is small. So, for sure it will
reach 50k docs first, before the 100MB buffer size.

Thanks,
Manohar

On Thu, May 14, 2015 at 10:49 AM, Shawn Heisey apa...@elyograg.org wrote:

 On 5/13/2015 10:01 PM, Manohar Sripada wrote:
  I have a question on segment creation on disk during indexing.
 
  In my solrconfig.xml, I have commented maxBufferedDocs and
 ramBufferSizeMB.
  I am controlling the flushing of data to disk using autoCommit's maxDocs
  and maxTime.
 
  Here, maxDocs is set to 50,000 and will be hit first, so that commit of data
  to disk happens every 50,000 docs. So, my question here is will it create a
  new segment when this commit happens?
 
  In the wiki
  https://wiki.apache.org/solr/SolrPerformanceFactors#mergeFactor, it is
  mentioned that a new segment creation is determined based on
  maxBufferedDocs parameter. As I have commented this parameter, how a new
  segment creation is determined?

 In recent Solr versions, the ramBufferSizeMB setting defaults to 100 and
 maxBufferedDocs defaults to -1.  A setting of -1 on maxBufferedDocs
 means that the number of docs doesn't matter, it will use
 ramBufferSizeMB unless a commit happens before the buffer fills up.  A
 commit does trigger a segment flush, although if it's a soft commit, the
 situation might be more complicated.

 Unless the docs are very small, I would expect a 100MB buffer to fill up
 before you reach 50,000 docs.  It's been a while since I watched index
 segments get created, but if I remember correctly, the amount of space
 required in the RAM buffer to index documents is more than the size of
 the segment that eventually gets flushed to disk.

 Thanks,
 Shawn




QQ on segments during indexing.

2015-05-13 Thread Manohar Sripada
I have a question on segment creation on disk during indexing.

In my solrconfig.xml, I have commented out maxBufferedDocs and ramBufferSizeMB.
I am controlling the flushing of data to disk using autoCommit's maxDocs
and maxTime.

Here, maxDocs is set to 50,000 and will be hit first, so that a commit of data
to disk happens every 50,000 docs. So, my question here is: will it create a
new segment when this commit happens?

In the wiki
https://wiki.apache.org/solr/SolrPerformanceFactors#mergeFactor, it is
mentioned that new segment creation is determined based on the
maxBufferedDocs parameter. As I have commented out this parameter, how is new
segment creation determined?

Thanks,
Manohar


Re: Question on CloudSolrServer API

2015-02-23 Thread Manohar Sripada
Thanks for the response. How do I control the number of connections pooled
here in the SolrJ client? Also, what are the default values for maximum
connections and so on?

- Thanks

On Thu, Feb 19, 2015 at 6:09 PM, Shalin Shekhar Mangar 
shalinman...@gmail.com wrote:

 No, you should reuse the same CloudSolrServer instance for all requests. It
 is a thread safe object. You could also create a static/common HttpClient
 instance and pass it to the constructor of CloudSolrServer but even if you
 don't, it will create one internally and use it for all requests so that
 connections can be pooled.
 On 19-Feb-2015 1:44 pm, Manohar Sripada manohar...@gmail.com wrote:

  Hi All,
 
  I am using CloudSolrServer API of SolrJ library from my application to
  query Solr. Here, I am creating a new connection to Solr for every search
  that I am doing. Once I got the results I am closing the connection.
 
  Is this the correct way? How does Solr create connections internally?
 Does
  it maintain a pool of connections (if so how to configure it)?
 
  Thanks,
  Manohar
 



Question on CloudSolrServer API

2015-02-19 Thread Manohar Sripada
Hi All,

I am using the CloudSolrServer API of the SolrJ library from my application to
query Solr. Here, I am creating a new connection to Solr for every search
that I am doing. Once I get the results, I close the connection.

Is this the correct way? How does Solr create connections internally? Does
it maintain a pool of connections (if so, how do I configure it)?

Thanks,
Manohar
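
Shalin's reply above boils down to: build one CloudSolrServer and share it for
all requests. A hedged sketch, including the optional shared HttpClient he
mentions for controlling pool sizes; the HttpClientUtil property names and the
constructors are from SolrJ 4.x/5.x as I recall them, so verify against your
version:

import org.apache.http.client.HttpClient;
import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.client.solrj.impl.HttpClientUtil;
import org.apache.solr.client.solrj.impl.LBHttpSolrServer;
import org.apache.solr.common.params.ModifiableSolrParams;

public class SharedSolrClient {
    private static CloudSolrServer server;

    public static synchronized CloudSolrServer get() throws Exception {
        if (server == null) {
            ModifiableSolrParams params = new ModifiableSolrParams();
            params.set(HttpClientUtil.PROP_MAX_CONNECTIONS, 128);         // pool-wide cap
            params.set(HttpClientUtil.PROP_MAX_CONNECTIONS_PER_HOST, 32); // per Solr node
            HttpClient httpClient = HttpClientUtil.createClient(params);
            LBHttpSolrServer lb = new LBHttpSolrServer(httpClient);       // round-robins across nodes
            server = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181", lb); // placeholder ZK quorum
            server.setDefaultCollection("my_collection");                   // placeholder collection
            server.connect();
        }
        return server; // reuse this single, thread-safe instance everywhere
    }
}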


Collection timeout error

2015-01-19 Thread Manohar Sripada
I am getting a collection timeout error while creating a collection on Solr
Cloud. Below is the error.

 org.apache.solr.common.SolrException: createcollection the collection
time out:180s
 at
org.apache.solr.handler.admin.CollectionsHandler.handleResponse(CollectionsHandler.java:252)
 at
org.apache.solr.handler.admin.CollectionsHandler.handleResponse(CollectionsHandler.java:233)


This SolrCloud has been up and running for 2 weeks. Prior to this, I have
created/deleted/reloaded collections on this SolrCloud. Suddenly, I started
getting this error while trying to create a collection. I am using an external
Zookeeper ensemble.

When I connected to the Zookeeper ensemble through the command line, below
is something I found out.

 ls /overseer/collection-queue-work

[qn-30, qn-20, qn-26, qn-18, qn-28,
qn-16, qn-22, qn-14, qn-24, qn-12]

 get /overseer/collection-queue-work/qn-12

{
  operation:createcollection,
  fromApi:true,
  name:coll_4,
  replicationFactor:2,
  collection.configName:collConfig,
  numShards:16,
  maxShardsPerNode:4}


There are many such requests which were in the queue. All are related to
Collection API requests (e.g., CREATE, DELETE, RELOAD, etc). Can anyone please
tell me why Solr/ZK went into this state only for collection-related APIs?
Although a restart of Solr and Zookeeper worked, I can't ask for a
restart in production whenever this occurs. So, I wanted to find the root
cause.



Thanks


Re: How to select the correct number of Shards in SolrCloud

2015-01-16 Thread Manohar Sripada
Thanks Daniel and Shawn for your valuable suggestions,

Daniel,
If you have a query and it needs to get results from 64 cores, if 63 return
in 100ms but the last core is in GC pause and takes 500ms, your query will
take just over 500ms.
 There is only a single JVM running per machine. I will get the QTime from
each Solr Core and will check if this is the root cause.

Lastly, you mentioned you allocated 32Gb to solr, do you mean to the
JVM heap?
That's quite a lot of a 64Gb machine, you haven't left much for the page
cache.
 Yes, 32GB to Solr's JVM heap. I wanted to enable the Filter & FieldValue
Cache, as most of my search queries revolve around filters and facets.
Also, I am planning to use the Document cache.

Shawn,
Each server has 8 CPU cores and 64GB of RAM.  Solr requires a 6GB heap
 Can you please tell me what is the size of your index? And what is the
size of the large cold shard?
 Can you please suggest any tool that you use for collecting the
statistics, like the QTimes for the queries, etc.?

Thanks,
Manohar


On Fri, Jan 16, 2015 at 3:23 PM, Shawn Heisey apa...@elyograg.org wrote:

 On 1/15/2015 10:58 PM, Manohar Sripada wrote:
  The reason I have created 64 Shards is there are 4 CPU cores on each VM;
  while querying I can make use of all the CPU cores. On an average, Solr
  QTime is around 500ms here.
 
  Last time to my other discussion, Erick suggested that I might be over
  sharding, So, I tried reducing the number of shards to 32 and then 16. To
  my surprise, it started performing better. It came down to 300 ms (for 32
  shards) and 100 ms (for 16 shards). I haven't tested with filters and
  facets yet here. But, the simple search queries had shown lot of
  improvement.
 
  So, how come the less number of shards performing better?? Is it because
  there are less number of posting lists to search on OR less merges that
 are
  happening? And how to determine the correct number of shards?

 Daniel has replied with good information.

 One additional problem I can think of when there are too many shards: If
 your Solr server is busy enough to have any possibility of simultaneous
 requests, then you will find that it's NOT a good idea to create enough
 shards to use all your CPU cores.  In that situation, when you do a
 single query, all your CPU cores will be in use.  When multiple queries
 happen at the same time, they have to share the available CPU resources,
 slowing them down.  With a smaller number of shards, the additional CPU
 cores can handle simultaneous queries.

 I have an index with nearly 100 million documents.  I've divided it into
 six large cold shards and one very small hot shard.  It's not SolrCloud.
  I put three large shards on each of two servers, and the small shard on
 one of those two servers.  The distributed query normally happens on the
 server without the small shard.  Each server has 8 CPU cores and 64GB of
 RAM.  Solr requires a 6GB heap.

 My median QTime over the last 231836 queries is 25 milliseconds and my
 95th percentile QTime is 376 milliseconds.  My query rate is pretty low
 - I've never seen Solr's statistics for the 15 minute query rate go
 above a single digit per second.

 Thanks,
 Shawn




How to select the correct number of Shards in SolrCloud

2015-01-15 Thread Manohar Sripada
Hi All,

My Setup is as follows. There are 16 nodes in my SolrCloud and 4 CPU cores
on each Solr Node VM. Each having 64 GB of RAM, out of which I have
allocated 32 GB to Solr. I have a collection which contains around 100
million Docs, which I created with 64 shards, replication factor 2, and 8
shards per node. Each shard is getting around 1.6 Million Documents.

The reason I have created 64 Shards is there are 4 CPU cores on each VM;
while querying I can make use of all the CPU cores. On an average, Solr
QTime is around 500ms here.

Last time to my other discussion, Erick suggested that I might be over
sharding, So, I tried reducing the number of shards to 32 and then 16. To
my surprise, it started performing better. It came down to 300 ms (for 32
shards) and 100 ms (for 16 shards). I haven't tested with filters and
facets yet here. But, the simple search queries had shown a lot of
improvement.

So, how come fewer shards perform better? Is it because
there are fewer posting lists to search on OR fewer merges
happening? And how do I determine the correct number of shards?

Thanks,
Manohar


Re: Loading data to FieldValueCache

2014-12-28 Thread Manohar Sripada
Erick,

I am trying to do a premature optimization. *There will be no updates to my
index. So, no worries about ageing out or garbage collection.*
Let me get my understanding correctly; when we talk about filterCache, it
just stores the document IDs in the cache right?

And my setup is as follows. There are 16 nodes in my SolrCloud. Each having
64 GB of RAM, out of which I am allocating 45 GB to Solr. I have a
collection (say Products, which contains around 100 million Docs), which I
created with 64 shards, replication factor 2, and 8 shards per node. Each
shard is getting around 1.6 Million Documents. So my math here for
filterCache for a specific filter will be -


   - an average filter query will be 20 bytes, so 1000 (distinct number of
   states) x 20 = 2 MB
   - and considering union of DocIds for all the values of a given filter
   equals to total number of DocId's present in the index. There are 1.6
   Million Documents in a  solr core. So, 1,600,000 x 8 Bytes (for each Doc
   Id) equals to 12.8 MB
   - There will be 8 solrcores per node - 8 x 12.8 MB = *102 MB. *

This is the size of cache for a single filter in a single node. Considering
the heapsize I have given, I think this shouldn't be an issue..

Thanks,
Manohar

On Fri, Dec 26, 2014 at 10:56 PM, Erick Erickson erickerick...@gmail.com
wrote:

 Manohar:

 Please approach this cautiously. You state that you have hundreds of
 states.
 Every 100 states will use roughly 1.2G of your filter cache. Just for this
 field. Plus it'll fill up the cache and they may soon be aged out anyway.
 Can you really afford the space? Is it really a problem that needs to be
 solved at this point? This _really_ sounds like premature optimization
 to me as you haven't
 demonstrated that there's an actual problem you're solving.

 OTOH, of course, if you're experimenting to better understand all the
 ins and outs
 of the process that's another thing entirely ;)

 Toke:

 I don't know the complete algorithm, but if the number of docs that
 satisfy the fq is small enough,
 then just the internal Lucene doc IDs are stored rather than a bitset.
 What exactly small enough is
 I don't know off the top of my head. And I've got to assume looking
 stuff up in a list is slower than
 indexing into a bitset so I suspect small enough is very small

 On Fri, Dec 26, 2014 at 3:00 AM, Manohar Sripada manohar...@gmail.com
 wrote:
  Thanks Toke for the explanation, I will experiment with
  f.state.facet.method=enum
 
  Thanks,
  Manohar
 
  On Fri, Dec 26, 2014 at 4:09 PM, Toke Eskildsen t...@statsbiblioteket.dk
  wrote:
 
  Manohar Sripada [manohar...@gmail.com] wrote:
   I have 100 million documents in my index. The maxDoc here is the
 maximum
   Documents in each shard, right? How is it determined that each entry
 will
   occupy maxDoc/8 approximately.
 
  Assuming that it is random whether a document is part of the result set
 or
  not, the most efficient representation is 1 bit/doc (this is often
 called a
  bitmap or bitset). So the total number of bits will be maxDoc, which is
 the
  same as maxDoc/8 bytes.
 
  Of course, result sets are rarely random, so it is possible to have
 other
  and more compact representations. I do not know how that plays out in
  Lucene. Hopefully somebody else can help here.
 
   If I have to add facet.method=enum every time in the query, how
 should I
   specify for each field separately?
 
  f.state.facet.method=enum
 
  See https://wiki.apache.org/solr/SimpleFacetParameters#Parameters
 
  - Toke Eskildsen
 



Re: Loading data to FieldValueCache

2014-12-26 Thread Manohar Sripada
I have 100 million documents in my index. The maxDoc here is the maximum
number of documents in each shard, right? How is it determined that each entry
will occupy approximately maxDoc/8?

If I have to add facet.method=enum every time in the query, how should I
specify it for each field separately? Like in the above example, I am planning
to use the products facet with facet.method=fc and the state facet with
facet.method=enum. How do I specify different facet methods for different
fields while trying to get both of these facets?

Thanks,
Manohar

On Thu, Dec 25, 2014 at 2:52 AM, Erick Erickson erickerick...@gmail.com
wrote:

 Inline.

 On Tue, Dec 23, 2014 at 11:12 PM, Manohar Sripada manohar...@gmail.com
 wrote:
  Okay. Let me try like this, as mine is a read-only index. I will have
 some
  queries in firstSearcher event listener
  1) q=*:*&facet=true&facet.method=enum&facet.field=state   -- To load all
  the state related unique values to filterCache.

 It's not necessary to use facet.method=enum here at all, just facet on the
 field
 and trust the heuristics built in. If you insist on this be very sure you
 can
 afford the space.

   Will it use filterCache when I send a query with a filter, e.g.
   fq=state:CA?

 Don't know. Try it and look on admin/stats for the filter cache. You'll
 see a new insert if it does not use the one already there.


   Once it is loaded, do I need to send a query with facet.method=enum
   every time along with facet.field=state to get state-related facet data
  from filterCache?

 See above. You haven't told us how many docs in your index, so we
 have no way of estimating how much this'll cost you. Each entry
 will be maxDoc/8 roughly, and you'll have about 50 of them.

 Yes, though, if you take control of the facet.method you'll have to
 add it every time.

 
  2) q=*:*facet=truefacet.method=fcfacet.field=products  -- To load the
  values related to products to fieldCache.
   Again, while querying for this facet do I need to sent
  facet.method=fc every time?
 See above.

 
  Thanks,
  Manohar
 
  On Wed, Dec 24, 2014 at 11:36 AM, Erick Erickson 
 erickerick...@gmail.com
  wrote:
 
  By and large, don't use the enum method unless there are _very_
  few unique values. It forms a filter (size roughly maxDoc/8 bytes)
  for _every_ unique value in the field, i.e. if you have 10,000 unique
  values it'll try to form 10,000 filterCache entries. Let the system
  do this for you automatically if appropriate.
 
  Best,
  Erick
 
  On Tue, Dec 23, 2014 at 9:37 PM, Manohar Sripada manohar...@gmail.com
  wrote:
   Thanks Erick and Toke,
  
   Also, I read here 
 https://wiki.apache.org/solr/SolrCaching#filterCache
  that,
   filterCache can also be used for faceting with facet.method=enum. So,
 I
  am
   bit confused here on which one to use for faceting.
  
   One more thing here is I have different types of facets. (For example
 -
   Product List, States). The Product List facet has lot many unique
 values
   (around 10 million), where as States list will be in hundreds. So, I
 want
   to come up with the numbers for size of fieldValueCache/filterCache
 and
   pre-populate this.
  
   Thanks,
   Manohar
  
   On Tue, Dec 23, 2014 at 10:07 PM, Erick Erickson 
  erickerick...@gmail.com
   wrote:
  
   Or just not worry about it. The cache will be filled up automatically
   as you query for facets etc., the benefit to trying to fill it up as
   Toke outlines is just that the first few user queries that call for
   faceting will be somewhat faster. But after the first few user
   queries have gone through, it won't matter whether you've
   pre-loaded the cache or not.
  
   My point is that you'll get the benefit of the cache no matter what,
   it's just a matter of whether it's important that the first few users
   don't have to wait while they're loaded. And with DocValues,
   as Toke recommends, even that may be unimportant.
  
   Best,
   Erick
  
   On Tue, Dec 23, 2014 at 1:03 AM, Toke Eskildsen 
 t...@statsbiblioteket.dk
  
   wrote:
Manohar Sripada [manohar...@gmail.com] wrote:
From the wiki, it states that
http://wiki.apache.org/solr/SolrCaching#fieldValueCache is mostly
  used
   for
faceting.
   
Can someone please throw some light on how to load data to this
  cache.
   Like
on what solrquery option does this consider the data to be loaded
 to
   this
cache.
   
The values are loaded on first facet call with facet.method=fc.
http://wiki.apache.org/solr/SimpleFacetParameters#facet.method
   
My requirement is I have 10 facet fields (with facetlimit - 5) to
 be
   shown
in my UI. I want to speed up this by using this cache. Is there a
 way
   where
I can specify only the list of fields to be loaded to FieldValue
  Cache?
   
Add a facet call as explicit warmup in your solrconfig.xml.
   
You might want to consider DocValues for your facet fields.
https://cwiki.apache.org/confluence/display/solr/DocValues
   
- Toke Eskildsen
  
 



Re: Loading data to FieldValueCache

2014-12-26 Thread Manohar Sripada
Thanks Toke for the explanation; I will experiment with
f.state.facet.method=enum.
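
If I understand the per-field syntax right, a request that sets the method
per field -- enum for state and fc for products -- would look roughly like
this (just a sketch using the field names from this thread, all one request
split across lines for readability; not tested):

  q=*:*&facet=true&facet.limit=5
    &facet.field=state&f.state.facet.method=enum
    &facet.field=products&f.products.facet.method=fc

And if the maxDoc/8 estimate holds per shard, then 100 million documents
split over, say, 10 shards (the shard count is my assumption) works out to
roughly 10,000,000 / 8 = 1.25 MB per cached filter, times the number of
distinct states.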

Thanks,
Manohar

On Fri, Dec 26, 2014 at 4:09 PM, Toke Eskildsen t...@statsbiblioteket.dk
wrote:

 Manohar Sripada [manohar...@gmail.com] wrote:
  I have 100 million documents in my index. The maxDoc here is the maximum
  Documents in each shard, right? How is it determined that each entry will
  occupy maxDoc/8 approximately.

 Assuming that it is random whether a document is part of the result set or
 not, the most efficient representation is 1 bit/doc (this is often called a
 bitmap or bitset). So the total number of bits will be maxDoc, which is the
 same as maxDoc/8 bytes.

 Of course, result sets are rarely random, so it is possible to have other
 and more compact representations. I do not know how that plays out in
 Lucene. Hopefully somebody else can help here.

  If I have to add facet.method=enum every time in the query, how should I
  specify for each field separately?

 f.state.facet.method=enum

 See https://wiki.apache.org/solr/SimpleFacetParameters#Parameters

 - Toke Eskildsen



Loading data to FieldValueCache

2014-12-23 Thread Manohar Sripada
Hello,

The wiki at http://wiki.apache.org/solr/SolrCaching#fieldValueCache states
that the fieldValueCache is mostly used for faceting.

Can someone please throw some light on how to load data into this cache?
For example, which Solr query option causes data to be loaded into this
cache?

My requirement is that I have 10 facet fields (with facet.limit=5) to be
shown in my UI. I want to speed this up by using this cache. Is there a
way to specify only the list of fields to be loaded into the
fieldValueCache?
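
For reference, the kind of request I am making looks roughly like this
(field names here are placeholders, not my real schema):

  q=*:*&rows=10&facet=true&facet.limit=5
    &facet.field=field1&facet.field=field2&facet.field=field3
  (...and so on, one facet.field parameter for each of the 10 fields)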

Thanks,
Manohar


Re: Loading data to FieldValueCache

2014-12-23 Thread Manohar Sripada
Thanks Erick and Toke,

Also, I read here https://wiki.apache.org/solr/SolrCaching#filterCache that
the filterCache can also be used for faceting with facet.method=enum. So, I
am a bit confused about which one to use for faceting.

One more thing: I have different types of facets (for example, Product
List and States). The Product List facet has a lot of unique values
(around 10 million), whereas the States list will be in the hundreds. So,
I want to come up with numbers for the sizes of the
fieldValueCache/filterCache and pre-populate them.
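
Once I have those numbers, my understanding is that they would go into
solrconfig.xml along these lines (a sketch only; the sizes below are
placeholders to be replaced by whatever the estimates say, not
recommendations):

  <filterCache class="solr.FastLRUCache"
               size="512" initialSize="512" autowarmCount="0"/>
  <fieldValueCache class="solr.FastLRUCache"
                   size="20" autowarmCount="0" showItems="32"/>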

Thanks,
Manohar

On Tue, Dec 23, 2014 at 10:07 PM, Erick Erickson erickerick...@gmail.com
wrote:

 Or just not worry about it. The cache will be filled up automatically
 as you query for facets etc., the benefit to trying to fill it up as
 Toke outlines is just that the first few user queries that call for
 faceting will be somewhat faster. But after the first few user
 queries have gone through, it won't matter whether you've
 pre-loaded the cache or not.

 My point is that you'll get the benefit of the cache no matter what,
 it's just a matter of whether it's important that the first few users
 don't have to wait while they're loaded. And with DocValues,
 as Toke recommends, even that may be unimportant.

 Best,
 Erick

 On Tue, Dec 23, 2014 at 1:03 AM, Toke Eskildsen t...@statsbiblioteket.dk
 wrote:
  Manohar Sripada [manohar...@gmail.com] wrote:
  From the wiki, it states that
  http://wiki.apache.org/solr/SolrCaching#fieldValueCache is mostly used
 for
  faceting.
 
  Can someone please throw some light on how to load data to this cache.
 Like
  on what solrquery option does this consider the data to be loaded to
 this
  cache.
 
  The values are loaded on first facet call with facet.method=fc.
  http://wiki.apache.org/solr/SimpleFacetParameters#facet.method
 
  My requirement is I have 10 facet fields (with facetlimit - 5) to be
 shown
  in my UI. I want to speed up this by using this cache. Is there a way
 where
  I can specify only the list of fields to be loaded to FieldValue Cache?
 
  Add a facet call as explicit warmup in your solrconfig.xml.
 
  You might want to consider DocValues for your facet fields.
  https://cwiki.apache.org/confluence/display/solr/DocValues
 
  - Toke Eskildsen
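
The DocValues suggestion above comes down to one attribute per facet field
in schema.xml; a minimal sketch, assuming string-typed fields (and guessing
that products is multi-valued):

  <field name="state"    type="string" indexed="true" stored="true"
         docValues="true"/>
  <field name="products" type="string" indexed="true" stored="true"
         multiValued="true" docValues="true"/>

Note that the fields have to be re-indexed after docValues is turned on.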



Re: Loading data to FieldValueCache

2014-12-23 Thread Manohar Sripada
Okay. Let me try it like this, as mine is a read-only index. I will have
some queries in the firstSearcher event listener:
1) q=*:*&facet=true&facet.method=enum&facet.field=state   -- to load all
the state-related unique values into the filterCache.
Will it use the filterCache when I send a query with a filter, e.g.
fq=state:CA?
Once it is loaded, do I need to send a query with facet.method=enum
every time along with facet.field=state to get state-related facet data
from the filterCache?

2) q=*:*&facet=true&facet.method=fc&facet.field=products  -- to load the
values related to products into the fieldCache.
Again, while querying for this facet, do I need to send
facet.method=fc every time?
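
If it helps, wiring those two warmup queries into solrconfig.xml would
look roughly like this (a sketch only, not tested):

  <listener event="firstSearcher" class="solr.QuerySenderListener">
    <arr name="queries">
      <lst>
        <str name="q">*:*</str>
        <str name="facet">true</str>
        <str name="facet.method">enum</str>
        <str name="facet.field">state</str>
      </lst>
      <lst>
        <str name="q">*:*</str>
        <str name="facet">true</str>
        <str name="facet.method">fc</str>
        <str name="facet.field">products</str>
      </lst>
    </arr>
  </listener>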

Thanks,
Manohar

On Wed, Dec 24, 2014 at 11:36 AM, Erick Erickson erickerick...@gmail.com
wrote:

 By and large, don't use the enum method unless there are _very_
  few unique values. It forms a filter (size roughly maxDoc/8 bytes)
 for _every_ unique value in the field, i.e. if you have 10,000 unique
 values it'll try to form 10,000 filterCache entries. Let the system
 do this for you automatically if appropriate.

 Best,
 Erick

 On Tue, Dec 23, 2014 at 9:37 PM, Manohar Sripada manohar...@gmail.com
 wrote:
  Thanks Erick and Toke,
 
  Also, I read here https://wiki.apache.org/solr/SolrCaching#filterCache
 that,
  filterCache can also be used for faceting with facet.method=enum. So, I
 am
  bit confused here on which one to use for faceting.
 
  One more thing here is I have different types of facets. (For example -
  Product List, States). The Product List facet has lot many unique values
  (around 10 million), where as States list will be in hundreds. So, I want
  to come up with the numbers for size of fieldValueCache/filterCache and
  pre-populate this.
 
  Thanks,
  Manohar
 
  On Tue, Dec 23, 2014 at 10:07 PM, Erick Erickson 
 erickerick...@gmail.com
  wrote:
 
  Or just not worry about it. The cache will be filled up automatically
  as you query for facets etc., the benefit to trying to fill it up as
  Toke outlines is just that the first few user queries that call for
  faceting will be somewhat faster. But after the first few user
  queries have gone through, it won't matter whether you've
  pre-loaded the cache or not.
 
  My point is that you'll get the benefit of the cache no matter what,
  it's just a matter of whether it's important that the first few users
  don't have to wait while they're loaded. And with DocValues,
  as Toke recommends, even that may be unimportant.
 
  Best,
  Erick
 
  On Tue, Dec 23, 2014 at 1:03 AM, Toke Eskildsen t...@statsbiblioteket.dk
 
  wrote:
   Manohar Sripada [manohar...@gmail.com] wrote:
   From the wiki, it states that
   http://wiki.apache.org/solr/SolrCaching#fieldValueCache is mostly
 used
  for
   faceting.
  
   Can someone please throw some light on how to load data to this
 cache.
  Like
   on what solrquery option does this consider the data to be loaded to
  this
   cache.
  
   The values are loaded on first facet call with facet.method=fc.
   http://wiki.apache.org/solr/SimpleFacetParameters#facet.method
  
   My requirement is I have 10 facet fields (with facetlimit - 5) to be
  shown
   in my UI. I want to speed up this by using this cache. Is there a way
  where
   I can specify only the list of fields to be loaded to FieldValue
 Cache?
  
   Add a facet call as explicit warmup in your solrconfig.xml.
  
   You might want to consider DocValues for your facet fields.
   https://cwiki.apache.org/confluence/display/solr/DocValues
  
   - Toke Eskildsen
 



Re: Question on Solr Caching

2014-12-08 Thread Manohar Sripada
Thanks Shawn,

Can you please point me to any wiki which describes (in detail) the
differences between MMapDirectoryFactory and NRTCachingDirectoryFactory? I
found this blog, which describes MMapDirectory, very helpful:
http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
I want to know about NRTCachingDirectory in the same detail.

Also, when I ran the REST request solr/admin/cores?action=STATUS, I got
the result below (partial result only). I have set the directoryFactory
to NRTCachingDirectoryFactory in solrconfig.xml, but it also shows
MMapDirectory in the element below. Does this mean NRTCachingDirectory is
using MMapDirectory internally?

<str name="directory">
org.apache.lucene.store.NRTCachingDirectory:NRTCachingDirectory(MMapDirectory@/instance/solr/collection1_shard2_replica1/data/index
lockFactory=NativeFSLockFactory@/instance/solr/collection1_shard2_replica1/data/index;
maxCacheMB=48.0 maxMergeSizeMB=4.0)</str>

What do maxCacheMB and maxMergeSizeMB indicate? How can I control them?
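
From what I can tell, these two can be passed to the factory in
solrconfig.xml, something like the following (a sketch; I have not
verified the exact parameter handling on my version, and the values are
just the defaults echoed in the STATUS output above). Is this the right
way to set them?

  <directoryFactory name="DirectoryFactory"
                    class="solr.NRTCachingDirectoryFactory">
    <double name="maxCacheMB">48.0</double>
    <double name="maxMergeSizeMB">4.0</double>
  </directoryFactory>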


Thanks,
Manohar

On Fri, Dec 5, 2014 at 11:04 AM, Shawn Heisey apa...@elyograg.org wrote:

 On 12/4/2014 10:06 PM, Manohar Sripada wrote:
  If you use MMapDirectory, Lucene will map the files into memory off heap
  and the OS's disk cache will cache the files in memory for you. Don't use
  RAMDirectory, it's not better than MMapDirectory for any use I'm aware
 of.
 
  Will that mean it will cache the Inverted index as well to OS disk's
  cache? The reason I am asking is, Solr searches this Inverted Index first
  to get the data. How about if we can keep this in memory?

 If you have enough memory, the operating system will cache *everything*.
  It does so by simply loading the data that's on the disk into RAM ...
 it is not aware that certain parts are the inverted index, it simply
 caches whatever data gets read.  A subsequent read will come out of
 memory, the disk heads will never even move.  If certain data in the
 index is never accessed, then it will not get cached.

 http://en.wikipedia.org/wiki/Page_cache

 Thanks,
 Shawn




Clearing SolrCaches

2014-12-08 Thread Manohar Sripada
Hi,

Can anyone please let me know how to explicitly clear the caches
associated with an IndexSearcher?

In my project, I am creating a collection (say collection_1) which holds
the data for my organization's dataset. I am using the filterCache,
queryResultCache and documentCache extensively, and these are all loaded
through some event listeners and/or over a period of time.

Periodically, I get a new dataset of organizations. For this new dataset,
I am creating a new collection (say collection_2).

Now, my client searches only collection_2, not collection_1, any more.
But I am still keeping collection_1 as a backup in SolrCloud. *Here*, I
don't want the caches to be held by the IndexSearchers of all the cores of
collection_1 any more, as no one uses them. How do I clear these caches?

*Note:* I am not using my collection for real-time updates. The index is
created from bulk data and there is only a single commit of the data.

Thanks,
Manohar


Re: Clearing SolrCaches

2014-12-08 Thread Manohar Sripada
How do I edit the configuration that is linked to a collection? I am using
SolrCloud and I upload my config to ZooKeeper. So, if I modify and upload
the config, will that not impact the latest collection as well, as long as
I don't reload the latest collection?

Thanks,
Manohar

On Mon, Dec 8, 2014 at 7:45 PM, Shawn Heisey apa...@elyograg.org wrote:

 On 12/8/2014 3:02 AM, Manohar Sripada wrote:
  Can anyone please let me know on how to clear caches associated with an
  IndexSearcher explicitly?
 
  In my project, I am creating a collection (say collection_1) which holds
  the data for my organizations dataset. I am using filterCache,
  queryResultCache and DocumentCache extensively and these are all loaded
  through some EvenListeners and/or over a period of time.
 
  Periodically, I get new dataset of organizations. For this new dataset, I
  am creating a new collection (say collection_2).
 
  Now, my client searches only on collection_2 not on collection_1 any
 more.
  But, I am still keeping my collection_1 as a backup in SolrCloud.
 *Here*, I
  don't want the cache to be hold by IndexSearcher of all the cores of
  collection_1 any more, as no one uses this. How to clear this cache?

 I don't think there is any way to do EXACTLY what you are asking.

 If you reload the collection (or each of its cores), then the cache will
 be cleared and re-created, and will only contain data queried by the
 firstSearcher event listener.  If you edit the configuration linked to
 that collection (or link another suitable config, assuming SolrCloud) to
 remove the cache configurations and the event listeners, then there will
 be no cache when you reload it.

 Thanks,
 Shawn
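
For reference, the reload Shawn mentions can be done through the
Collections API, e.g. (host and port are placeholders):

  http://localhost:8983/solr/admin/collections?action=RELOAD&name=collection_1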




Question on Solr Caching

2014-12-04 Thread Manohar Sripada
Hi,

I am working on implementing Solr in my product. I have a few questions on
caching.

1. Do the posting list and term list of the index reside in memory? If
not, how do I load them into memory? I don't want to load the entire data,
as with the documentCache. Nor do I want to use RAMDirectoryFactory, as
the data will be lost if you restart.

2. For the filterCache, there is a way to specify in the query whether the
filter should be cached or not. Similarly, is there a way to specify the
list of stored fields to be loaded into the documentCache? I know the
documentCache is not associated with a query; just curious.

3. Similarly, is there a way I can specify a list of fields to be cached
in the FieldCache?

Thanks,
Manohar


Re: Question on Solr Caching

2014-12-04 Thread Manohar Sripada
Thanks Michael for the response.

If you use MMapDirectory, Lucene will map the files into memory off heap
and the OS's disk cache will cache the files in memory for you. Don't use
RAMDirectory, it's not better than MMapDirectory for any use I'm aware of.

 Will that mean it will cache the inverted index as well in the OS's disk
cache? The reason I am asking is that Solr searches this inverted index
first to get the data. What if we could keep this in memory?

Thanks,
Manohar



On Thu, Dec 4, 2014 at 10:54 PM, Michael Della Bitta 
michael.della.bi...@appinions.com wrote:

 Hi, Manohar,

  1. Does posting-list and term-list of the index reside in the memory? If

 not, how to load this to memory. I don't want to load entire data, like
 using DocumentCache. Either I want to use RAMDirectoryFactory as the data
 will be lost if you restart


 If you use MMapDirectory, Lucene will map the files into memory off heap
 and the OS's disk cache will cache the files in memory for you. Don't use
 RAMDirectory, it's not better than MMapDirectory for any use I'm aware of.

  2. For FilterCache, there is a way to specify whether the filter should
 be cached or not in the query.

 If you add {!cache=false}  to your filter query, it will bypass the cache.
 I'm fairly certain it will not subsequently be cached.

  Similarly, Is there a way where I can specify the list of stored fields
 to be loaded to Document Cache?

 If you have lazy loading enabled, the DocumentCache will only have the
 fields you asked for in it.

  3. Similarly, Is there a way I can specify list of fields to be cached
 for FieldCache? Thanks, Manohar

 You basically don't have much control over the FieldCache in Solr other
 than warming it with queries.

 You should check out this wiki page, it will probably answer some
 questions:

 https://wiki.apache.org/solr/SolrCaching

 I hope that helps!

 Michael
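
As a concrete example of the {!cache=false} hint above, a filter that
bypasses the filterCache would be written as (reusing the state:CA example
from the other thread):

  fq={!cache=false}state:CA

whereas a plain fq=state:CA is cached by default.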