Limit the documents for each shard in solr cloud

2015-05-06 Thread Jilani Shaik
Hi,

Is it possible to restrict number of documents per shard in Solr cloud?

Let's say we have a Solr cloud with 4 nodes, and on each node we have one
leader and one replica, so we have 8 shards in total, including replicas.
Now I need to index my documents in such a way that each shard will hold
only 5 million documents, for a total of 20 million documents in the Solr
cloud.


Thanks,
Jilani


Re: Union and intersection methods in solr DocSet

2015-05-06 Thread Gajendra Dadheech
Hey Chris,

Thanks for reply.

The exception is an ArrayIndexOutOfBoundsException. It occurs because the
searcher may return a BitDocSet for query1 and a SortedIntDocSet for query2
[which could happen]. In that case, SortedIntDocSet doesn't implement the
intersection and that causes the exception.


Thanks and regards,
Gajendra Dadheech


On Thu, May 7, 2015 at 6:06 AM, Chris Hostetter 
wrote:

>
> : DocSet docset1 = Searcher.getDocSet(query1)
> : DocSet docset2 = Searcher.getDocSet(query2);
> :
> : Docset finalDocset = docset1.intersection(docset2);
> :
> : Is this a valid approach? Given that a DocSet could either be a
> : SortedIntDocSet or a BitDocSet, I am facing an
> : ArrayIndexOutOfBoundsException when computing a union/intersection
> : between different kinds of DocSets.
>
> As far as I know, that should be a totally valid usage -- since you didn't
> provide the details of the stack trace or the code you wrote that
> produced it, it's hard to guess why/where it's causing the exception.
>
> FWIW: SolrIndexSearcher has getDocSet methods that take multiple arguments,
> which might be more efficient than doing the intersection directly (and
> are cache aware).
>
> if all you care about is the *size* of the intersection, see the
> SolrIndexSearcher.numDocs methods.
>
> -Hoss
> http://www.lucidworks.com/
>


Re: Union and intersection methods in solr DocSet

2015-05-06 Thread Chris Hostetter

: DocSet docset1 = Searcher.getDocSet(query1)
: DocSet docset2 = Searcher.getDocSet(query2);
: 
: Docset finalDocset = docset1.intersection(docset2);
: 
: Is this a valid approach? Given that a DocSet could either be a
: SortedIntDocSet or a BitDocSet, I am facing an ArrayIndexOutOfBoundsException
: when computing a union/intersection between different kinds of DocSets.

As far as I know, that should be a totally valid usage -- since you didn't
provide the details of the stack trace or the code you wrote that
produced it, it's hard to guess why/where it's causing the exception.

FWIW: SolrIndexSearcher has getDocSet methods that take multiple arguments,
which might be more efficient than doing the intersection directly (and
are cache aware).

if all you care about is the *size* of the intersection, see the 
SolrIndexSearcher.numDocs methods.

-Hoss
http://www.lucidworks.com/
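
A minimal sketch of the three approaches discussed above (explicit
intersection, the multi-query getDocSet, and numDocs), assuming a
SolrIndexSearcher and two already-built queries; class and variable names
are illustrative only:

import java.io.IOException;
import java.util.Arrays;

import org.apache.lucene.search.Query;
import org.apache.solr.search.DocSet;
import org.apache.solr.search.SolrIndexSearcher;

public class DocSetIntersectionSketch {

  public static void intersect(SolrIndexSearcher searcher, Query query1, Query query2)
      throws IOException {
    // 1) Explicit intersection of two DocSets (they may be a mix of
    //    BitDocSet and SortedIntDocSet implementations).
    DocSet docset1 = searcher.getDocSet(query1);
    DocSet docset2 = searcher.getDocSet(query2);
    DocSet intersection = docset1.intersection(docset2);

    // 2) Cache-aware alternative: let the searcher combine the queries itself.
    DocSet combined = searcher.getDocSet(Arrays.asList(query1, query2));

    // 3) If only the *size* of the intersection matters.
    int size = searcher.numDocs(query1, docset2);

    System.out.printf("intersection=%d combined=%d numDocs=%d%n",
        intersection.size(), combined.size(), size);
  }
}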


Re: A defect in Schema API with Add a New Copy Field Rule?

2015-05-06 Thread Yonik Seeley
On Wed, May 6, 2015 at 8:10 PM, Steve Rowe  wrote:
> It’s by design that you can copyField the same source/dest multiple times - 
> according to Yonik (not sure where this was discussed), this capability has 
> been used in the past to effectively boost terms in the source field.

Yep, used to be relatively common.
Perhaps the API could be cleaner though if we supported that by
passing an optional "numTimes" or "numCopies"?  Sane delete / overwrite
options would then seem easier to define.

-Yonik


Re: A defect in Schema API with Add a New Copy Field Rule?

2015-05-06 Thread Steve Rowe
Hi Steve,

It’s by design that you can copyField the same source/dest multiple times - 
according to Yonik (not sure where this was discussed), this capability has 
been used in the past to effectively boost terms in the source field.  

The API isn’t symmetric here though: I’m guessing deleting a multiply specified
copy field rule will delete all of them, but this isn’t tested, so I’m not sure.

There is no replace-copy-field command because copy field rules don’t have 
dependencies (i.e., nothing else in the schema refers to copy field rules), 
unlike fields, dynamic fields and field types, so 
delete-copy-field/add-copy-field works as one would expect.

For fields, dynamic fields and field types, a delete followed by an add is not 
the same as a replace, since (dynamic) fields could have dependent copyFields, 
and field types could have dependent (dynamic) fields.  delete-* commands are 
designed to fail if there are any existing dependencies, while the replace-* 
commands will maintain the dependencies if they exist.

Steve

> On May 6, 2015, at 6:44 PM, Steven White  wrote:
> 
> Hi Everyone,
> 
> I am using the Schema API to add a new copy field per:
> https://cwiki.apache.org/confluence/display/solr/Schema+API#SchemaAPI-AddaNewCopyFieldRule
> 
> Unlike the other "Add" APIs, this one will not fail if you add an existing
> copy field object.  In fact, after when I call the API over and over, the
> item will appear over and over in schema.xml file like so:
> 
>  
>  
>  
>  
> 
> Is this the expected behaviour or a bug?  As a side question, is there any
> harm in having multiple "copyField" like I ended up with?
> 
> A final question, why there is no Replace a Copy Field?  Is this by design
> for some limitation or was the API just never implemented?
> 
> Thanks
> 
> Steve



Re: solr 3.6.2 under tomcat 8 missing corename in path

2015-05-06 Thread Shawn Heisey
On 5/6/2015 2:29 PM, Tim Dunphy wrote:
> I'm trying to setup an old version of Solr for one of our drupal
> developers. Apparently only versions 1.x or 3.x will work with the current
> version of drupal.
> 
> I'm setting up solr 3.6.2 under tomcat.
> 
> And I'm getting this error when I start tomcat and surf to the /solr/admin
> URL:
> 
>  HTTP Status 404 - missing core name in path
> 
> type Status report
> 
> message missing core name in path
> 
> description The requested resource is not available.

The URL must include the core name.  Your defaultCoreName is
collection1, and I'm guessing you don't have a core named collection1.

Try browsing to just /solr instead of /solr/admin ... you should get a
list of links for valid cores, each of which will take you to the admin
page for that core.

Probably what you will find is that when you click on one of those
links, you will end up on /solr/corename/admin.jsp as the URL in your
browser.

Thanks,
Shawn



5.1.0 Heatmap + Geotools

2015-05-06 Thread Joseph Obernberger
Hi - I'm very interested in the new heat map capability of Solr 5.1.0.  
Has anyone looked at combining GeoTools' HeatmapProcess method with this 
data?  I'm trying this now, but I keep getting an empty image from the 
GridCoverage2D object.

Any pointers/tips?
Thank you!

-Joe


A defect in Schema API with Add a New Copy Field Rule?

2015-05-06 Thread Steven White
Hi Everyone,

I am using the Schema API to add a new copy field per:
https://cwiki.apache.org/confluence/display/solr/Schema+API#SchemaAPI-AddaNewCopyFieldRule

Unlike the other "Add" APIs, this one will not fail if you add an existing
copy field object.  In fact, after when I call the API over and over, the
item will appear over and over in schema.xml file like so:

  
  
  
  

Is this the expected behaviour or a bug?  As a side question, is there any
harm in having multiple "copyField" like I ended up with?

A final question, why there is no Replace a Copy Field?  Is this by design
for some limitation or was the API just never implemented?

Thanks

Steve
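
For reference, this is roughly the Schema API call being discussed; repeating
the POST adds the same copyField rule to the managed schema again each time.
A hedged sketch using only the JDK; the host, collection and field names
("title", "text") are illustrative:

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class AddCopyFieldSketch {
  public static void main(String[] args) throws Exception {
    // The documented add-copy-field command from the Schema API.
    String command = "{ \"add-copy-field\": { \"source\": \"title\", \"dest\": \"text\" } }";

    URL url = new URL("http://localhost:8983/solr/mycollection/schema");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestMethod("POST");
    conn.setRequestProperty("Content-Type", "application/json");
    conn.setDoOutput(true);
    try (OutputStream out = conn.getOutputStream()) {
      out.write(command.getBytes(StandardCharsets.UTF_8));
    }
    System.out.println("HTTP " + conn.getResponseCode());
  }
}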


Re: Trying to get AnalyzingInfixSuggester to work in Solr?

2015-05-06 Thread Erick Erickson
Have you seen this? I tried to make something end-to-end with assorted
"gotchas" identified

 Best,
Erick

On Wed, May 6, 2015 at 3:09 PM, O. Olson  wrote:
> Thank you Rajesh. I think I got a bit of help from the answer at:
> http://stackoverflow.com/a/29743945
>
> While that example sort of worked for me, I've not had the time to test what
> works and what didn't.
>
> So far I have found that I need the field in my searchComponent to be of
> type 'string'. In my original example I had this as text_general. Next I
> used the suggest_string fieldType as defined in the StackOverflow answer. I
> also removed your queryConverter, and it still works, so I think it's not
> needed.
>
> Thank you very much,
> O. O.
>
>
>
> Rajesh Hazari wrote
>> [Rajesh's suggester config and sample /suggest1 JSON response were quoted
>> here; the XML was stripped by the list archive. See his original message
>> later in this digest.]
>
>
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Trying-to-get-AnalyzingInfixSuggester-to-work-in-Solr-tp4204163p4204222.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Trying to get AnalyzingInfixSuggester to work in Solr?

2015-05-06 Thread O. Olson
Thank you Rajesh. I think I got a bit of help from the answer at:
http://stackoverflow.com/a/29743945

While that example sort of worked for me, I've not had the time to test what
works and what didn't.

So far I have found that I need the field in my searchComponent to be of
type 'string'. In my original example I had this as text_general. Next I
used the suggest_string fieldType as defined in the StackOverflow answer. I
also removed your queryConverter, and it still works, so I think it's not
needed.

Thank you very much,
O. O. 



Rajesh Hazari wrote
> [Rajesh's suggester config and sample /suggest1 JSON response were quoted
> here; the XML was stripped by the list archive. See his original message
> later in this digest.]





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Trying-to-get-AnalyzingInfixSuggester-to-work-in-Solr-tp4204163p4204222.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Trying to get AnalyzingInfixSuggester to work in Solr?

2015-05-06 Thread Rajesh Hazari
yes "textSuggest" is of type "text_general" with below definition


 





  
  

 




  


*Rajesh.*

On Wed, May 6, 2015 at 4:50 PM, O. Olson  wrote:

> Thank you Rajesh for responding so quickly. I tried it again with a restart
> and a reimport and I still cannot get this to work i.e. I'm seeing no
> difference.
>
> I'm wondering how you define: 'textSuggest' in your schema? In my case I
> use
> the field 'text' that is defined as:
>
>   <field name="text" type="text_general" ... multiValued="true"/>
>
> I'm wondering if your 'textSuggest' is of type "text_general" ?
>
> Thank you again for your help
> O. O.
>
>
> Rajesh Hazari wrote
> > [Rajesh's suggester config and sample /suggest1 JSON response were quoted
> > here; the XML was stripped by the list archive. See his original message
> > later in this digest.]
>
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Trying-to-get-AnalyzingInfixSuggester-to-work-in-Solr-tp4204163p4204208.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Completion Suggester in Solr

2015-05-06 Thread Pradeep Bhattiprolu
Hi

Is there a equivalent of Completion suggester of ElasticSearch in Solr ?

I am a user who uses both Solr and ES, in different projects.

I am not able to find a solution in Solr where I can use:

1) an FSA structure
2) multiple terms as synonyms
3) a weight assigned to each document based on certain heuristics, e.g.
popularity score, user search history, etc.


Any kind of help , pointers to relevant examples and documentation is
highly appreciated.

thanks in advance.

Pradeep


Re: Trying to get AnalyzingInfixSuggester to work in Solr?

2015-05-06 Thread O. Olson
Thank you Rajesh for responding so quickly. I tried it again with a restart
and a reimport and I still cannot get this to work i.e. I'm seeing no
difference. 

I'm wondering how you define: 'textSuggest' in your schema? In my case I use
the field 'text' that is defined as:

  <field name="text" type="text_general" ... multiValued="true"/>

I'm wondering if your 'textSuggest' is of type "text_general" ?

Thank you again for your help
O. O.


Rajesh Hazari wrote
> [Rajesh's suggester config and sample /suggest1 JSON response were quoted
> here; the XML was stripped by the list archive. See his original message
> later in this digest.]





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Trying-to-get-AnalyzingInfixSuggester-to-work-in-Solr-tp4204163p4204208.html
Sent from the Solr - User mailing list archive at Nabble.com.


solr 3.6.2 under tomcat 8 missing corename in path

2015-05-06 Thread Tim Dunphy
I'm trying to setup an old version of Solr for one of our drupal
developers. Apparently only versions 1.x or 3.x will work with the current
version of drupal.

I'm setting up solr 3.6.2 under tomcat.

And I'm getting this error when I start tomcat and surf to the /solr/admin
URL:

 HTTP Status 404 - missing core name in path

type Status report

message missing core name in path

description The requested resource is not available.

I have solr living in /opt:

# ls -ld /opt/solr
lrwxrwxrwx. 1 root root 17 May  6 12:48 /opt/solr -> apache-solr-3.6.2

And I have my cores located here:

# ls -ld /opt/solr/admin/cores
drwxr-xr-x. 3 root root 4096 May  6 14:37 /opt/solr/admin/cores

Just one core so far, until I can get this working.

# ls -l /opt/solr/admin/cores/
total 4
drwxr-xr-x. 5 root root 4096 May  6 14:08 collection1

I have this as my solr.xml file:

  [solr.xml contents stripped by the list archive; it declares
  defaultCoreName="collection1"]


Which is located in these two places:

# ls -l /opt/solr/solr.xml /usr/local/tomcat/conf/Catalina/solr.xml
-rw-r--r--. 1 root root 169 May  6 14:38 /opt/solr/solr.xml
-rw-r--r--. 1 root root 169 May  6 14:38
/usr/local/tomcat/conf/Catalina/solr.xml

These are the contents of my /opt/solr directory

# ls -l  /opt/solr/
total 436
drwxr-xr-x.  3 root root   4096 May  6 14:37 admin
-rw-r--r--.  1 root root 176647 Dec 18  2012 CHANGES.txt
drwxr-xr-x.  3 root root   4096 May  6 12:48 client
drwxr-xr-x.  9 root root   4096 Dec 18  2012 contrib
drwxr-xr-x.  3 root root   4096 May  6 12:48 dist
drwxr-xr-x.  3 root root   4096 May  6 12:48 docs
-rw-r--r--.  1 root root   1274 May  6 13:28 elevate.xml
drwxr-xr-x. 11 root root   4096 May  6 12:48 example
-rw-r--r--.  1 root root  81331 Dec 18  2012 LICENSE.txt
-rw-r--r--.  1 root root  20828 Dec 18  2012 NOTICE.txt
-rw-r--r--.  1 root root   5270 Dec 18  2012 README.txt
-rw-r--r--.  1 root root  55644 May  6 13:27 schema.xml
-rw-r--r--.  1 root root  60884 May  6 13:27 solrconfig.xml
-rw-r--r--.  1 root root    169 May  6 14:38 solr.xml


Yet, when I bounce tomcat, this is the result that I get:

HTTP Status 404 - missing core name in path

type Status report

message missing core name in path

description The requested resource is not available.

Can anyone tell me what I'm doing wrong?


Thanks!!
Tim


-- 
GPG me!!

gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B


Re: Trying to get AnalyzingInfixSuggester to work in Solr?

2015-05-06 Thread Rajesh Hazari
I just tested your config with my schema and it worked.

my config :
  

  suggest
  org.apache.solr.spelling.suggest.Suggester
  org.apache.solr.spelling.suggest.fst.AnalyzingInfixLookupFactory
  textSuggest
  0.005
  true
  text_general
  true

  



  

  true
  suggest
  true
  5
  true


  suggest1

  


http://localhost:8585/solr/collection1/suggest1?q=apple&rows=10&wt=json&indent=true

{
  "responseHeader":{
"status":0,
"QTime":2},
  "spellcheck":{
"suggestions":[
  "apple",{
"numFound":5,
"startOffset":0,
"endOffset":5,
"suggestion":["apple",
  "apple and",
  "apple and facebook",
  "apple and facebook learn",
  "apple and facebook learn from"]},
  "collation","apple"]}}



*Rajesh**.*
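
The configuration above (whose XML tags were stripped by the list archive)
wires a spellcheck-based suggester to a /suggest1 handler, so the suggestions
come back in the spellcheck section of the response. A rough SolrJ sketch of
querying it; the URL and collection name are illustrative and a 5.x SolrJ is
assumed:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.client.solrj.response.SpellCheckResponse;

public class SuggestClientSketch {
  public static void main(String[] args) throws Exception {
    HttpSolrClient client = new HttpSolrClient("http://localhost:8585/solr/collection1");
    SolrQuery query = new SolrQuery("apple");
    query.setRequestHandler("/suggest1");   // a leading slash makes SolrJ hit that path
    query.setRows(10);

    QueryResponse rsp = client.query(query);
    SpellCheckResponse spellcheck = rsp.getSpellCheckResponse();
    if (spellcheck != null) {
      for (SpellCheckResponse.Suggestion s : spellcheck.getSuggestions()) {
        System.out.println(s.getToken() + " -> " + s.getAlternatives());
      }
    }
    client.close();
  }
}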

On Wed, May 6, 2015 at 2:48 PM, Rajesh Hazari 
wrote:

> Just add the queryConverter definition in your solr config and you should
> see multiple-term suggestions.
> Also make sure you have ShingleFilterFactory as one of the filters in
> your schema field definition for "text_general".
>
>   <filter class="solr.ShingleFilterFactory" ... outputUnigrams="true"/>
>
>
> *Rajesh**.*
>
> On Wed, May 6, 2015 at 1:47 PM, O. Olson  wrote:
>
>> Thank you Rajesh. I'm not familiar with the queryConverter. How do you
>> wire
>> it up to the rest of the setup? Right now, I just put it between the
>> SpellCheckComponent and the RequestHandler i.e. my config is as:
>>
>> 
>> 
>>   suggest
>>   > name="classname">org.apache.solr.spelling.suggest.Suggester
>>   >
>> name="lookupImpl">org.apache.solr.spelling.suggest.fst.AnalyzingInfixLookupFactory
>>   text
>>   0.005
>>   true
>>   text_general
>>   true
>> 
>>   
>>
>>   > class="org.apache.solr.spelling.SuggestQueryConverter"/>
>>
>>   > name="/suggest">
>> 
>>   true
>>   suggest
>>   true
>>   5
>>   true
>> 
>> 
>>   suggest
>> 
>>   
>>
>> Is this correct? I do not see any difference in my results i.e.
>> the
>> suggestions are the same as before.
>> O. O.
>>
>>
>>
>>
>>
>> Rajesh Hazari wrote
>> > make sure you have this query converter defined in your config
>> > > > class="org.apache.solr.spelling.SuggestQueryConverter"/>
>> > *Thanks,*
>> > *Rajesh**.*
>>
>>
>>
>>
>>
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/Trying-to-get-AnalyzingInfixSuggester-to-work-in-Solr-tp4204163p4204173.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>
>


Re: Trying to get AnalyzingInfixSuggester to work in Solr?

2015-05-06 Thread Rajesh Hazari
Just add the queryConverter definition in your solr config and you should
see multiple-term suggestions.
Also make sure you have ShingleFilterFactory as one of the filters in
your schema field definition for "text_general".

  <filter class="solr.ShingleFilterFactory" ... outputUnigrams="true"/>

*Rajesh**.*

On Wed, May 6, 2015 at 1:47 PM, O. Olson  wrote:

> Thank you Rajesh. I'm not familiar with the queryConverter. How do you wire
> it up to the rest of the setup? Right now, I just put it between the
> SpellCheckComponent and the RequestHandler i.e. my config is as:
>
> 
> 
>   suggest
>name="classname">org.apache.solr.spelling.suggest.Suggester
>   
> name="lookupImpl">org.apache.solr.spelling.suggest.fst.AnalyzingInfixLookupFactory
>   text
>   0.005
>   true
>   text_general
>   true
> 
>   
>
>class="org.apache.solr.spelling.SuggestQueryConverter"/>
>
>name="/suggest">
> 
>   true
>   suggest
>   true
>   5
>   true
> 
> 
>   suggest
> 
>   
>
> Is this correct? I do not see any difference in my results i.e. the
> suggestions are the same as before.
> O. O.
>
>
>
>
>
> Rajesh Hazari wrote
> > make sure you have this query converter defined in your config
> >  > class="org.apache.solr.spelling.SuggestQueryConverter"/>
> > *Thanks,*
> > *Rajesh**.*
>
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Trying-to-get-AnalyzingInfixSuggester-to-work-in-Solr-tp4204163p4204173.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Trying to get AnalyzingInfixSuggester to work in Solr?

2015-05-06 Thread O. Olson
Thank you Rajesh. I'm not familiar with the queryConverter. How do you wire
it up to the rest of the setup? Right now, I just put it between the
SpellCheckComponent and the RequestHandler i.e. my config is as: 



  suggest
  org.apache.solr.spelling.suggest.Suggester
  org.apache.solr.spelling.suggest.fst.AnalyzingInfixLookupFactory
  text  
  0.005
  true
  text_general
  true

  
  
   
  
  

  true
  suggest
  true
  5
  true


  suggest

  

Is this correct? I do not see any difference in my results i.e. the
suggestions are the same as before.
O. O.





Rajesh Hazari wrote
> make sure you have this query converter defined in your config:
>
>   <queryConverter name="queryConverter"
>                   class="org.apache.solr.spelling.SuggestQueryConverter"/>
> *Thanks,*
> *Rajesh**.*





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Trying-to-get-AnalyzingInfixSuggester-to-work-in-Solr-tp4204163p4204173.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Trying to get AnalyzingInfixSuggester to work in Solr?

2015-05-06 Thread Rajesh Hazari
make sure you have this query converter defined in your config:

  <queryConverter name="queryConverter"
                  class="org.apache.solr.spelling.SuggestQueryConverter"/>
*Thanks,*
*Rajesh**.*

On Wed, May 6, 2015 at 12:39 PM, O. Olson  wrote:

> I'm trying to get the AnalyzingInfixSuggester to work but I'm not
> successful.
> I'd be grateful if someone can point me to a working example.
>
> Problem:
> My content is product descriptions similar to a BestBuy or NewEgg catalog.
> My problem is that I'm getting only single words in the suggester results.
> E.g. if I type 'len', I get the suggester results like 'Lenovo' but not
> 'Lenovo laptop' or something larger/longer than a single word.
>
> There is a suggestion here:
>
> http://blog.mikemccandless.com/2013/06/a-new-lucene-suggester-based-on-infix.html
> that the search at:
> http://jirasearch.mikemccandless.com/search.py?index=jira is powered by
> the
> AnalyzingInfixSuggester  If this is true, when I use this suggester, I get
> more than a few words in the suggester results, but I don't with my setup
> i.e. on my setup I get only single words. My configuration is
>
>
> 
> 
>   suggest
>name="classname">org.apache.solr.spelling.suggest.Suggester
>   
> name="lookupImpl">org.apache.solr.spelling.suggest.fst.AnalyzingInfixLookupFactory
>   text
>   0.005
>   true
>   text_general
>   true
> 
>   
>
>name="/suggest">
> 
>   true
>   suggest
>   true
>   5
>   true
> 
> 
>   suggest
> 
>   
>
> I copy the contents of all of my fields to a single field called 'text'.
> The
> ' text_general' type is exactly as in the solr examples:
>
> http://svn.apache.org/viewvc/lucene/dev/trunk/solr/example/example-DIH/solr/db/conf/schema.xml?view=markup
>
> I'd be grateful if anyone can help me. I don't know what to look at. Thank
> you in adance.
>
> O. O.
>
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Trying-to-get-AnalyzingInfixSuggester-to-work-in-Solr-tp4204163.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Trying to get AnalyzingInfixSuggester to work in Solr?

2015-05-06 Thread O. Olson
I'm trying to get the AnalyzingInfixSuggester to work but I'm not successful.
I'd be grateful if someone can point me to a working example. 

Problem:
My content is product descriptions similar to a BestBuy or NewEgg catalog.
My problem is that I'm getting only single words in the suggester results.
E.g. if I type 'len', I get the suggester results like 'Lenovo' but not
'Lenovo laptop' or something larger/longer than a single word. 

There is a suggestion here:
http://blog.mikemccandless.com/2013/06/a-new-lucene-suggester-based-on-infix.html
that the search at:
http://jirasearch.mikemccandless.com/search.py?index=jira is powered by the
AnalyzingInfixSuggester. If this is true, then when I use this suggester I
should get more than a few words in the suggester results, but I don't with my
setup, i.e. on my setup I get only single words. My configuration is:




  suggest
  org.apache.solr.spelling.suggest.Suggester
  org.apache.solr.spelling.suggest.fst.AnalyzingInfixLookupFactory
  text  
  0.005
  true
  text_general
  true

  
  
  

  true
  suggest
  true
  5
  true


  suggest

  

I copy the contents of all of my fields to a single field called 'text'. The
'text_general' type is exactly as in the solr examples:
http://svn.apache.org/viewvc/lucene/dev/trunk/solr/example/example-DIH/solr/db/conf/schema.xml?view=markup
 

I'd be grateful if anyone can help me. I don't know what to look at. Thank
you in advance.

O. O.





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Trying-to-get-AnalyzingInfixSuggester-to-work-in-Solr-tp4204163.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: severe problems with soft and hard commits in a large index

2015-05-06 Thread Shawn Heisey
On 5/6/2015 8:55 AM, adfel70 wrote:
> Thank you for the detailed answer.
> How can I decrease the impact of opening a searcher in such a large index?
> especially the impact of heap usage that causes OOM.

See the wiki link I sent.  It talks about some of the things that
require a lot of heap and ways you can reduce those requirements.  The
lists are nowhere near complete.

http://wiki.apache.org/solr/SolrPerformanceProblems#Java_Heap

> regarding GC tuning - I am doing that.
> here are the params I use:
> AggresiveOpts
> UseLargePages
> ParallelRefProcEnabled
> CMSParallelRemarkEnabled
> CMSMaxAbortablePrecleanTime=6000
> CMDTriggerPermRatio=80
> CMSInitiatingOccupancyFraction=70
> UseCMSInitiatinOccupancyOnly
> CMSFullGCsBeforeCompaction=1
> PretenureSizeThreshold=64m
> CMSScavengeBeforeRemark
> UseConcMarkSweepGC
> MaxTenuringThreshold=8
> TargetSurvivorRatio=90
> SurviorRatio=4
> NewRatio=2
> Xms16gb
> Xmn28gb

This list seems to have come from re-typing the GC options.  If this is
a cut/paste, I would not expect it to work -- there are typos and part
of each option is missing other characters.  Assuming that this is not
cut/paste, it is mostly similar to the CMS options that I once used for
my own index:

http://wiki.apache.org/solr/ShawnHeisey#CMS_.28ConcurrentMarkSweep.29_Collector

> How many documents per shard are recommended?
> Note that I use nested documents. total collection size is 3 billion docs,
> number of parent docs is 600 million. the rest are children.

For the G1 collector, you'd want to limit each shard to about 100
million docs.  I have no idea about limitations and capabilities where
very large memory allocations are concerned with the CMS collector. 
Running the latest Java 8 is *strongly* recommended, no matter what
collector you're using, because recent versions have incorporated GC
improvements with large memory allocations.  With Java 8u40 and later,
the limitations for 16MB huge allocations on the G1 collector might not
even apply.

Thanks,
Shawn



Re: Solr 5.0 - uniqueKey case insensitive ?

2015-05-06 Thread Bruno Mannina

Yes, thanks, it works now for me too.

Daniel, my pn is always in uppercase and I always index it in uppercase.
The problem (solved now after all your answers, thanks) was the request:
if users request with lowercase then Solr replies with no result, and that
was not good.

But now the problem is solved: I changed the field name from pn to id in my
source file, and in my schema I use a copy field named pn, and it works
perfectly.

Thanks a lot !!!

On 06/05/2015 09:44, Daniel Collins wrote:

Ah, I remember seeing this when we first started using Solr (which was 4.0
because we needed Solr Cloud), I never got around to filing an issue for it
(oops!), but we have a note in our schema to leave the key field a normal
string (like Bruno we had tried to lowercase it which failed).
We didn't really know Solr in those days, and hadn't really thought about
it since then, but Hoss' and Erick's explanations make perfect sense now!

Since shard routing is (basically) done on hashes of the unique key, if I
have 2 documents which are the "same", but have values "HELLO" and "hello",
they might well hash to completely different shards, so the update
logistics would be horrible.

Bruno, why do you need to lowercase at all then?  You said in your example,
that your client application always supplies "pn" and it is always
uppercase, so presumably all adds/updates could be done directly on that
field (as a normal string with no lowercasing).  Where does the case
insensitivity come in, is that only for searching?  If so couldn't you add
a search field (called id), and update your app to search using that (or
make that your default search field, I guess it depends if your calling app
explicitly uses the pn field name in its searches).


On 6 May 2015 at 01:55, Erick Erickson  wrote:


Well, "working fine" may be a bit of an overstatement. That has never
been officially supported, so it "just happened" to work in 3.6.

As Chris points out, if you're using SolrCloud then this will _not_
work as routing happens early in the process, i.e. before the analysis
chain gets the token so various copies of the doc will exist on
different shards.

Best,
Erick

On Mon, May 4, 2015 at 4:19 PM, Bruno Mannina  wrote:

Hello Chris,

Yes, I confirm that on my SOLR 3.6 it has worked fine for several years, and
each doc added with the same code is updated, not added.

To be more clear, I receive docs with a field named "pn"; it's the
uniqueKey, and it is always in uppercase,

so I must define in my schema.xml:

  [schema.xml snippet stripped by the list archive: an "id" field, a "pn"
  field (indexed="true", stored="false"), and <uniqueKey>id</uniqueKey>]

but the application that uses Solr already exists, so it requests with the pn
field, not id; I cannot change that.
And in each doc I receive there is no id field, just a pn field, and I
cannot change that either.

So there is a problem, no? I must import an id field and request a pn field,

but I only have a pn field to import...



On 05/05/2015 01:00, Chris Hostetter wrote:

: On SOLR3.6, I defined a string_ci field like this:
:
: <fieldType name="string_ci" class="solr.TextField" ...>
:   <analyzer>
:     <tokenizer class="solr.KeywordTokenizerFactory"/>
:     <filter class="solr.LowerCaseFilterFactory"/>
:   </analyzer>
: </fieldType>
:
: <field name="pn" type="string_ci" ... />


I'm really surprised that field would have worked for you (reliably) as a
uniqueKey field even in Solr 3.6.

The best practice for something like what you describe has always (going
back to Solr 1.x) been to use a copyField to create a case-insensitive
copy of your uniqueKey for searching.

If, for some reason, you really want case-insensitive *updates* (so a doc
with id "foo" overwrites a doc with id "FOO") then the only reliable way to
make something like that work is to do the lowercasing in an
UpdateProcessor, to ensure it happens *before* the docs are distributed to
the correct shard, and so the correct existing doc is overwritten (even if
you aren't using SolrCloud).



-Hoss
http://www.lucidworks.com/








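
Hoss's suggestion above -- lower-casing the uniqueKey in an UpdateProcessor so
it happens before the document is routed to a shard -- could look roughly like
this. It is only a sketch: the field name "id" is illustrative, and the factory
plus updateRequestProcessorChain wiring (which must sit before the distributed
update processor) is left out:

import java.io.IOException;

import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;

public class LowerCaseIdProcessor extends UpdateRequestProcessor {

  public LowerCaseIdProcessor(UpdateRequestProcessor next) {
    super(next);
  }

  @Override
  public void processAdd(AddUpdateCommand cmd) throws IOException {
    SolrInputDocument doc = cmd.getSolrInputDocument();
    Object id = doc.getFieldValue("id");
    if (id instanceof String) {
      // Lower-case the key before routing, so "FOO" and "foo" update the same doc.
      doc.setField("id", ((String) id).toLowerCase());
    }
    super.processAdd(cmd);
  }
}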



Re: ZooKeeperException: Could not find configName for collection

2015-05-06 Thread Erick Erickson
Have you looked arond at your directories on disk? I'm _not_ talking
about the admin UI here. The default is "core discovery" mode, which
recursively looks under solr_home and thinks there's a core wherever
it finds a "core.properties" file. If you find such a thing, rename it
or remove the directory.

Another alternative would be to push a configset named "new_core" up
to Zookeeper, that might allow you to see (and then delete) the
collection new_core belongs to.

It looks like you tried to use the admin UI to create a core and it's
all local or something like that.

Best,
Erick

On Wed, May 6, 2015 at 4:00 AM, shacky  wrote:
> Hi list.
>
> I created a new collection on my new SolrCloud installation, the new
> collection is shown and replicated on all three nodes, but on the
> first node (only on this one) I get this error:
>
> new_core: 
> org.apache.solr.common.cloud.ZooKeeperException:org.apache.solr.common.cloud.ZooKeeperException:
> Could not find configName for collection new_core found:null
>
> I cannot see any core named "new_core" on that node, and I also tried
> to remove it:
>
> root@index1:/opt/solr# ./bin/solr delete -c new_core
> Connecting to ZooKeeper at zk1,zk2,zk3
> ERROR: Collection new_core not found!
>
> Could you help me, please?
>
> Thank you very much!
> Bye


Re: New core on Solr Cloud

2015-05-06 Thread Erick Erickson
That should have put one replica on each machine, if it did you're fine.

Best,
Erick

On Wed, May 6, 2015 at 3:58 AM, shacky  wrote:
> Ok, I found out that on Solr 5.1 new cores/collections are created with
> the bin/solr script.
> So I created a new collection with this command:
>
> ./solr create_collection -c test -replicationFactor 3
>
> Is this the correct way?
>
> Thank you very much,
> Bye!
>
> 2015-05-06 10:02 GMT+02:00 shacky :
>> Hi.
>> This is my first experience with Solr Cloud.
>> I installed three Solr nodes with three ZooKeeper instances and they
>> seemed to start well.
>> Now I have to create a new replicated core and I'm trying to found out
>> how I can do it.
>> I found many examples about how to create shards and cores, but I have
>> to create one core with only one shard replicated on all three nodes
>> (so basically I want to have the same data on all three nodes).
>>
>> Could you help me to understand what is the correct way to make this, please?
>>
>> Thank you very much!
>> Bye
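
For reference, the bin/solr create_collection command above maps to a
Collections API CREATE call. A hedged sketch of the equivalent HTTP request
(host is illustrative, and it assumes a config set for the collection is
already available in ZooKeeper):

import java.net.URL;
import java.util.Scanner;

public class CreateCollectionSketch {
  public static void main(String[] args) throws Exception {
    // One shard, replicated to all three nodes.
    String url = "http://localhost:8983/solr/admin/collections"
        + "?action=CREATE&name=test&numShards=1&replicationFactor=3";
    try (Scanner response = new Scanner(new URL(url).openStream(), "UTF-8")) {
      while (response.hasNextLine()) {
        System.out.println(response.nextLine());
      }
    }
  }
}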


Solr port went down on remote server

2015-05-06 Thread Nitin Solanki
Hi,
   I have installed Solr on a remote server and started it on port 8983.
I have bound my local machine's port 8983 to the remote server's Solr port
8983 using *ssh* (Ubuntu OS). When I request suggestions from Solr on the
remote server through local machine calls, sometimes it gives a response
and sometimes it doesn't.

I am not able to work out why this is so.
Is it a remote server binding issue, or did Solr go down?
I am not getting the problem.

To investigate, I ran a crontab job using the telnet command to check the
existence of Solr's port (8983). It works fine without throwing any
connection refused error, yet I am still not able to detect the problem.
Any help please..


Re: What is the best practice to Backup and delete a core from SOLR Master-Slave architecture

2015-05-06 Thread Erick Erickson
Well, they're just files on disk. You can freely copy the index files
around wherever you want. I'd do a few practice runs first though. So:
1> unload the core (or otherwise shut it down).
2> copy the data directory and all sub directories.
3> I'd also copy the conf directory to ensure a consistent picture of
the index when you restore it.
4> delete the core however you please.

Of course, before I did <4> I'd try bringing up the core on some other
machine a few times, just to be sure you have all the necessary
parts... Once you're confident of the process you don't need to do a test
restore _every_ time.

Best,
Erick
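
A hedged sketch of steps 1-3 above: unload the core, then copy its data and
conf directories somewhere safe before deleting anything. The Solr URL, core
name and paths are illustrative, and a reasonably recent SolrJ is assumed:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.stream.Stream;

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.CoreAdminRequest;

public class BackupOldCoreSketch {

  public static void main(String[] args) throws Exception {
    try (SolrClient client = new HttpSolrClient("http://localhost:8983/solr")) {
      CoreAdminRequest.unloadCore("core0", client);                               // step 1
    }
    copyTree(Paths.get("/var/solr/core0/data"), Paths.get("/backup/core0/data")); // step 2
    copyTree(Paths.get("/var/solr/core0/conf"), Paths.get("/backup/core0/conf")); // step 3
    // step 4 (actually deleting the old core) is deliberately left out here.
  }

  private static void copyTree(Path src, Path dst) throws IOException {
    try (Stream<Path> paths = Files.walk(src)) {
      paths.forEach(p -> {
        try {
          Files.copy(p, dst.resolve(src.relativize(p)), StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
          throw new RuntimeException(e);
        }
      });
    }
  }
}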

On Wed, May 6, 2015 at 3:08 AM, sangeetha.subraman...@gtnexus.com
 wrote:
> Hi,
>
> I am a newbie to SOLR. I have set up a Master-Slave configuration with SOLR 4.0.
> I am trying to identify the best way to back up an old core and delete
> it so as to free up space on the disk.
>
> I did get the information on how to unload a core and delete the indexes from 
> the core.
>
> Unloading - http://localhost:8983/solr/admin/cores?action=UNLOAD&core=core0
> Delete Indexes - 
> http://localhost:8983/solr/admin/cores?action=UNLOAD&core=core0&deleteIndex=true
>
> What is the best approach to remove the old core ?
>
>
> *   Approach 1
>
> o   Unload the core in both Master and Slave server AND delete the index only 
> from Master server (retain the indexes in slave server as a backup). If I am 
> retaining the indexes in Slave server, at later point is there a way to bring 
> those to Master Server ?
>
> *   Approach 2
>
> o   Unload and delete the indexes from both Master and Slave server. Before 
> deleting, take a backup of the data dir of old core from File system. I am 
> not sure if this is even possible ?
>
> Is there any other way better way of doing this ? Please let me know
>
> Thanks
> Sangeetha


Re: Solr cloud clusterstate.json update query ?

2015-05-06 Thread Erick Erickson
Gopal:

Did you see my previous answer?

Best,
Erick

On Tue, May 5, 2015 at 9:42 PM, Gopal Jee  wrote:
> About <2>: live_nodes under ZooKeeper are ephemeral nodes (please see
> ZooKeeper ephemeral nodes). So, once the connection from the Solr zkClient to
> ZooKeeper is lost, these nodes will disappear automatically. AFAIK,
> clusterstate.json is updated by the overseer based on messages published to a
> queue in ZooKeeper by the Solr zkClients. In case a Solr node dies ungracefully,
> I am not sure how this event gets reflected in clusterstate.json.
> *Can someone shed some light* on ungraceful Solr shutdown and the consequent
> status update in clusterstate? I guess there must be some way, because all
> nodes in a cluster decide the cluster state based on the watched clusterstate.json
> node. They will not be watching live_nodes to update their state.
>
> Gopal
>
> On Wed, May 6, 2015 at 6:33 AM, Erick Erickson 
> wrote:
>
>> about <1>. This shouldn't be happening, so I wouldn't concentrate
>> there first. The most common reason is that you have a short Zookeeper
>> timeout and the replicas go into a stop-the-world garbage collection
>> that exceeds the timeout. So the first thing to do is to see if that's
>> happening. Here are a couple of good places to start:
>>
>> http://lucidworks.com/blog/garbage-collection-bootcamp-1-0/
>> http://wiki.apache.org/solr/ShawnHeisey#GC_Tuning_for_Solr
>>
>> <2> Partial answer is that ZK does a keep-alive type thing and if the
>> solr nodes it knows about don't reply, it marks the nodes as down.
>>
>> Best,
>> Erick
>>
>> On Tue, May 5, 2015 at 5:42 AM, Sai Sreenivas K  wrote:
>> > Could you clarify on the following questions,
>> > 1. Is there a way to avoid all the nodes simultaneously getting into
>> > recovery state when a bulk indexing happens ? Is there an api to disable
>> > replication on one node for a while ?
>> >
>> > 2. We recently changed the host name on nodes in solr.xml. But the old
>> host
>> > entries still exist in the clusterstate.json marked as active state.
>> Though
>> > live_nodes has the correct information. Who updates clusterstate.json if
>> > the node goes down in an ungraceful fashion without notifying its down
>> > state ?
>> >
>> > Thanks,
>> > Sai Sreenivas K
>>
>
>
>
> --


Re: severe problems with soft and hard commits in a large index

2015-05-06 Thread adfel70
Thank you for the detailed answer.
How can I decrease the impact of opening a searcher in such a large index,
especially the impact on heap usage that causes OOM?

Regarding GC tuning - I am doing that.
Here are the params I use:
AggresiveOpts
UseLargePages
ParallelRefProcEnabled
CMSParallelRemarkEnabled
CMSMaxAbortablePrecleanTime=6000
CMDTriggerPermRatio=80
CMSInitiatingOccupancyFraction=70
UseCMSInitiatinOccupancyOnly
CMSFullGCsBeforeCompaction=1
PretenureSizeThreshold=64m
CMSScavengeBeforeRemark
UseConcMarkSweepGC
MaxTenuringThreshold=8
TargetSurvivorRatio=90
SurviorRatio=4
NewRatio=2
Xms16gb
Xmn28gb

any input on this?

How many documents per shard are recommended?
Note that I use nested documents. total collection size is 3 billion docs,
number of parent docs is 600 million. the rest are children.



Shawn Heisey-2 wrote
> On 5/6/2015 1:58 AM, adfel70 wrote:
>> I have a cluster of 16 shards, 3 replicas. the cluster indexed nested
>> documents.
>> it currently has 3 billion documents overall (parent and children).
>> each shard has around 200 million docs. size of each shard is 250GB.
>> this runs on 12 machines. each machine has 4 SSD disks and 4 solr
>> processes.
>> each process has 28GB heap.  each machine has 196GB RAM.
>> 
>> I perform periodic indexing throughout the day. each indexing cycle adds
>> around 1.5 million docs. I keep the indexing load light - 2 processes
>> with
>> bulks of 20 docs.
>> 
>> My use case demands that each indexing cycle will be visible only when
>> the
>> whole cycle finishes.
>> 
>> I tried various methods of using soft and hard commits:
> 
> I personally would configure autoCommit on a five minute (maxTime of
> 300000) interval with openSearcher=false.  The use case you have
> outlined (not seeing changes until the indexing is done) demands that
> you do NOT turn on autoSoftCommit, and that you do one manual commit at the
> end of indexing, which could be either a soft commit or a hard commit.
> I would recommend a soft commit.
> 
> Because it is the openSearcher part of a commit that's very expensive,
> you can successfully do autoCommit with openSearcher=false on an
> interval like 10 or 15 seconds and not see much in the way of immediate
> performance loss.  That commit is still not free, not only in terms of
> resources, but in terms of java heap garbage generated.
> 
> The general advice with commits is to do them as infrequently as you
> can, which applies to ANY commit, not just those that make changes
> visible.
> 
>> with all methods I encounter pretty much the same problem:
>> 1. heavy GCs when soft commit is performed (methods 1,2) or when
>> hardcommit
>> opensearcher=true is performed. these GCs cause heavy latency (average
>> latency is 3 secs. latency during the problem is 80secs)
>> 2. if indexing cycles come too often, which causes softcommits or
>> hardcommits(opensearcher=true) occur with a small interval one after
>> another
>> (around 5-10minutes), I start getting many OOM exceptions.
> 
> If you're getting OOM, then either you need to change things so Solr
> requires less heap memory, or you need to increase the heap size.
> Changing things might be either the config or how you use Solr.
> 
> Are you tuning your garbage collection?  With a 28GB heap, tuning is not
> optional.  It's so important that the startup scripts in 5.0 and 5.1
> include it, even though the default max heap is 512MB.
> 
> Let's do some quick math on your memory.  You have four instances of
> Solr on each machine, each with a 28GB heap.  That's 112GB of memory
> allocated to Java.  With 196GB total, you have approximately 84GB of RAM
> left over for caching your index.
> 
> A 16-shard index with three replicas means 48 cores.  Divide that by 12
> machines and that's 4 replicas on each server, presumably one in each
> Solr instance.  You say that the size of each shard is 250GB, so you've
> got about a terabyte of index on each server, but only 84GB of RAM for
> caching.
> 
> Even with SSD, that's not going to be anywhere near enough cache memory
> for good Solr performance.
> 
> All these memory issues, including GC tuning, are discussed on this wiki
> page:
> 
> http://wiki.apache.org/solr/SolrPerformanceProblems
> 
> One additional note: By my calculations, each filterCache entry will be
> at least 23MB in size.  This means that if you are using the filterCache
> and the G1 collector, you will not be able to avoid humongous
> allocations, which is any allocation larger than half the G1 region
> size.  The max configurable G1 region size is 32MB.  You should use the
> CMS collector for your GC tuning, not G1.  If you can reduce the number
> of documents in each shard, G1 might work well.
> 
> Thanks,
> Shawn





--
View this message in context: 
http://lucene.472066.n3.nabble.com/severe-problems-with-soft-and-hard-commits-in-a-large-index-tp4204068p4204148.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Multiple index.timestamp directories using up disk space

2015-05-06 Thread rishi
We use the following merge policy on SSDs, running on physical machines with
a Linux OS:

  [merge policy settings stripped by the list archive; the surviving values
  were 10, 3, 15 and 64]

Not sure if it's very aggressive, but it's something we keep to prevent
deleted documents from taking up too much space in our index.

Is there some error message that Solr logs when the rename and deletion of the
directories fails? If so we could monitor our logs to get a better idea of
the root cause. At present we can only react when things go wrong, based on
disk space alarms.

Thanks,
Rishi.
 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Multiple-index-timestamp-directories-using-up-disk-space-tp4201098p4204145.html
Sent from the Solr - User mailing list archive at Nabble.com.


Solr with logstash solr_http output plugin and geoip filter

2015-05-06 Thread Daniel Marsh
Hi,

I'm currently using solr to index a moderate amount of information with
the help of logstash and the solr_http contrib output plugin.

solr is receiving documents, I've got banana as a web interface and I am
running it with a schemaless core.

I'm feeding documents via the contrib plugin solr_http and logstash. One
of the filters I'm using is geoip with the following setup:

  geoip {
source => "subject_ip"
database => "/opt/logstash/vendor/geoip/GeoLiteCity.dat"
target => "geoip"
fields => ["latitude", "longitude"]
  }

However this created a "string" field called geoip with the value:
{"latitude"=>2.0, "longitude"=>13.0, "location"=>[2.0, 13.0]}

This is "meant" to become three "sub" fields:
geoip.latitude => 2.0
geoip.longitude => 13.0
geoip.location => 2.0, 13.0

The above setup worked with logstash feeding into elasticsearch,
resulting in geoip.location being populated correctly as a field itself.

Given it did work with ES, I assume the first issue is, solr either does
not know how to parse a value as additional variables with values, OR I
simply have not configured solr correctly (I'm betting on the latter).

I have only been using solr for about 8 hours (installed today), had to
try something as no amount of tweaking would resolve the indexing
performance issues I had with ES - I'm now indexing the same amount of
data into solr at near real-time on the exact same machine that was
running ES where indexing would stop after about 2 hours.

The whole point of the geoip field is to get geoip.location which will
be the location field used by bettermap on the banana web interface.

I am not running SiLK.
I am running solr 5.1, logstash 1.4.

Regards,
Daniel


getting frequent CorruptIndexException and inconsistent data though core is active

2015-05-06 Thread adfel70
Hi
I'm getting org.apache.lucene.index.CorruptIndexException 
liveDocs.count()=2000699 info.docCount()=2047904 info.getDelCount()=47207
(filename=_ney_1g.del).

This just happened for the 4th time in 2 weeks.
each time this happens in another core, usually when a replica tries to
recover, then it reports that it succeeded, and then the
CorruptIndexException  is thrown while trying to open searcher.

This core is marked as active, and thus queries can get redirected there,
which causes data inconsistency for users.
This occurs with Solr 4.10.3; it should be noted that I use nested docs.

ANOTHER problem is that replicas can end up with an inconsistent number of
docs with no exception being reported.
This usually occurs when one of the replicas goes down during indexing. What
I end up with is the leader being on an older version than the replicas,
or having fewer docs than the replicas. Switching leaders (stopping the
leader so that another replica becomes the leader) does not fix the problem.

this occurs both in solr 4.10.3 and in solr 4.8





--
View this message in context: 
http://lucene.472066.n3.nabble.com/getting-frequent-CorruptIndexException-and-inconsistent-data-though-core-is-active-tp4204129.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: severe problems with soft and hard commits in a large index

2015-05-06 Thread Shawn Heisey
On 5/6/2015 1:58 AM, adfel70 wrote:
> I have a cluster of 16 shards, 3 replicas. the cluster indexed nested
> documents.
> it currently has 3 billion documents overall (parent and children).
> each shard has around 200 million docs. size of each shard is 250GB.
> this runs on 12 machines. each machine has 4 SSD disks and 4 solr processes.
> each process has 28GB heap.  each machine has 196GB RAM.
> 
> I perform periodic indexing throughout the day. each indexing cycle adds
> around 1.5 million docs. I keep the indexing load light - 2 processes with
> bulks of 20 docs.
> 
> My use case demands that each indexing cycle will be visible only when the
> whole cycle finishes.
> 
> I tried various methods of using soft and hard commits:

I personally would configure autoCommit on a five minute (maxTime of
300000) interval with openSearcher=false.  The use case you have
outlined (not seeing changes until the indexing is done) demands that
you do NOT turn on autoSoftCommit, and that you do one manual commit at the
end of indexing, which could be either a soft commit or a hard commit.
I would recommend a soft commit.

Because it is the openSearcher part of a commit that's very expensive,
you can successfully do autoCommit with openSearcher=false on an
interval like 10 or 15 seconds and not see much in the way of immediate
performance loss.  That commit is still not free, not only in terms of
resources, but in terms of java heap garbage generated.

The general advice with commits is to do them as infrequently as you
can, which applies to ANY commit, not just those that make changes visible.

> with all methods I encounter pretty much the same problem:
> 1. heavy GCs when soft commit is performed (methods 1,2) or when hardcommit
> opensearcher=true is performed. these GCs cause heavy latency (average
> latency is 3 secs. latency during the problem is 80secs)
> 2. if indexing cycles come too often, which causes softcommits or
> hardcommits(opensearcher=true) occur with a small interval one after another
> (around 5-10minutes), I start getting many OOM exceptions.

If you're getting OOM, then either you need to change things so Solr
requires less heap memory, or you need to increase the heap size.
Changing things might be either the config or how you use Solr.

Are you tuning your garbage collection?  With a 28GB heap, tuning is not
optional.  It's so important that the startup scripts in 5.0 and 5.1
include it, even though the default max heap is 512MB.

Let's do some quick math on your memory.  You have four instances of
Solr on each machine, each with a 28GB heap.  That's 112GB of memory
allocated to Java.  With 196GB total, you have approximately 84GB of RAM
left over for caching your index.

A 16-shard index with three replicas means 48 cores.  Divide that by 12
machines and that's 4 replicas on each server, presumably one in each
Solr instance.  You say that the size of each shard is 250GB, so you've
got about a terabyte of index on each server, but only 84GB of RAM for
caching.

Even with SSD, that's not going to be anywhere near enough cache memory
for good Solr performance.

All these memory issues, including GC tuning, are discussed on this wiki
page:

http://wiki.apache.org/solr/SolrPerformanceProblems

One additional note: By my calculations, each filterCache entry will be
at least 23MB in size.  This means that if you are using the filterCache
and the G1 collector, you will not be able to avoid humongous
allocations, which is any allocation larger than half the G1 region
size.  The max configurable G1 region size is 32MB.  You should use the
CMS collector for your GC tuning, not G1.  If you can reduce the number
of documents in each shard, G1 might work well.

Thanks,
Shawn
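
A one-line SolrJ illustration of the explicit commit recommended above, issued
once the whole indexing cycle has finished; "client" is assumed to be a
SolrClient already pointed at the collection:

import org.apache.solr.client.solrj.SolrClient;

public class EndOfCycleCommit {
  public static void commitCycle(SolrClient client) throws Exception {
    // waitFlush=true, waitSearcher=true, softCommit=true:
    // makes the whole indexing cycle visible in one step.
    client.commit(true, true, true);
  }
}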



Re: Will field type change require complete re-index?

2015-05-06 Thread Shawn Heisey
On 5/6/2015 7:03 AM, Vishal Sharma wrote:
> Now, If I need to change the field type to date from String will this
> require complete reindex?

Yes, it absolutely will require a complete reindex.  A change like that
probably will result in errors on queries until a reindex is done.  You
may even need to completely delete the index directory and restart Solr
before doing your reindex to get rid of the old segments that have
information incompatible with your new schema.

http://wiki.apache.org/solr/HowToReindex

Thanks,
Shawn



Re: Finding out optimal hash ranges for shard split

2015-05-06 Thread anand.mahajan
Yes - I'm using 2-level composite ids and that has caused the imbalance for
some shards.
It's car data, and the composite ids are of the form year-make!model-and a
couple of other specifications, e.g. 2013Ford!Edge!123456 - but there are
just far too many Ford 2013 or 2011 cars, and they all occupy the same shards.
This was done because co-location of these docs is required for a few
of the search requirements - to avoid hitting all shards all the time - and
all queries always specify the year and make combination, so it's
easier to work out the target shard for the query.

Regarding storing the hash against each document and then querying to find
out the optimal ranges - could it be done so that Solr maintains incremental
counters for each hash in the shard's range, and then the collections
SPLITSHARD API could use this internally to propose the optimal shard
ranges for the split?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Finding-out-optimal-hash-ranges-for-shard-split-tp4203609p4204124.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: severe problems with soft and hard commits in a large index

2015-05-06 Thread adfel70
I don't see any of these.
I've seen them before in other clusters and uses of Solr, but I don't see any
of these messages here.



Dmitry Kan-2 wrote
> Do you see any (a lot?) of the warming searchers on deck, i.e. value for
> N:
> 
> PERFORMANCE WARNING: Overlapping onDeckSearchers=N
> 
> On Wed, May 6, 2015 at 10:58 AM, adfel70 <

> adfel70@

> > wrote:
> 
>> Hello
>> I have a cluster of 16 shards, 3 replicas. the cluster indexed nested
>> documents.
>> it currently has 3 billion documents overall (parent and children).
>> each shard has around 200 million docs. size of each shard is 250GB.
>> this runs on 12 machines. each machine has 4 SSD disks and 4 solr
>> processes.
>> each process has 28GB heap.  each machine has 196GB RAM.
>>
>> I perform periodic indexing throughout the day. each indexing cycle adds
>> around 1.5 million docs. I keep the indexing load light - 2 processes
>> with
>> bulks of 20 docs.
>>
>> My use case demands that each indexing cycle will be visible only when
>> the
>> whole cycle finishes.
>>
>> I tried various methods of using soft and hard commits:
>>
>> 1. using auto hard commit with time=10secs (opensearcher=false) and an
>> explicit soft commit when the indexing finishes.
>> 2. using auto soft commit with time=10/30/60secs during the indexing.
>> 3. not using soft commit at all, just using auto hard commit with
>> time=10secs during the indexing (opensearcher=false) and an explicit hard
>> commit with opensearcher=true when the cycle finishes.
>>
>>
>> with all methods I encounter pretty much the same problem:
>> 1. heavy GCs when soft commit is performed (methods 1,2) or when
>> hardcommit
>> opensearcher=true is performed. these GCs cause heavy latency (average
>> latency is 3 secs. latency during the problem is 80secs)
>> 2. if indexing cycles come too often, which causes softcommits or
>> hardcommits(opensearcher=true) occur with a small interval one after
>> another
>> (around 5-10minutes), I start getting many OOM exceptions.
>>
>>
>> Thank you.
>>
>>
>>
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/severe-problems-with-soft-and-hard-commits-in-a-large-index-tp4204068.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
> 
> 
> 
> -- 
> Dmitry Kan
> Luke Toolbox: http://github.com/DmitryKey/luke
> Blog: http://dmitrykan.blogspot.com
> Twitter: http://twitter.com/dmitrykan
> SemanticAnalyzer: www.semanticanalyzer.info





--
View this message in context: 
http://lucene.472066.n3.nabble.com/severe-problems-with-soft-and-hard-commits-in-a-large-index-tp4204068p4204123.html
Sent from the Solr - User mailing list archive at Nabble.com.


Will field type change require complete re-index?

2015-05-06 Thread Vishal Sharma
Hi,

I have been using Solr for some time now and by mistake I used String for my
date fields. The format of the string is like this: "2015-05-05T13:24:10Z"


Now, If I need to change the field type to date from String will this
require complete reindex?



*Vishal Sharma* | Team Leader, SFDC
T: +1 302 753 5105
E: vish...@grazitti.com
www.grazitti.com



Re: Solr not getting Start. Error : Could not find the main class: org.apache.solr.util.SolrCLI

2015-05-06 Thread Shawn Heisey
On 5/6/2015 6:37 AM, Markus Heiden wrote:
> UnsupportedClassVersionError means you have an old JDK. Use a more recent
> one.

Specifically, "Unsupported major.minor version 51.0" means you are
trying to use Java 6 (1.6.0) to run a program that requires Java 7
(1.7.0).  Solr 4.8 and later (including the 5.x versions) requires Java 7.

If you're looking for the absolute minimum requirements, you only need
the JRE, not the JDK.
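
A quick sanity check (purely as an illustration) is to run this with the same
user and PATH that start Solr:

  java -version

If it reports something like "1.6.0_xx", you will need to install a 1.7 (or
newer) JRE, or point Solr at one.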

Thanks,
Shawn



Re: /suggest through SolrJ?

2015-05-06 Thread Alessandro Benedetti
Exactly Tommaso,
I was referring to that!

I wrote another mail to the dev mailing list; I will open a Jira issue for
that!
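
For anyone hitting the same wall, this is roughly the kind of manual digging I
mean - just a sketch against the 4.x SolrJ API, and "mySuggester" and the core
URL are made-up names, so adjust them to your config:

// imports: org.apache.solr.client.solrj.SolrServer, SolrQuery,
//          org.apache.solr.client.solrj.impl.HttpSolrServer,
//          org.apache.solr.client.solrj.response.QueryResponse,
//          org.apache.solr.common.util.NamedList, java.util.List
SolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
SolrQuery query = new SolrQuery("atmo");
query.setRequestHandler("/suggest");
QueryResponse rsp = server.query(query);

// getSpellCheckResponse() stays null, so walk the raw NamedList instead
NamedList<Object> suggest = (NamedList<Object>) rsp.getResponse().get("suggest");
if (suggest != null) {
  NamedList<Object> suggester = (NamedList<Object>) suggest.get("mySuggester");
  NamedList<Object> result = (NamedList<Object>) suggester.get("atmo");
  List<NamedList<Object>> suggestions =
      (List<NamedList<Object>>) result.get("suggestions");
  for (NamedList<Object> s : suggestions) {
    System.out.println(s.get("term") + " (weight=" + s.get("weight") + ")");
  }
}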

Cheers

2015-04-29 12:16 GMT+01:00 Tommaso Teofili :

> 2015-04-27 19:22 GMT+02:00 Alessandro Benedetti <
> benedetti.ale...@gmail.com>
> :
>
> > Just had the very same problem, and I confirm that currently is quite a
> > mess to manage suggestions in SolrJ !
> > I have to go with manual Json parsing.
> >
>
> or very not nice NamedList API mess (see an example in JR Oak [1][2]).
>
> Regards,
> Tommaso
>
> p.s.:
> note that this applies to Solr 4.7.1 API, but reading the thread it seems
> the problem is still there.
>
> [1] :
>
> https://github.com/apache/jackrabbit-oak/blob/trunk/oak-solr-core/src/main/java/org/apache/jackrabbit/oak/plugins/index/solr/query/SolrQueryIndex.java#L318
> [2] :
>
> https://github.com/apache/jackrabbit-oak/blob/trunk/oak-solr-core/src/main/java/org/apache/jackrabbit/oak/plugins/index/solr/query/SolrQueryIndex.java#L370
>
>
>
> >
> > Cheers
> >
> > 2015-02-02 12:17 GMT+00:00 Jan Høydahl :
> >
> > > Using the /suggest handler wired to SuggestComponent, the
> > > SpellCheckResponse objects are not populated.
> > > Reason is that QueryResponse looks for a top-level element named
> > > "spellcheck"
> > >
> > >   else if ( "spellcheck".equals( n ) )  {
> > > _spellInfo = (NamedList) res.getVal( i );
> > > extractSpellCheckInfo( _spellInfo );
> > >   }
> > >
> > > Earlier the suggester was the same as the Spell component, but now with
> > > its own component, suggestions are put in "suggest".
> > >
> > > I think we're lacking a SuggestResponse.java for parsing suggest
> > > responses..??
> > >
> > > --
> > > Jan Høydahl, search solution architect
> > > Cominvent AS - www.cominvent.com
> > >
> > > > On 26 Sep 2014 at 07:27, Clemens Wyss DEV  > wrote:
> > > >
> > > > Thx to you two.
> > > >
> > > > Just in case anybody else is trying to do "this". The following SolrJ
> > > code corresponds to the http request
> > > > GET http://localhost:8983/solr/solrpedia/suggest?q=atmo
> > > > of  "Solr in Action" (chapter 10):
> > > > ...
> > > > SolrServer server = new HttpSolrServer("
> > > http://localhost:8983/solr/solrpedia";);
> > > > SolrQuery query = new SolrQuery( "atmo" );
> > > > query.setRequestHandler( "/suggest" );
> > > > QueryResponse queryresponse = server.query( query );
> > > > ...
> > > > queryresponse.getSpellCheckResponse().getSuggestions();
> > > > ...
> > > >
> > > >
> > > > -----Original Message-----
> > > > From: Shawn Heisey [mailto:s...@elyograg.org]
> > > > Sent: Thursday, 25 September 2014 17:37
> > > > To: solr-user@lucene.apache.org
> > > > Subject: Re: /suggest through SolrJ?
> > > >
> > > > On 9/25/2014 8:43 AM, Erick Erickson wrote:
> > > >> You can call anything from SolrJ that you can call from a URL.
> > > >> SolrJ has lots of convenience stuff to set particular parameters,
> > > >> parse the response, etc... But in the end it's communicating with
> Solr
> > > >> via a URL.
> > > >>
> > > >> Take a look at something like SolrQuery for instance. It has a nice
> > > >> command setFacetPrefix. Here's the entire method:
> > > >>
> > > >> public SolrQuery setFacetPrefix( String field, String prefix ) {
> > > >>this.set( FacetParams.FACET_PREFIX, prefix );
> > > >>return this;
> > > >> }
> > > >>
> > > >> which is really
> > > >>this.set( "facet.prefix", prefix ); All it's really doing is
> > > >> setting a SolrParams key/value pair which is equivalent to
> > > >> &facet.prefix=blahblah on a URL.
> > > >>
> > > >> As I remember, there's a "setPath" method that you can use to set
> the
> > > >> destination for the request to "suggest" (or maybe "/suggest"). It's
> > > >> something like that.
> > > >
> > > > Yes, like Erick says, just use SolrQuery for most accesses to Solr on
> > > arbitrary URL paths with arbitrary URL parameters.  The "set" method is
> > how
> > > you include those parameters.
> > > >
> > > > The SolrQuery method Erick was talking about at the end of his email
> is
> > > setRequestHandler(String), and you would set that to "/suggest".  Full
> > > disclosure about what this method actually does: it also sets the "qt"
> > > > parameter, but with the modern example Solr config, the qt parameter
> > > doesn't do anything -- you must actually change the URL path on the
> > > request, which this method will do if the value starts with a forward
> > slash.
> > > >
> > > > Thanks,
> > > > Shawn
> > > >
> > >
> > >
> >
> >
> > --
> > --
> >
> > Benedetti Alessandro
> > Visiting card : http://about.me/alessandro_benedetti
> >
> > "Tyger, tyger burning bright
> > In the forests of the night,
> > What immortal hand or eye
> > Could frame thy fearful symmetry?"
> >
> > William Blake - Songs of Experience -1794 England
> >
>



-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England

Re: Solr not getting Start. Error : Could not find the main class: org.apache.solr.util.SolrCLI

2015-05-06 Thread Markus Heiden
UnsupportedClassVersionError means you have an old JDK. Use a more recent
one.

Markus

2015-05-06 12:59 GMT+02:00 Mayur Champaneria :

> Hello,
>
> When I starting solr-5.1.0 in Ubuntu 12.04 by,
>
> */bin/var/www/solr-5.0.0/bin ./solr start*
>
>
> Solr is being started and shows as below,
>
> *Started Solr server on port 8983 (pid=14457). Happy searching!*
>
>
> When I starting Solr on http://localhost:8983/solr/ its not starting.
> Then I have checking the status by
>
> */bin/var/www/solr-5.0.0/bin ./solr status*
>
>
> then at the end I have got an error as below,
>
>
> *Exception in thread "main" java.lang.UnsupportedClassVersionError:
> org/apache/solr/util/SolrCLI : Unsupported major.minor version 51.0
> at java.lang.ClassLoader.defineClass1(Native Method)
> at java.lang.ClassLoader.defineClass(ClassLoader.java:643)
> at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
> at java.net.URLClassLoader.defineClass(URLClassLoader.java:277)
> at java.net.URLClassLoader.access$000(URLClassLoader.java:73)
> at java.net.URLClassLoader$1.run(URLClassLoader.java:212)
> at java.security.AccessController.doPrivileged(Native Method)
> at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
>
> Could not find the main class: org.apache.solr.util.SolrCLI. Program will
> exit.
>
> please visit http://localhost:8983/solr*
>
>
> Same thing is repeating when starting solr on SolrCloud
>
> Please help me in this.
>
>
> --
> *Thanks & **Regards,*
>
>
> *Mayur Champaneria*
>
> *PHP Developer ( MMT )*
> *Vertex Softwares*
>


Re: Finding out optimal hash ranges for shard split

2015-05-06 Thread Shalin Shekhar Mangar
Hi Anand,

The nature of the hash function (murmur3) should lead to an approximately
uniform distribution of documents across sub-shards. Have you investigated
why, if at all, the sub-shards are not balanced? Do you use composite keys
e.g. abc!id1 which cause the imbalance?

I don't think there is a (cheap) way to implement what you are asking in
the current scheme of things because unless we go through each id and
calculate the hash, we have no way of knowing the optimal distribution.
However, if we were to store the hash of the key as a separate field in the
index then it should be possible to binary search for hash ranges which
lead to approx. equal distribution of docs in sub-shards. We can implement
something like that inside Solr.
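
Just to sketch the idea (this is not existing code - it assumes a hypothetical
indexed int field named key_hash holding the murmur3 hash of each routing key,
and uses the SolrJ 5.x API):

  // count docs on the shard being split whose hash falls in [lo, hi]
  long countInRange(SolrClient client, int lo, int hi) throws Exception {
    SolrQuery q = new SolrQuery("key_hash:[" + lo + " TO " + hi + "]");
    q.setRows(0);
    return client.query(q).getResults().getNumFound();
  }

  // bisect for the hash value m where [lo..m] holds roughly half the docs of [lo..hi]
  int findSplitPoint(SolrClient client, int lo, int hi) throws Exception {
    long half = countInRange(client, lo, hi) / 2;
    int a = lo, b = hi;
    while (a < b) {
      int mid = (int) (((long) a + (long) b) / 2);   // avoid int overflow
      if (countInRange(client, lo, mid) < half) a = mid + 1; else b = mid;
    }
    return a;
  }

Doing it inside Solr would of course avoid one query per probe, but the shape
of the search would be the same.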

On Wed, May 6, 2015 at 4:42 PM, anand.mahajan  wrote:

> Okay - Thanks for the confirmation Shalin.  Could this be a feature request
> in the Collections API - that we have a Split shard dry run API that
> accepts
> sub-shards count as a request param and returns the optimal shard ranges
> for
> the number of sub-shards requested to be created along with the respective
> document counts for each of the sub-shards? The users can then use this
> shard ranges for the actual split?
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Finding-out-optimal-hash-ranges-for-shard-split-tp4203609p4204100.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Regards,
Shalin Shekhar Mangar.


Re: Finding out optimal hash ranges for shard split

2015-05-06 Thread anand.mahajan
Okay - thanks for the confirmation, Shalin.  Could this be a feature request
for the Collections API - a split-shard dry-run API that accepts the number of
sub-shards as a request param and returns the optimal shard ranges for those
sub-shards, along with the respective document counts for each? Users could
then use these shard ranges for the actual split.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Finding-out-optimal-hash-ranges-for-shard-split-tp4203609p4204100.html
Sent from the Solr - User mailing list archive at Nabble.com.


Solr not getting Start. Error : Could not find the main class: org.apache.solr.util.SolrCLI

2015-05-06 Thread Mayur Champaneria
Hello,

When I start solr-5.1.0 on Ubuntu 12.04 with,

*/bin/var/www/solr-5.0.0/bin ./solr start*


Solr starts and shows the message below,

*Started Solr server on port 8983 (pid=14457). Happy searching!*


But when I open Solr at http://localhost:8983/solr/ it is not running.
Then I checked the status with

*/bin/var/www/solr-5.0.0/bin ./solr status*


and at the end I got the error below:


*Exception in thread "main" java.lang.UnsupportedClassVersionError:
org/apache/solr/util/SolrCLI : Unsupported major.minor version 51.0
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:643)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:277)
at java.net.URLClassLoader.access$000(URLClassLoader.java:73)
at java.net.URLClassLoader$1.run(URLClassLoader.java:212)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
at java.lang.ClassLoader.loadClass(ClassLoader.java:268)

Could not find the main class: org.apache.solr.util.SolrCLI. Program will exit.

please visit http://localhost:8983/solr*


The same thing happens when starting Solr in SolrCloud mode.

Please help me in this.


-- 
*Thanks & **Regards,*


*Mayur Champaneria*

*PHP Developer ( MMT )*
*Vertex Softwares*


Re: New core on Solr Cloud

2015-05-06 Thread shacky
Ok, I found out that new cores/collections on Solr 5.1 are created with
the bin/solr script.
So I created a new collection with this command:

./solr create_collection -c test -replicationFactor 3

Is this the correct way?

Thank you very much,
Bye!

2015-05-06 10:02 GMT+02:00 shacky :
> Hi.
> This is my first experience with Solr Cloud.
> I installed three Solr nodes with three ZooKeeper instances and they
> seemed to start well.
> Now I have to create a new replicated core and I'm trying to found out
> how I can do it.
> I found many examples about how to create shards and cores, but I have
> to create one core with only one shard replicated on all three nodes
> (so basically I want to have the same data on all three nodes).
>
> Could you help me to understand what is the correct way to make this, please?
>
> Thank you very much!
> Bye


ZooKeeperException: Could not find configName for collection

2015-05-06 Thread shacky
Hi list.

I created a new collection on my new SolrCloud installation, the new
collection is shown and replicated on all three nodes, but on the
first node (only on this one) I get this error:

new_core: 
org.apache.solr.common.cloud.ZooKeeperException:org.apache.solr.common.cloud.ZooKeeperException:
Could not find configName for collection new_core found:null

I cannot see any core named "new_core" on that node, and I also tried
to remove it:

root@index1:/opt/solr# ./bin/solr delete -c new_core
Connecting to ZooKeeper at zk1,zk2,zk3
ERROR: Collection new_core not found!

Could you help me, please?

Thank you very much!
Bye


What is the best practice to Backup and delete a core from SOLR Master-Slave architecture

2015-05-06 Thread sangeetha.subraman...@gtnexus.com
Hi,

I am a newbie to Solr. I have set up a Master-Slave configuration with Solr 4.0.
I am trying to identify the best way to back up an old core and then delete it,
so as to free up disk space.

I did get the information on how to unload a core and delete the indexes from 
the core.

Unloading - http://localhost:8983/solr/admin/cores?action=UNLOAD&core=core0
Delete Indexes - 
http://localhost:8983/solr/admin/cores?action=UNLOAD&core=core0&deleteIndex=true

What is the best approach to remove the old core ?


*   Approach 1: Unload the core on both the Master and Slave servers AND delete
the index only from the Master server (retaining the indexes on the Slave
server as a backup). If I retain the indexes on the Slave server, is there a
way to bring them back to the Master server at a later point?

*   Approach 2: Unload and delete the indexes from both the Master and Slave
servers. Before deleting, take a backup of the data dir of the old core from
the file system. I am not sure if this is even possible?

Is there any better way of doing this? Please let me know.

Thanks
Sangeetha


Re: severe problems with soft and hard commits in a large index

2015-05-06 Thread adfel70
1. Yes, I'm sure the pauses are due to GCs. I monitor the cluster and
continuously receive metrics from the system and from the java process.
I can see clearly that when a soft commit is triggered, major GCs start
occurring (sometimes recurring on the same process) and latency rises.
I use the CMS collector and jdk 1.7.75.

2. My previous post was about another use case, but nevertheless I have
configured docValues on the faceted fields.


Toke Eskildsen wrote
> On Wed, 2015-05-06 at 00:58 -0700, adfel70 wrote:
>> each shard has around 200 million docs. size of each shard is 250GB.
>> this runs on 12 machines. each machine has 4 SSD disks and 4 solr
>> processes.
>> each process has 28GB heap.  each machine has 196GB RAM.
> 
> [...]
> 
>> 1. heavy GCs when soft commit is performed (methods 1,2) or when
>> hardcommit
>> opensearcher=true is performed. these GCs cause heavy latency (average
>> latency is 3 secs. latency during the problem is 80secs)
> 
> Sanity check: Are you sure the pauses are due to garbage collection?
> 
> You have a fairly large heap and judging from your previous post
> "problem with facets  - out of memory exception", you are doing
> non-trivial faceting. Are you using DocValues, as Marc suggested?
> 
> 
> - Toke Eskildsen, State and University Library, Denmark





--
View this message in context: 
http://lucene.472066.n3.nabble.com/severe-problems-with-soft-and-hard-commits-in-a-large-index-tp4204068p4204088.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: severe problems with soft and hard commits in a large index

2015-05-06 Thread Dmitry Kan
Do you seen any (a lot?) of the warming searchers on deck, i.e. value for N:

PERFORMANCE WARNING: Overlapping onDeckSearchers=N

On Wed, May 6, 2015 at 10:58 AM, adfel70  wrote:

> Hello
> I have a cluster of 16 shards, 3 replicas. the cluster indexed nested
> documents.
> it currently has 3 billion documents overall (parent and children).
> each shard has around 200 million docs. size of each shard is 250GB.
> this runs on 12 machines. each machine has 4 SSD disks and 4 solr
> processes.
> each process has 28GB heap.  each machine has 196GB RAM.
>
> I perform periodic indexing throughout the day. each indexing cycle adds
> around 1.5 million docs. I keep the indexing load light - 2 processes with
> bulks of 20 docs.
>
> My use case demands that each indexing cycle will be visible only when the
> whole cycle finishes.
>
> I tried various methods of using soft and hard commits:
>
> 1. using auto hard commit with time=10secs (opensearcher=false) and an
> explicit soft commit when the indexing finishes.
> 2. using auto soft commit with time=10/30/60secs during the indexing.
> 3. not using soft commit at all, just using auto hard commit with
> time=10secs during the indexing (opensearcher=false) and an explicit hard
> commit with opensearcher=true when the cycle finishes.
>
>
> with all methods I encounter pretty much the same problem:
> 1. heavy GCs when soft commit is performed (methods 1,2) or when hardcommit
> opensearcher=true is performed. these GCs cause heavy latency (average
> latency is 3 secs. latency during the problem is 80secs)
> 2. if indexing cycles come too often, which causes softcommits or
> hardcommits(opensearcher=true) occur with a small interval one after
> another
> (around 5-10minutes), I start getting many OOM exceptions.
>
>
> Thank you.
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/severe-problems-with-soft-and-hard-commits-in-a-large-index-tp4204068.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Dmitry Kan
Luke Toolbox: http://github.com/DmitryKey/luke
Blog: http://dmitrykan.blogspot.com
Twitter: http://twitter.com/dmitrykan
SemanticAnalyzer: www.semanticanalyzer.info


Re: severe problems with soft and hard commits in a large index

2015-05-06 Thread Toke Eskildsen
On Wed, 2015-05-06 at 00:58 -0700, adfel70 wrote:
> each shard has around 200 million docs. size of each shard is 250GB.
> this runs on 12 machines. each machine has 4 SSD disks and 4 solr processes.
> each process has 28GB heap.  each machine has 196GB RAM.

[...]

> 1. heavy GCs when soft commit is performed (methods 1,2) or when hardcommit
> opensearcher=true is performed. these GCs cause heavy latency (average
> latency is 3 secs. latency during the problem is 80secs)

Sanity check: Are you sure the pauses are due to garbage collection?

You have a fairly large heap and judging from your previous post
"problem with facets  - out of memory exception", you are doing
non-trivial faceting. Are you using DocValues, as Marc suggested?


- Toke Eskildsen, State and University Library, Denmark




Re: SolrCloud collection properties

2015-05-06 Thread Markus Heiden
We currently have many custom properties defined in core.properties which are
used in our solrconfig.xml, e.g.
 ${solr.enable.cachewarming:true}

Now we want to migrate to SolrCloud and want to define these properties per
collection. But defining properties when creating a collection just writes
them into the core.properties of the created cores. This is a pain, because
we have a lot of properties and you have to specify each one as a URL
parameter. Furthermore, it seems that these properties are not propagated to
the cores of new shards, e.g. if you split a shard - which is error-prone.

As you already mentioned, we could resolve these properties ourselves by using
many configsets instead of just one. My question was whether it is possible to
use just one configset in this case and specify collection-specific properties
at the collection level. That seems to me the better way to handle the
configuration complexity.
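
For what it's worth, what we do today is pass every property on the CREATE
call, roughly like this (just a sketch of the URL form - the collection name,
configset name and property are examples from our setup):

  /admin/collections?action=CREATE&name=mycollection&numShards=2
      &collection.configName=shared_conf
      &property.solr.enable.cachewarming=false

which is exactly the per-core, non-propagated behaviour described above.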

Markus

2015-05-06 3:48 GMT+02:00 Erick Erickson :

> _What_ properties? Details matter
>
> And how do you do this now? Assuming you do this with separate conf
> directories, these are then just configsets in Zookeeper and you can
> have as many of them as you want. Problem here is that each one of
> them is a complete set of schema and config files, AFAIK the config
> set is the finest granularity that you have OOB.
>
> Best,
> Erick
>
> On Tue, May 5, 2015 at 6:55 AM, Markus Heiden 
> wrote:
> > Hi,
> >
> > we are trying to migrate from Solr 4.10 to SolrCloud 4.10. I understood
> > that SolrCloud uses collections as abstraction from the cores. What I am
> > missing is a possibility to store collection-specific properties in
> > Zookeeper. Using property.foo=bar in CREATE-URLs just sets core-specific
> > properties which are not distributed, e.g. if I migrate a shard from one
> > node to another.
> >
> > How do I define collection-specific properties (to be used in
> > solrconfig.xml and schema.xml) which get distributed with the collection
> to
> > all nodes?
> >
> > Why do I try that? Currently we have different cores which structure is
> > identical, but have each having some specific properties. I would like to
> > have a single configuration for them in Zookeeper from which I want to
> > create different collections, which just differ in the value of some
> > properties.
> >
> > Markus
>


New core on Solr Cloud

2015-05-06 Thread shacky
Hi.
This is my first experience with Solr Cloud.
I installed three Solr nodes with three ZooKeeper instances and they
seemed to start well.
Now I have to create a new replicated core and I'm trying to find out
how I can do it.
I found many examples about how to create shards and cores, but I have
to create one core with only one shard replicated on all three nodes
(so basically I want to have the same data on all three nodes).

Could you help me to understand what is the correct way to make this, please?

Thank you very much!
Bye


severe problems with soft and hard commits in a large index

2015-05-06 Thread adfel70
Hello
I have a cluster of 16 shards, 3 replicas. the cluster indexed nested
documents.
it currently has 3 billion documents overall (parent and children).
each shard has around 200 million docs. size of each shard is 250GB.
this runs on 12 machines. each machine has 4 SSD disks and 4 solr processes.
each process has 28GB heap.  each machine has 196GB RAM.

I perform periodic indexing throughout the day. each indexing cycle adds
around 1.5 million docs. I keep the indexing load light - 2 processes with
bulks of 20 docs.

My use case demands that each indexing cycle will be visible only when the
whole cycle finishes.

I tried various methods of using soft and hard commits:

1. using auto hard commit with time=10secs (opensearcher=false) and an
explicit soft commit when the indexing finishes.
2. using auto soft commit with time=10/30/60secs during the indexing.
3. not using soft commit at all, just using auto hard commit with
time=10secs during the indexing (opensearcher=false) and an explicit hard
commit with opensearcher=true when the cycle finishes.
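
For reference, the relevant solrconfig.xml pieces behind these methods look
roughly like this (a sketch; the actual values are the ones described above):

  <autoCommit>
    <maxTime>10000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>

  <!-- only for method 2 -->
  <autoSoftCommit>
    <maxTime>30000</maxTime>
  </autoSoftCommit>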


with all methods I encounter pretty much the same problem:
1. heavy GCs when soft commit is performed (methods 1,2) or when hardcommit
opensearcher=true is performed. these GCs cause heavy latency (average
latency is 3 secs. latency during the problem is 80secs)
2. if indexing cycles come too often, which causes softcommits or
hardcommits(opensearcher=true) occur with a small interval one after another
(around 5-10minutes), I start getting many OOM exceptions.


Thank you.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/severe-problems-with-soft-and-hard-commits-in-a-large-index-tp4204068.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr 5.0 - uniqueKey case insensitive ?

2015-05-06 Thread Daniel Collins
Ah, I remember seeing this when we first started using Solr (which was 4.0
because we needed SolrCloud). I never got around to filing an issue for it
(oops!), but we have a note in our schema to leave the key field a normal
string (like Bruno, we had tried to lowercase it, which failed).
We didn't really know Solr in those days and hadn't really thought about
it since then, but Hoss' and Erick's explanations make perfect sense now!

Since shard routing is (basically) done on hashes of the unique key, if I
have 2 documents which are the "same", but have values "HELLO" and "hello",
they might well hash to completely different shards, so the update
logistics would be horrible.

Bruno, why do you need to lowercase at all then?  You said in your example
that your client application always supplies "pn" and it is always
uppercase, so presumably all adds/updates could be done directly on that
field (as a normal string with no lowercasing).  Where does the case
insensitivity come in - is it only for searching?  If so, couldn't you add
a search field (called id) and update your app to search using that (or
make that your default search field; I guess it depends on whether your
calling app explicitly uses the pn field name in its searches).
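
i.e. something along these lines in the schema (just a sketch - the string_ci
type is the one from Bruno's mail, and the copy field name is made up):

  <field name="pn" type="string" indexed="true" stored="true" required="true"/>
  <field name="pn_search" type="string_ci" indexed="true" stored="false"/>
  <copyField source="pn" dest="pn_search"/>

so "pn" stays an untouched string for the uniqueKey and routing, and the
case-insensitive matching only happens on the copy.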


On 6 May 2015 at 01:55, Erick Erickson  wrote:

> Well, "working fine" may be a bit of an overstatement. That has never
> been officially supported, so it "just happened" to work in 3.6.
>
> As Chris points out, if you're using SolrCloud then this will _not_
> work as routing happens early in the process, i.e. before the analysis
> chain gets the token so various copies of the doc will exist on
> different shards.
>
> Best,
> Erick
>
> On Mon, May 4, 2015 at 4:19 PM, Bruno Mannina  wrote:
> > Hello Chris,
> >
> > yes I confirm on my SOLR3.6 it works fine since several years, and each
> doc
> > added with same code is updated not added.
> >
> > To be more clear, I receive docs with a field name "pn" and it's the
> > uniqueKey, and it always in uppercase
> >
> > so I must define in my schema.xml
> >
> >  > required="true" stored="true"/>
> >  indexed="true"
> > stored="false"/>
> > ...
> >id
> > ...
> >   
> >
> > but the application that use solr already exists so it requests with pn
> > field not id, i cannot change that.
> > and in each docs I receive, there is not id field, just pn field, and  i
> > cannot also change that.
> >
> > so there is a problem no ? I must import a id field and request a pn
> field,
> > but I have a pn field only for import...
> >
> >
> >
> > Le 05/05/2015 01:00, Chris Hostetter a écrit :
> >>
> >> : On SOLR3.6, I defined a string_ci field like this:
> >> :
> >> :  >> : sortMissingLast="true" omitNorms="true">
> >> : 
> >> :   
> >> :   
> >> : 
> >> : 
> >> :
> >> :  >> : required="true" stored="true"/>
> >>
> >>
> >> I'm really suprised that field would have worked for you (reliably) as a
> >> uniqueKey field even in Solr 3.6.
> >>
> >> the best practice for something like what you describe has always (going
> >> back to Solr 1.x) been to use a copyField to create a case insensitive
> >> copy of your uniqueKey for searching.
> >>
> >> if, for some reason, you really want case insensitve *updates* (so a doc
> >> with id "foo" overwrites a doc with id "FOO" then the only reliable way
> to
> >> make something like that work is to do the lowercassing in an
> >> UpdateProcessor to ensure it happens *before* the docs are distributed
> to
> >> the correct shard, and so the correct existing doc is overwritten (even
> if
> >> you aren't using solr cloud)
> >>
> >>
> >>
> >> -Hoss
> >> http://www.lucidworks.com/
> >>
> >>
> >
> >
> > ---
> > Ce courrier électronique ne contient aucun virus ou logiciel malveillant
> > parce que la protection avast! Antivirus est active.
> > http://www.avast.com
> >
>