Re: Edismax mm and efficiency

2014-09-17 Thread Mikhail Khludnev
On Fri, Sep 5, 2014 at 9:34 PM, Walter Underwood wun...@wunderwood.org
wrote:

 What would be a high mm value, 75%?


Walter, I suppose that the length of the search result influences the run
time. So, for a particular query and index, a high mm value is one which
significantly reduces the length of the search result.


-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

http://www.griddynamics.com
mkhlud...@griddynamics.com
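
For illustration, a hedged sketch of the effect (a hypothetical three-term
query against assumed fields title and body, not taken from the thread):

/select?defType=edismax&q=distributed+search+engine&qf=title+body&mm=1
/select?defType=edismax&q=distributed+search+engine&qf=title+body&mm=75%

The first request matches documents containing any one of the three terms; the
second requires at least two of them (75% of 3, rounded down), which typically
shrinks the result set and with it the work needed to score it. (In a real URL
the % sign must be encoded as %25.)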


Solr Suggestion not working in solr PLZ HELP

2014-09-17 Thread vaibhav.patil123
Suggestion
In solrconfig.xml:
<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">mySuggester</str>
    <str name="lookupImpl">FuzzyLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">content</str>
    <str name="weightField"></str>
    <str name="suggestAnalyzerFieldType">string</str>
  </lst>
</searchComponent>

<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="suggest">true</str>
    <str name="suggest.count">10</str>
    <str name="suggest.dictionary">mySuggester</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>


--


Suggestion: localhost:28080/solr/suggest?q=foobat

The above throws the exception below:


<response><lst name="responseHeader"><int name="status">500</int><int
name="QTime">12</int></lst><lst name="error"><str name="msg">No suggester
named default was configured</str><str
name="trace">java.lang.IllegalArgumentException: No suggester named default
was configured
  at org.apache.solr.handler.component.SuggestComponent.getSuggesters(SuggestComponent.java:353)
  at org.apache.solr.handler.component.SuggestComponent.prepare(SuggestComponent.java:158)
  at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:197)
  at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
  at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:241)
  at org.apache.solr.core.SolrCore.execute(SolrCore.java:1952)
  at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:774)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:418)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
  at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:246)
  at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:214)
  at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:230)
  at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:149)
  at org.jboss.as.web.security.SecurityContextAssociationValve.invoke(SecurityContextAssociationValve.java:169)
  at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:145)
  at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:97)
  at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:559)
  at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:102)
  at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:336)
  at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:856)
  at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:653)
  at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:926)
  at java.lang.Thread.run(Thread.java:745)
</str><int name="code">500</int></lst></response>




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Suggestion-not-working-in-solr-PLZ-HELP-tp4159351.html
Sent from the Solr - User mailing list archive at Nabble.com.
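
For reference, a hedged sketch of a request that targets a core explicitly and
names the suggester (the core name collection1 is an assumption):

curl "http://localhost:28080/solr/collection1/suggest?suggest=true&suggest.dictionary=mySuggester&suggest.q=foo"

An error like "No suggester named default was configured" generally means the
suggest.dictionary default was not applied, e.g. because the request never
reached the /suggest handler defined above.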


RE: Solr Suggestion not working in solr PLZ HELP

2014-09-17 Thread Amey Jadiye
Hi Vaibhav,

Could you check whether the directory for *suggest.dictionary* (mySuggester) is
present or not; try making it with mkdir, and if the problem still persists, try
giving the full path.

I found a good article at the link below; check that too.
[http://romiawasthy.blogspot.com/2014/06/configure-solr-suggester.html]

Regards,
Amey

 Date: Wed, 17 Sep 2014 00:03:33 -0700
 From: vaibhav.h.pa...@gmail.com
 To: solr-user@lucene.apache.org
 Subject: Solr Suggestion not working in solr PLZ HELP
 
 Suggestion
 In solrconfig.xml: [...]

Solr(j) API for manipulating the schema(.xml)?

2014-09-17 Thread Clemens Wyss DEV
Is there an API to manipulate/consolidate the schema(.xml) of a Solr-core? 
Through SolrJ? 

Context:
We already have a generic indexing/searching framework (based on Lucene) where 
any component can act as a so-called IndexDataProvider. This provider delivers 
the field types and also the entities to be (converted into documents and then) 
indexed. Each of these IndexDataProviders has its own Lucene index.
So we kind of have the information for the Solr schema.xml.

Hope the intention is clear. And yes, the manipulation of the schema.xml is 
basically only needed when the field types change. That's why I am looking for a 
way to consolidate the schema.xml (upon boot, initialization of the 
IndexDataProviders ...). 
In 99.999% of cases it won't change, but I'd like to keep the possibility of an 
IndexDataProvider handing in its schema.

Also, again driven by the dynamic nature of our framework, can I easily create 
new cores over SolrJ or the Solr REST API?


Problem deploying solr-4.10.0.war in Tomcat

2014-09-17 Thread phiroc


Hello,

I've dropped solr-4.10.0.war in Tomcat 7's webapps directory.

When I start the Java web server, the following message appears in catalina.out:

---

INFO: Starting Servlet Engine: Apache Tomcat/7.0.55
Sep 17, 2014 11:35:59 AM org.apache.catalina.startup.HostConfig deployWAR
INFO: Deploying web application archive 
/archives/apache-tomcat-7.0.55_solr_8983/webapps/solr-4.10.0.war
Sep 17, 2014 11:35:59 AM org.apache.catalina.core.StandardContext startInternal
SEVERE: Error filterStart
Sep 17, 2014 11:35:59 AM org.apache.catalina.core.StandardContext startInternal
SEVERE: Context [/solr-4.10.0] startup failed due to previous errors

--

Any help would be much appreciated.

Cheers,

Philippe





RE: Problem deploying solr-4.10.0.war in Tomcat

2014-09-17 Thread Markus Jelsma
Yes, this is a nasty error. You have not set up logging libraries properly:
https://cwiki.apache.org/confluence/display/solr/Configuring+Logging
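
For reference, the steps on that wiki page amount to roughly the following (a
sketch; paths assume a stock Solr 4.10 download and a default Tomcat 7 layout):

# copy the logging jars that ship with the Solr example into Tomcat's lib dir
cp $SOLR_DIST/example/lib/ext/*.jar $CATALINA_HOME/lib/
# copy the log4j configuration somewhere on Tomcat's classpath
cp $SOLR_DIST/example/resources/log4j.properties $CATALINA_HOME/lib/

Then restart Tomcat.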
 
 
-Original message-
 From:phi...@free.fr phi...@free.fr
 Sent: Wednesday 17th September 2014 11:51
 To: solr-user@lucene.apache.org
 Subject: Problem deploying solr-4.10.0.war in Tomcat
 
 
 
 Hello,
 
 I've dropped solr-4.10.0.war in Tomcat 7's webapps directory. [...]


Ping handler during initial warmup

2014-09-17 Thread Ere Maijala
As far as I can see, when a Solr instance is started (whether standalone 
or SolrCloud), a PingRequestHandler will wait until index warmup is 
complete before returning (at least with useColdSearcher=false) which 
may take a while. This poses a problem in that a load balancer either 
needs to wait for the result or employ a short timeout for timely 
failover. Of course the request is eventually served, but it would be 
better to be able to switch over to another server until warmup is complete.


So, is it possible to configure a ping handler to return quickly with 
non-OK status if a search handler is not yet available? This would allow 
the load balancer to quickly fail over to another server. I couldn't 
find anything like this in the docs, but I'm still hopeful.


I'm aware of the possibility of using a health state file, but I'd 
rather have a way of doing this automatically.


--Ere


solr 4.8 Tika stripping out all xml tags

2014-09-17 Thread keeblerh
I'm processing a zip file with an xml file.   The TikaEntityProcessor opens
the zip, reads the file but is stripping the xml tags even though I have
supplied the htmlMapper=identity attribute.  It maintains any html that is
contained in a CDATA section but seems to strip the other xml tags.   Is
this due to the recursive nature of opening the zip file?  Somehow that
identity value is lost?  My understanding is that this should work in this
version 4.8.  Thanks.  Below is my config info.

<dataConfig>
  <dataSource type="BinFileDataSource" />
  <document>
    <entity name="kmlfiles" dataSource="null" rootEntity="false"
        baseDir="mydirectory" fileName=".*\.kmz$" onError="skip"
        processor="FileListEntityProcessor" recursive="false">
      <!-- field defs -->
      <entity name="kmlImport" processor="TikaEntityProcessor"
          datasource="kmlfiles" htmlMapper="identity" format="xml"
          transformer="TemplateTransformer" url="${kmlfiles.fileAbsolutePath}"
          recursive="true">
        <!-- more field defs -->
        <entity name="xml" processor="XPathEntityProcessor" ForEach="/kml"
            dataSource="fds" dataField="kmlImport.text">
          <field xpath="//name" column="name" />
          <!-- ...more field defs -->
        </entity>
      </entity>
    </entity>
  </document>
</dataConfig>



--
View this message in context: 
http://lucene.472066.n3.nabble.com/solr-4-8-Tika-stripping-out-all-xml-tags-tp4159419.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: solr 4.8 Tika stripping out all xml tags

2014-09-17 Thread keeblerh
Sorry...adding more information.

Note that it does wrap my data in html, but only after it strips all my xml
tags out.  So the data I am interested in parsing, which would be

<name>something</name>
<description>something</description>
<coordinates>12345,12345,0</coordinates>

ends up like p /n something /t/n something /n 12345,12345,0 etc.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/solr-4-8-Tika-stripping-out-all-xml-tags-tp4159419p4159430.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to preserve 0 after decimal point?

2014-09-17 Thread Chris Hostetter

: second, and assuming your problem is really that you're looking at the
: _display_, you should get back exactly what you put in so I'm guessing

Not quite ... With the numeric types, the numeric value is both indexed 
and stored so that there is no search/sort inconsistency between 1.1, 
1.10, 001.1 etc... those are all the number 1.1 and are treated as such.


If you have an input string that you want preserved verbatim, then you 
need to use a string type.   It doesn't matter whether those strings *look* 
like numbers or not: if you consider 27.50 to be different from 27.5, 
then those aren't numbers, they are strings.
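
A common way to get both behaviors (a sketch; the field names are illustrative,
and tfloat is the trie-float type from the stock 4.x example schema) is to store
the verbatim string for display and copy it into a numeric field for searching
and sorting:

<!-- verbatim display value, so 27.50 stays 27.50 -->
<field name="price" type="string" indexed="false" stored="true"/>
<!-- numeric twin used for range queries and sorting -->
<field name="price_num" type="tfloat" indexed="true" stored="false"/>
<copyField source="price" dest="price_num"/>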


-Hoss
http://www.lucidworks.com/


Re: Solr(j) API for manipulating the schema(.xml)?

2014-09-17 Thread Erick Erickson
Right, you can create new cores over the REST API.

As far as changing the schema, there's no good way to do that that I
know of programmatically. In the SolrCloud world, you can upload the
schema to ZooKeeper and have it automatically distributed to all the
nodes though.

Best,
Erick
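
For reference, hedged sketches of the two approaches mentioned above (host,
port, and all names are assumptions):

# create a core via the CoreAdmin API
curl "http://localhost:8983/solr/admin/cores?action=CREATE&name=newcore&instanceDir=newcore"

# upload a config (including schema.xml) to ZooKeeper for SolrCloud
example/scripts/cloud-scripts/zkcli.sh -zkhost localhost:2181 \
  -cmd upconfig -confdir /path/to/conf -confname myconf

Collections linked to the uploaded config pick up the new schema when they are
reloaded.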

On Wed, Sep 17, 2014 at 2:28 AM, Clemens Wyss DEV clemens...@mysign.ch wrote:
 Is there an API to manipulate/consolidate the schema(.xml) of a Solr-core?
 Through SolrJ? [...]


Re: How to preserve 0 after decimal point?

2014-09-17 Thread Erick Erickson
Really! Ya learn something new every day.

On Wed, Sep 17, 2014 at 10:48 AM, Chris Hostetter
hossman_luc...@fucit.org wrote:

  : second, and assuming your problem is really that you're looking at the
  : _display_, you should get back exactly what you put in so I'm guessing

  Not quite ... With the numeric types, the numeric value is both indexed
  and stored so that there is no search/sort inconsistency between 1.1,
  1.10, 001.1 etc. [...]


Re: MaxScore

2014-09-17 Thread Peter Keegan
See if SOLR-5831 https://issues.apache.org/jira/browse/SOLR-5831 helps.

Peter

On Tue, Sep 16, 2014 at 11:32 PM, William Bell billnb...@gmail.com wrote:

 What we need is a function like scale(field,min,max) but only operates on
 the results that come back from the search results.

 scale() takes the min, max from the field in the index, not necessarily
 those in the results.

 I cannot think of a solution. max() only looks at one field, not across
 fields in the results.

 I tried a query() but cannot think of a way to get the max value of a field
 ONLY in the results...

 Ideas?


 --
 Bill Bell
 billnb...@gmail.com
 cell 720-256-8076



Loading an index (generated by map reduce) in SolrCloud

2014-09-17 Thread KNitin
Hello

 I have generated a lucene index (with 6 shards) using Map Reduce. I want
to load this into a SolrCloud Cluster inside a collection.

Is there any out of the box way of doing this?  Any ideas are much
appreciated

Thanks
Nitin


How does KeywordRepeatFilterFactory help giving a higher score to an original term vs a stemmed term

2014-09-17 Thread Tom Burton-West
The Solr wiki says: "A repeated question is how can I have the
original term contribute
more to the score than the stemmed version? In Solr 4.3, the
KeywordRepeatFilterFactory has been added to assist this
functionality."

https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#Stemming

(Full section reproduced below.)
I can see how in the example from the wiki reproduced below that both
the stemmed and original term get indexed, but I don't see how the
original term gets more weight than the stemmed term.  Wouldn't this
require a filter that gives terms with the keyword attribute more
weight?

What am I missing?

Tom



-
A repeated question is how can I have the original term contribute
more to the score than the stemmed version? In Solr 4.3, the
KeywordRepeatFilterFactory has been added to assist this
functionality. This filter emits two tokens for each input token, one
of them is marked with the Keyword attribute. Stemmers that respect
keyword attributes will pass through the token so marked without
change. So the effect of this filter would be to index both the
original word and the stemmed version. The 4 stemmers listed above all
respect the keyword attribute.

For terms that are not changed by stemming, this will result in
duplicate, identical tokens in the document. This can be alleviated by
adding the RemoveDuplicatesTokenFilterFactory.

<fieldType name="text_keyword" class="solr.TextField"
    positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.KeywordRepeatFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>


Re: How does KeywordRepeatFilterFactory help giving a higher score to an original term vs a stemmed term

2014-09-17 Thread Diego Fernandez
I'm not 100% on this, but I imagine this is what happens:

(using -> to mean "tokenized to")

Suppose that you index:

I am running home -> am run running home

If you then query running home -> run running home, more of the query tokens
match, and thus it gives a higher score than if you query runs home -> run
runs home.
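
To make that concrete, a sketch of what a KeywordRepeat + stemmer chain might
emit (token positions in parentheses; illustrative, not actual analyzer output):

index:  I am running home  ->  I(1) am(2) running(3) run(3) home(4)
query:  running            ->  running, run  (matches both tokens at position 3)
query:  runs               ->  runs, run     (matches only the stemmed token run)

An exact-form query can thus accumulate more matching terms, which is where the
extra weight for the original form comes from.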


- Original Message -
 The Solr wiki says: "A repeated question is how can I have the
 original term contribute more to the score than the stemmed version?" [...]
 
 What am I missing?
 
 Tom

-- 
Diego Fernandez - 爱国
Software Engineer
GSS - Diagnostics



Re: Loading an index (generated by map reduce) in SolrCloud

2014-09-17 Thread Erick Erickson
Details please. You say MapReduce. Is this the
MapReduceIndexerTool? If so, you can use
the --go-live option to auto-merge them. Your
Solr instances need to be running over HDFS
though.

If you don't have Solr running over HDFS, you can
just copy the results for each shard to the right place.
What that means is that you must ensure that the
shards produced via MRIT get copied to the corresponding
Solr local directory for each shard. If you put the wrong
one in the wrong place you'll have trouble with multiple
copies of documents showing up when you re-add any
doc that already exists in your Solr installation.

BTW, I'd surely stop all my Solr instances while copying
all this around.

Best,
Erick
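
For reference, a hedged sketch of a MapReduceIndexerTool run with the merge
option described above (all paths, hosts, and names are assumptions):

hadoop jar solr-map-reduce-*.jar \
  --zk-host zk1:2181/solr --collection mycollection \
  --morphline-file morphline.conf \
  --output-dir hdfs://namenode/outdir \
  --go-live \
  hdfs://namenode/indir

With --go-live the tool merges the freshly built shards into the live SolrCloud
collection, which is why Solr needs to be running over HDFS for this path.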

On Wed, Sep 17, 2014 at 1:41 PM, KNitin nitin.t...@gmail.com wrote:
 Hello

  I have generated a lucene index (with 6 shards) using Map Reduce. I want
 to load this into a SolrCloud Cluster inside a collection.

 Is there any out of the box way of doing this?  Any ideas are much
 appreciated

 Thanks
 Nitin


Re: Loading an index (generated by map reduce) in SolrCloud

2014-09-17 Thread ralph tice
FWIW, I do a lot of moving Lucene indexes around and as long as the core is
unloaded it's never been an issue for Solr to be running at the same time.

If you move a core into the correct hierarchy for a replica, you can call
the Collections API's CREATESHARD action with the appropriate params (make
sure you use createNodeSet to point to the right server) and Solr will load
the index appropriately.  It's easier to create a dummy shard and see
where data lands on your installation than to try to guess.

Ex:
PORT=8983
SHARD=myshard
COLLECTION=mycollection
SOLR_HOST=box1.mysolr.corp
curl "http://${SOLR_HOST}:${PORT}/solr/admin/collections?action=CREATESHARD&shard=${SHARD}&collection=${COLLECTION}&createNodeSet=${SOLR_HOST}:${PORT}_solr"

One file to watch out for if you are moving cores across machines/JVMs is
the core.properties file, which you don't want to duplicate to another
server/location when moving a data directory.  I don't recommend trying to
move transaction logs around either.


On Wed, Sep 17, 2014 at 5:22 PM, Erick Erickson erickerick...@gmail.com
wrote:

 Details please. You say MapReduce. Is this the
 MapReduceIndexerTool? If so, you can use
 the --go-live option to auto-merge them. [...]


Re: Implementing custom analyzer for multi-language stemming

2014-09-17 Thread roman-v1
If each token have a languageattribute on it, when I search by word and
language and if hightlighting is switched on, each word of sentence will be
highlighted. Because of it this solution not fit.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Implementing-custom-analyzer-for-multi-language-stemming-tp4150156p4159550.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Loading an index (generated by map reduce) in SolrCloud

2014-09-17 Thread shushuai zhu
Hi, my case is a little simpler. For example, I have 100 collections now in my 
SolrCloud, and I want to back up 20 of them so I can restore them later. I 
think I can just copy the index and log for each shard/core to another 
location, then delete the collections. Later, I can create new collections 
(likely with different names), then copy the index and log back to the right 
directory structure on the node. After that, I can either reload the collection 
or core.

However, some testing shows this does not work: I could not reload the 
collection or core. I have not tried restarting the SolrCloud cluster. Can 
someone point out the best way to achieve this goal? I prefer not to restart 
SolrCloud.

Shushuai
 


 From: ralph tice ralph.t...@gmail.com
To: solr-user@lucene.apache.org 
Sent: Wednesday, September 17, 2014 6:53 PM
Subject: Re: Loading an index (generated by map reduce) in SolrCloud

FWIW, I do a lot of moving Lucene indexes around and as long as the core is
unloaded it's never been an issue for Solr to be running at the same time. [...]

Re: Ping handler during initial warmup

2014-09-17 Thread Shawn Heisey
On 9/17/2014 7:06 AM, Ere Maijala wrote:
 As far as I can see, when a Solr instance is started (whether
 standalone or SolrCloud), a PingRequestHandler will wait until index
 warmup is complete before returning [...]

 So, is it possible to configure a ping handler to return quickly with
 non-OK status if a search handler is not yet available? [...]

If it's not horribly messy to implement, returning a non-OK status
immediately when there is no available searcher seems like a good idea. 
Please file an improvement issue in Jira.

This can be handled on the load balancer end by configuring a quick
timeout on load balancer health checks, and doing them very frequently.

I've got haproxy in front of my solr servers.  My checks happen every
five seconds, with a 4990 millisecond timeout.  My ping handler query
(defined in solrconfig.xml) is q=*:*&rows=1 ... so it's very simple
and fast.  Because of efficiencies in the *:* query and caching, I doubt
this is putting much of a load on Solr.  It would probably be acceptable
to do the health checks once a second, although with typical Solr
logging configs you'd end up with a LOT of log data.  If you configure
logging at the WARN level, this would not be a worry.

Thanks,
Shawn
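
For reference, a ping handler along these lines might be defined as below (a
sketch; the healthcheckFile line is the optional state-file mechanism Ere
mentioned):

<requestHandler name="/admin/ping" class="solr.PingRequestHandler">
  <lst name="invariants">
    <str name="q">*:*</str>
    <str name="rows">1</str>
  </lst>
  <!-- optional: ping fails whenever this file is absent -->
  <str name="healthcheckFile">server-enabled.txt</str>
</requestHandler>

With healthcheckFile set, the handler returns a failure status while the file
is missing, which gives the manual enable/disable control discussed above.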



Re: Ping handler during initial warmup

2014-09-17 Thread Shawn Heisey
On 9/17/2014 8:07 PM, Shawn Heisey wrote:
 I've got haproxy in front of my solr servers.  My checks happen every
 five seconds, with a 4990 millisecond timeout. [...]

At the URL below, you can see a trimmed version of my haproxy config.
I've got more than I show here, but this is the part that handles my
main Solr index:

http://apaste.info/0vk

The ncmain core is a core that has no index, with the shards parameter
built into the config, so the application has no idea that it's talking
to a sharded index that actually lives on two separate servers.

Thanks,
Shawn
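
For readers who cannot reach the paste, a minimal sketch of an haproxy backend
along these lines (server names and addresses are assumptions, not Shawn's
actual config):

backend solr_ncmain
    # fail over quickly if the ping handler does not answer in time
    option httpchk GET /solr/ncmain/admin/ping
    timeout check 4990ms
    server solr1 10.0.0.1:8983 check inter 5s
    server solr2 10.0.0.2:8983 check inter 5s backup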



Re: Loading an index (generated by map reduce) in SolrCloud

2014-09-17 Thread ralph tice
If you are updating or deleting from your indexes I don't believe it is
possible to get a consistent copy of the index from the file system
directly without monkeying with hard links.  The safest thing is to use the
ADDREPLICA command in the Collections API and then an UNLOAD from the CoreAdmin
API if you want to take the data offline.  If you don't care to use
additional servers/JVMs, you can use the replication handler to make a backup
instead.

This older discussion covers most any backup strategy I can think of:
http://grokbase.com/t/lucene/solr-user/12c37h0g18/backing-up-solr-4-0
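
For reference, the replication-handler backup mentioned above is triggered with
a request along these lines (host, core name, and location are assumptions):

curl "http://localhost:8983/solr/mycore/replication?command=backup&location=/backups/mycore"

The handler snapshots the current committed index, so it is safe to run against
a live core.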

On Wed, Sep 17, 2014 at 9:01 PM, shushuai zhu ss...@yahoo.com.invalid
wrote:

 Hi, my case is a little simpler. For example, I have 100 collections now
 in my solr cloud, and I want to backup 20 of them so I can restore them
 later. [...]


SolrCloud deleted all existing indexes after update query

2014-09-17 Thread Norgorn
I'm using SOLR-hs_0.06, based on SOLR 4.10.
I have SolrCloud with external ZooKeepers.
I manually indexed with DIH from MySQL on each instance - we have a lot of
dbs, so it's one db per Solr instance.
All was just fine - I could search and so on.
Then I sent update queries (a lot of them, on the order of 100k or more) like
this: 192.168.1.1:8983/solr/mycollection/update/json + DATA in POST. The IP
addresses were selected from a pool, so there were many queries on each Solr
instance.
These queries performed well, but when I tried to search (after manually
committing), I saw only the data added with the update queries.
All data from DIH was deleted, and the data on disk was also deleted.
I can still see the import result on the dataimport page - but there is no
data in the index.

There are no errors in the logs. I just don't know what to do with that.

P.S. Sorry for my English, if it's bad.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrCloud-deleted-all-existing-indexes-after-update-query-tp4159566.html
Sent from the Solr - User mailing list archive at Nabble.com.