Re: Displaying highlights in formatted HTML document
--- On Thu, 6/9/11, Bryan Loofbourrow wrote:
> From: Bryan Loofbourrow
> Subject: Displaying highlights in formatted HTML document
> To: solr-user@lucene.apache.org
> Date: Thursday, June 9, 2011, 2:14 AM
>
> Here is my use case: I have a large number of HTML documents, sizes in the
> 0.5K-50M range, most around, say, 10M. I want to be able to present the
> user with the formatted HTML document, with the hits tagged [...]
>
> How can I get Solr to express the highlighting in the context of the
> formatted HTML document?

I am doing the same thing (solr trunk) using the following field type. In your separate core - which is queried when the user wants to dive into one of the returned results - feed your html files into this field. You may want to increase max analyzed chars too (147483647).
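The field type XML itself did not survive the list archive; a minimal sketch of the kind of type described above, assuming the goal is to strip the markup at index time while keeping the raw HTML stored (all names are illustrative):

  <fieldType name="text_html" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <!-- remove HTML/XML markup before tokenizing, so tags are not searchable;
           char filters keep offset corrections back to the original stored text -->
      <charFilter class="solr.HTMLStripCharFilterFactory"/>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

With the field stored="true", highlighting then tags hits inside the original formatted HTML rather than in the stripped text.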
Code for getting distinct facet counts across shards (Distributed Process)
In solr 1.4.1, for getting the "distinct facet terms count" across shards, the piece of code added for the distributed process is as follows:

Class: FacetComponent.java
Function: finishStage(ResponseBuilder rb)

Code flow with comments for reference:

  // in this for loop,
  for (DistribFieldFacet dff : fi.facets.values()) {
    ...
    // just after this line of code
    else { // TODO: log error or throw exception?
      counts = dff.getLexSorted();

      int namedistint = 0; // default
      // get the value of facet.numFacetTerms from the input query
      namedistint = rb.req.getParams().getFieldInt(dff.getKey().toString(),
          FacetParams.FACET_NAMEDISTINCT, 0);

      // get only the facet field counts
      if (namedistint == 0) {
        facet_fields.add(dff.getKey(), fieldCounts);
      }

      // get only the distinct facet term count
      if (namedistint == 1) {
        facet_fields.add("numfacetTerms", counts.length);
      }

      // get the facet field counts and the distinct term count
      if (namedistint == 2) {
        NamedList resCount = new NamedList();
        resCount.add("numfacetTerms", counts.length);
        resCount.add("counts", fieldCounts);
        facet_fields.add(dff.getKey(), resCount);
      }

Is this flow correct? I have worked with a few test cases and it has worked fine, but I want to know if there are any bugs that can creep in here. (My concern is that this piece of code should not affect the rest of the logic.)

Regards,
Rajani

On Fri, May 27, 2011 at 1:14 PM, rajini maski wrote:
> No such issues. Successfully integrated with 1.4.1, and it works across a
> single index.
>
> For the f.2.facet.numFacetTerms=1 parameter it will give the distinct count
> result.
>
> For the f.2.facet.numFacetTerms=2 parameter it will give the counts as well
> as the results for facets.
>
> But this is working only across a single index, not the distributed process.
> The conditions you have added in SimpleFacets.java ("if namedistinct count
> == int", the 0, 1 and 2 conditions) - should they be added in the
> distributed process function to enable it to work across shards?
>
> Rajani
>
> On Fri, May 27, 2011 at 12:33 PM, Bill Bell wrote:
>> I am pretty sure it does not yet support distributed shards...
>>
>> But the patch was written for 4.0, so there might be issues with running
>> it on 1.4.1.
>>
>> On 5/26/11 11:08 PM, "rajini maski" wrote:
>>> The patch solr 2242 for getting the count of distinct facet terms doesn't
>>> work for distributedProcess
>>> (https://issues.apache.org/jira/browse/SOLR-2242)
>>>
>>> The error log says
>>>
>>> HTTP ERROR 500
>>> Problem accessing /solr/select.
>>> Reason:
>>>
>>> For input string: "numFacetTerms"
>>>
>>> java.lang.NumberFormatException: For input string: "numFacetTerms"
>>>   at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
>>>   at java.lang.Long.parseLong(Long.java:403)
>>>   at java.lang.Long.parseLong(Long.java:461)
>>>   at org.apache.solr.schema.TrieField.readableToIndexed(TrieField.java:331)
>>>   at org.apache.solr.schema.TrieField.toInternal(TrieField.java:344)
>>>   at org.apache.solr.handler.component.FacetComponent$DistribFieldFacet.add(FacetComponent.java:619)
>>>   at org.apache.solr.handler.component.FacetComponent.countFacets(FacetComponent.java:265)
>>>   at org.apache.solr.handler.component.FacetComponent.handleResponses(FacetComponent.java:235)
>>>   at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:290)
>>>   at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
>>>   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
>>>   at org.apache.solr.servlet.Solr
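For completeness, a request exercising the patched parameter might look like the following (host, field name and shard list are illustrative; facet.numFacetTerms is the per-field parameter used earlier in this thread):

  http://localhost:8983/solr/select?q=*:*&facet=true&facet.field=category
      &f.category.facet.numFacetTerms=2
      &shards=shard1:8983/solr,shard2:8983/solr

With numFacetTerms=2 the response should carry both the per-term counts and the distinct term count, matching the namedistint == 2 branch in the code above.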
Re: Tokenising based on known words?
On Thu, Jun 9, 2011 at 4:37 AM, Mark Mandel wrote:
> Not sure if this is possible, but figured I would ask the question.
>
> Basically, we have some users who do some pretty ridiculous things ;o)
>
> Rather than writing "red jacket", they write "redjacket", which obviously
> returns no results.
[...]

Have you tried using synonyms?
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory

It seems like they should fit your use case.

Regards,
Gora
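A minimal sketch of the synonym approach (file name and mapping illustrative): map the concatenated form onto the spaced form at index time.

  # in synonyms.txt
  redjacket => red jacket

  <!-- in the field's index-time analyzer chain -->
  <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
          ignoreCase="true" expand="true"/>

The obvious limitation is that each run-together word has to be listed explicitly; it won't discover unknown concatenations on its own.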
Multiple Values not getting Indexed
Hi,

I am trying to index 2 fields with multiple values, but it is only putting one value for each and ignoring the rest of the values after the commas. I am fetching the query through DIH. It works fine if I have only one value in each of the 2 fields.

E.g.
Field1 - 150,178,461,151,310,306,305,179,137,162
Field2 - Chandigarh,Gurgaon,New Delhi,Ahmedabad,Rajkot,Surat,Mumbai,Nagpur,Pune,India - Others

*Schema.xml*

P.S. I tried multiValued="true" but it was of no help.

--
Thanks,
Pawan Darira
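If the two columns come back from the database as single comma-separated strings, DIH will not split them on its own. A sketch of the usual fix with RegexTransformer (entity name and query elided; column names taken from the mail above):

  <entity name="item" transformer="RegexTransformer" query="...">
    <!-- splitBy turns one comma-separated string into multiple field values -->
    <field column="Field1" splitBy=","/>
    <field column="Field2" splitBy=","/>
  </entity>

Both fields must also be declared multiValued="true" in schema.xml; splitting on "," assumes the values themselves never contain commas.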
Re: tika integration exception and other related queries
Hi Gary,

It started working. Though I did not test for zip files, for rar files it is working fine. The only thing I wanted to do is to index the metadata (text mapped to content), not store the data.

Also, in the search result I want to filter the content, and that started working fine. I don't want to show the extracted content to the end user, since the way it extracts the information is not very helpful to the user. Although we can apply a few of the analyzers and filters to remove the unnecessary tags, the information would still not be of much help. Looking for your opinion: what did you do in order to filter out the content, or are you showing the extracted content to the end user?

Even in the case where we are showing the text part to the end user, how can I limit the number of characters while querying the search results? Is there any feature where we can achieve this - the concept of a snippet kind of thing?

Thanks
Naveen

On Wed, Jun 8, 2011 at 1:45 PM, Gary Taylor wrote:
> Naveen,
>
> For indexing Zip files with Tika, take a look at the following thread:
>
> http://lucene.472066.n3.nabble.com/Extracting-contents-of-zipped-files-with-Tika-and-Solr-1-4-1-td2327933.html
>
> I got it to work with the 3.1 source and a couple of patches.
>
> Hope this helps.
>
> Regards,
> Gary.
>
> On 08/06/2011 04:12, Naveen Gupta wrote:
>> Hi Can somebody answer this ...
>>
>> 3. can somebody tell me an idea how to do indexing for a zip file ?
>>
>> 1. while sending docx, we are getting following error.
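On limiting the number of characters shown: that is what the highlighter's fragment parameters control. A sketch of a request, assuming the extracted text was mapped to a stored field named "content" (the field name is an assumption):

  http://localhost:8983/solr/select?q=content:foo
      &hl=true&hl.fl=content&hl.snippets=1&hl.fragsize=100

hl.fragsize caps the approximate snippet length in characters, and the full body can be kept out of the results by leaving "content" off the fl parameter.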
Displaying highlights in formatted HTML document
Here is my use case: I have a large number of HTML documents, sizes in the 0.5K-50M range, most around, say, 10M. I want to be able to present the user with the formatted HTML document, with the hits tagged, so that he may iterate through them, and see them in the context of the document, with the document looking as it would be presented by a browser; that is, fully formatted, with its tables and italics and font sizes and all. This is something that the user would explicitly request from within a set of search results, not something I’d expect to have returned from an initial search – the initial search merely returns the snippets around the hits. But if the user wants to dive into one of the returned results and see them in context, I need to be able to go get that. We are currently solving this problem by using an entirely separate search engine (dtSearch), which performs the tagging of the hits in the HTML just fine. But the solution is unsatisfactory because there are Solr searches that dtSearch’s capabilities cannot reasonably match. Can anyone suggest a good way to use Solr/Lucene for this instead? I’m thinking a separate core for this purpose might make sense, so as not to burden the primary search core with the full contents of the document. But after that, I’m stuck. How can I get Solr to express the highlighting in the context of the formatted HTML document? If Solr does not do this currently, and anyone can suggest ways to add the feature, any tips on how this might best be incorporated into the implementation would be welcome. Thanks, -- Bryan
Re: FilterQuery and Ors
try
fq=age:[1 TO 10] OR age:[10 TO 20]

I'm pretty sure fq=age:([1 TO 10] OR [10 TO 20]) will work too.

But you're right, multiple fq clauses are intersections, so specifying more than one fq clause on the SAME field results in what you're seeing.

Best
Erick

On Wed, Jun 8, 2011 at 5:34 PM, Jamie Johnson wrote:
> I'm looking for a way to do a filter query and ORs. I've done a bit of
> googling and found an open JIRA but nothing indicating this is possible.
> I'm looking to do something like the search at
> http://www.lucidimagination.com/search/?q=test
> where you can do multi selects for the facets.
[...]
Tokenising based on known words?
Not sure if this is possible, but figured I would ask the question.

Basically, we have some users who do some pretty ridiculous things ;o)

Rather than writing "red jacket", they write "redjacket", which obviously returns no results.

Is there any way, with Solr, to go hunting for known words (maybe if there are no results) within the word set? Or even tokenise based on known words in the index?

Last time I played with spell check suggestions, it didn't seem to handle this very well, but I've yet to try it again on 3.2.0 (just upgraded from 1.4.1).

Any help/thoughts appreciated, as they do this all the time.

Mark

--
E: mark.man...@gmail.com
T: http://www.twitter.com/neurotic
W: www.compoundtheory.com

cf.Objective(ANZ) - Nov 17, 18 - Melbourne Australia
http://www.cfobjective.com.au

Hands-on ColdFusion ORM Training
www.ColdFusionOrmTraining.com
FilterQuery and Ors
I'm looking for a way to do a filter query and ORs. I've done a bit of googling and found an open JIRA but nothing indicating this is possible. I'm looking to do something like the search at http://www.lucidimagination.com/search/?q=test where you can do multi selects for the facets. I've read about it at http://wiki.apache.org/solr/SimpleFacetParameters#Multi-Select_Faceting_and_LocalParams so I have the tag/exclusion working, but if I select two items from a facet group (say age from 1 to 10 and age from 10 to 20) I get nothing because nothing meets both of those criteria. I can obviously write something custom to build an OR out of this but that seems less elegant. Any guidance would be appreciated.
RE: Does MultiTerm highlighting work with the fastVectorHighlighter?
Hi Erick,

Thanks for asking. Yes, we have termVectors=true set.

I guess I should also mention that highlighting works fine using the fastVectorHighlighter as long as we don't do a MultiTerm query. For example, see the query and results appended below (using the same hl parameters listed in the previous email).

Tom

query:
ocr:tinkham

highlighted fragment:
John <b style="background:#00">Tinkham</b>, who married Miss Mallie
Kingsbury; Mr. William Ash- ley, and Mr. Leavitt, who, I believe, built the
big stone house, now left high and dry by itself, on the top of Lyon street
hill. As

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Wednesday, June 08, 2011 4:56 PM
To: solr-user@lucene.apache.org
Subject: Re: Does MultiTerm highlighting work with the fastVectorHighlighter?

Just to check, does the field have termVectors="true" set? I think it's
required for FVH to work.

Best
Erick
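The field definition itself was stripped by the archive. For the FastVectorHighlighter the field generally needs all three term vector options enabled, along these lines (type name illustrative):

  <field name="ocr" type="text" indexed="true" stored="true"
         termVectors="true" termPositions="true" termOffsets="true"/>

Without termPositions and termOffsets, the FVH has no position/offset data to build fragments from.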
Re: Does MultiTerm highlighting work with the fastVectorHighlighter?
Just to check, does the field have termVectors="true" set? I think it's required for FVH to work.

Best
Erick

On Wed, Jun 8, 2011 at 3:24 PM, Burton-West, Tom wrote:
> We are trying to implement highlighting for wildcard (MultiTerm) queries.
> This seems to work fine with the regular highlighter, but when we try to use
> the fastVectorHighlighter we don't see any results in the highlighting
> section of the response.
[...]
Re: wildcard search
> > I don't use it myself (but I will soon), so I may be wrong, but did you
> > try to use the ComplexPhraseQueryParser:
> >
> > ComplexPhraseQueryParser
> > QueryParser which permits complex phrase query syntax eg "(john
> > jon jonathan~) peters*".
> >
> > It seems that you could do such type of queries:
> >
> > GOK:"IA 38*"
>
> yes that sounds interesting.
> But I don't know how to get and install it into solr. Can you give me a
> hint?

https://issues.apache.org/jira/browse/SOLR-1604

But it seems that you can achieve what you want with vanilla solr. I don't follow the multivalued part in your example, but you can tokenize "IA 300; IC 330; IA 317; IA 318" into these 4 tokens:

IA 300
IC 330
IA 317
IA 318

using PatternTokenizerFactory. And you can use PrefixQParserPlugin for searching.
http://lucene.apache.org/solr/api/org/apache/solr/search/PrefixQParserPlugin.html
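A sketch of that combination (type name illustrative):

  <fieldType name="classcode" class="solr.TextField">
    <analyzer>
      <!-- split "IA 300; IC 330; IA 317; IA 318" on semicolons so each
           entry like "IA 300" survives as a single token, space included -->
      <tokenizer class="solr.PatternTokenizerFactory" pattern=";\s*"/>
    </analyzer>
  </fieldType>

A prefix search then bypasses the query parser's whitespace and wildcard handling entirely:

  q={!prefix f=GOK}IA 3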
Does MultiTerm highlighting work with the fastVectorHighlighter?
We are trying to implement highlighting for wildcard (MultiTerm) queries. This seems to work fine with the regular highlighter, but when we try to use the fastVectorHighlighter we don't see any results in the highlighting section of the response. Appended below are the parameters we are using.

Tom Burton-West

query
ocr:tink*

highlighting params:
true
200
true
200
colored
simple
ocr
true
true
RE: huge shards (300GB each) and load balancing
Hi Dmitry,

I am assuming you are splitting one very large index over multiple shards rather than replicating an index multiple times. Just for a point of comparison, I thought I would describe our experience with large shards.

At HathiTrust, we run a 6 terabyte index over 12 shards. This is split over 4 machines with 3 shards per machine, and our shards are about 400-500GB each. We get average response times of around 200 ms, with the 99th percentile queries up around 1-2 seconds. We have a very low qps rate, i.e. less than 1 qps. We also index offline on a separate machine and update the indexes nightly.

Some of the issues we have found with very large shards are:

1) Because of the very large shard size, I/O tends to be the bottleneck, with phrase queries containing common words being the slowest.

2) Because of the I/O issues, running cache-warming queries to get postings into the OS disk cache is important, as is leaving significant free memory for the OS to use for disk caching.

3) Because of the I/O issues, using stop words or CommonGrams produces a significant performance increase.

4) We have a huge number of unique terms in our indexes. In order to reduce the amount of memory needed by the in-memory terms index, we set the termInfosIndexDivisor to 8, which causes Solr to only load every 8th term from the tii file into memory. This reduced memory use from over 18GB to below 3GB and got rid of 30-second stop-the-world Java garbage collections. (See http://www.hathitrust.org/blogs/large-scale-search/too-many-words-again for details.) We later ran into memory problems when indexing, so instead changed the index-time parameter termIndexInterval from 128 to 1024. (More details here: http://www.hathitrust.org/blogs/large-scale-search)

Tom Burton-West
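For reference, the two levers described above look roughly like this in configuration (file names illustrative). In schema.xml, CommonGrams glues common words to their neighbours so that phrase queries containing them become ordinary term lookups:

  <filter class="solr.CommonGramsFilterFactory" words="commonwords.txt" ignoreCase="true"/>

And in solrconfig.xml, sampling only every 1024th term into the in-memory terms index:

  <indexDefaults>
    <termIndexInterval>1024</termIndexInterval>
  </indexDefaults>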
Re: wildcard search
Hi Ludovic,

> I don't use it myself (but I will soon), so I may be wrong, but did you try
> to use the ComplexPhraseQueryParser :
>
> ComplexPhraseQueryParser
> QueryParser which permits complex phrase query syntax eg "(john
> jon jonathan~) peters*".
>
> It seems that you could do such type of queries :
>
> GOK:"IA 38*"

yes that sounds interesting.
But I don't know how to get and install it into solr. Can you give me a hint?

Thanks
Thomas
Re: solr 3.1 java.lang.NoClassDefFoundError org/carrot2/core/ControllerFactory
Hi Bryan,

You'll also need to make sure that your ${solr.dir}/contrib/clustering/lib directory is in the classpath; that directory contains the Carrot2 JARs that provide the classes you're missing. I think the example solrconfig.xml has the relevant declarations.

Cheers,
S.

On Tue, Jun 7, 2011 at 13:48, bryan rasmussen wrote:
> As per the subject, I am getting java.lang.NoClassDefFoundError
> org/carrot2/core/ControllerFactory when I try to run clustering.
>
> I am using Solr 3.1.
>
> I get the following error:
>
> java.lang.NoClassDefFoundError: org/carrot2/core/ControllerFactory
>   at org.apache.solr.handler.clustering.carrot2.CarrotClusteringEngine.<init>(CarrotClusteringEngine.java:74)
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
>   at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
>   at java.lang.reflect.Constructor.newInstance(Unknown Source)
>   at java.lang.Class.newInstance0(Unknown Source)
>   at java.lang.Class.newInstance(Unknown Source)
>   at org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:412)
>   at org.apache.solr.handler.clustering.ClusteringComponent.inform(ClusteringComponent.java:203)
>   at org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:522)
>   at org.apache.solr.core.SolrCore.<init>(SolrCore.java:594)
>   at org.apache.solr.core.CoreContainer.create(CoreContainer.java:458)
>   at org.apache.solr.core.CoreContainer.load(CoreContainer.java:316)
>   at org.apache.solr.core.CoreContainer.load(CoreContainer.java:207)
>   at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:130)
>   at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:94)
>   at org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:97)
>   at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
>   at org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:713)
>   at org.mortbay.jetty.servlet.Context.startContext(Context.java:140)
>   at org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1282)
>   at org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:518)
>   at org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:499)
>   at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
>   at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:152)
>   at org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:156)
>   at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
>   at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:152)
>   at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
>   at org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:130)
>   at org.mortbay.jetty.Server.doStart(Server.java:224)
>   at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
>   at org.mortbay.xml.XmlConfiguration.main(XmlConfiguration.java:985)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
>   at java.lang.reflect.Method.invoke(Unknown Source)
>   at org.mortbay.start.Main.invokeMain(Main.java:194)
>   at org.mortbay.start.Main.start(Main.java:534)
>   at org.mortbay.start.Main.start(Main.java:441)
>   at org.mortbay.start.Main.main(Main.java:119)
> Caused by: java.lang.ClassNotFoundException: org.carrot2.core.ControllerFactory
>   at java.net.URLClassLoader$1.run(Unknown Source)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at java.net.URLClassLoader.findClass(Unknown Source)
>   at java.lang.ClassLoader.loadClass(Unknown Source)
>   at java.net.FactoryURLClassLoader.loadClass(Unknown Source)
>
> using the following configuration
>
> class="org.apache.solr.handler.clustering.ClusteringComponent" name="clustering">
> default
> org.carrot2.clustering.lingo.LingoClusteringAlgorithm
> 20
>
> class="org.apache.solr.handler.component.SearchHandler">
> explicit
> title
> all_text
> all_text title
> 150
> clustering
>
> with the following command to start solr:
>
> java -Dsolr.clustering.enabled=true -Dsolr.solr.home="C:\projects\solrexample\solr" -jar start.jar
>
> Any idea as to why clustering is not working?
>
> Thanks,
> Bryan Rasmussen
Re: huge shards (300GB each) and load balancing
Hi, Bill. Thanks, always nice to have options!

Dmitry

On Wed, Jun 8, 2011 at 4:47 PM, Bill Bell wrote:
> Re Amazon ELB.
>
> This is not exactly true. The ELB does load balance internal IPs, but the
> ELB IP address must be external. Still a major issue unless you use
> authentication. Nginx and others can also do load balancing.
>
> Bill Bell
> Sent from mobile
[...]
Re: Sorting on solr.TextField
On Wed, Jun 8, 2011 at 1:21 PM, Jamie Johnson wrote:
> Thanks, exactly what I was looking for.
>
> With this new field used just for sorting, is there a way to have it be
> case insensitive?

From the example schema:

-Yonik
http://www.lucidimagination.com
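The schema snippet itself was stripped by the archive; the relevant type in the stock example schema is along these lines:

  <!-- KeywordTokenizer keeps the whole value as a single token, and
       LowerCaseFilter makes the sort order case-insensitive -->
  <fieldType name="alphaOnlySort" class="solr.TextField" sortMissingLast="true" omitNorms="true">
    <analyzer>
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.TrimFilterFactory"/>
    </analyzer>
  </fieldType>

copyField the display field into a field of this type and sort on that.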
Re: Sorting on solr.TextField
Thanks, exactly what I was looking for.

With this new field used just for sorting, is there a way to have it be case insensitive?

On Wed, Jun 8, 2011 at 12:50 PM, Ahmet Arslan wrote:
>> Is there any documentation which details sorting behaviors on the
>> different types of solr fields?
[...]
> http://wiki.apache.org/solr/FAQ#Why_Isn.27t_Sorting_Working_on_my_Text_Fields.3F
Re: Sorting on solr.TextField
> Is there any documentation which > details sorting behaviors on the different > types of solr fields? My question is specifically > about solr.TextField but > I'd just like to know in general at this point. http://wiki.apache.org/solr/FAQ#Why_Isn.27t_Sorting_Working_on_my_Text_Fields.3F
Sorting on solr.TextField
Is there any documentation which details sorting behaviors on the different types of solr fields? My question is specifically about solr.TextField, but I'd just like to know in general at this point.

Currently when executing a query and I say to sort on a text field I am getting results as follows:

Beth Cross
Beth Cross
Caroline Cross
Arlene Cross
Calvin Cross
Brett Cross
Brandon Cross
Beth Cross
Beth Cross
Caroline Cross

where I would have expected:

Arlene Cross
Beth Cross
Beth Cross
Brett Cross
Beth Cross
Beth Cross
Brandon Cross
Calvin Cross
Caroline Cross
Caroline Cross
Re: KeywordTokenizerFactory and stopwords
Hi Erik. Yes, something like what you describe would do the trick. I did find this:

http://lucene.472066.n3.nabble.com/Concatenate-multiple-tokens-into-one-td1879611.html

I might try the pattern replace filter with stopwords, even though that feels kinda clunky.

Matt

On Wed, Jun 8, 2011 at 11:04 AM, Erik Hatcher wrote:
> This seems like it deserves some kind of "collecting" TokenFilter(Factory)
> that will slurp up all incoming tokens and glue them together with a space
> (and allow the separator to be configurable). Hmmm, surprised one of those
> doesn't already exist. With something like that you could have a standard
> tokenization chain, and put it all back together at the end.
>
> Erik
[...]
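A sketch of the pattern-replace workaround Matt mentions, with a single regex alternation standing in for the stopword list (the word list is illustrative and has to be maintained by hand, which is the clunky part):

  <fieldType name="autocomplete" class="solr.TextField">
    <analyzer>
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <!-- delete stopwords inside the single keyword token -->
      <filter class="solr.PatternReplaceFilterFactory"
              pattern="\b(a|an|and|of|the)\b" replacement="" replace="all"/>
      <!-- collapse the whitespace the removals leave behind -->
      <filter class="solr.PatternReplaceFilterFactory"
              pattern="\s+" replacement=" " replace="all"/>
      <filter class="solr.TrimFilterFactory"/>
      <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="50"/>
    </analyzer>
  </fieldType>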
Re: solr index losing entries
We have an API built in 2007 which at the lowest level submits requests with . We haven't changed anything in the API, and it worked well until the beginning of this year.

Unique key is solr_id with this definition:

The number of documents is determined using this HTTP request:
http://server/app_name/select/?q=*%3A*&version=2.2&start=0&rows=10&indent=on

Thanks,
Marius

2011/6/8 Tomás Fernández Löbbe
> That's rare. How do you add documents to Solr? What do you have as primary
> key? How do you determine the number of documents in the index?
>
> The value of "maxDoc" on the stats page counts deleted documents too,
> which are eliminated at merging.
[...]
Re: solr index losing entries
That's rare. How do you add documents to Solr? What do you have as primary key? How do you determine the number of documents in the index?

The value of "maxDoc" on the stats page counts deleted documents too, which are eliminated at merging.

On Wed, Jun 8, 2011 at 12:18 PM, Marius Hanganu wrote:
> Hello,
>
> We've been using solr 1.4 for 1.5 years now for one of the indexes in our
> application, with a special configuration with maxDocs=1 and maxTime=1.
[...]
Re: wildcard search
Hi Thomas,

I don't use it myself (but I will soon), so I may be wrong, but did you try to use the ComplexPhraseQueryParser:

ComplexPhraseQueryParser
QueryParser which permits complex phrase query syntax eg "(john jon jonathan~) peters*".

It seems that you could do such type of queries:

GOK:"IA 38*"

Ludovic.
solr index losing entries
Hello,

We've been using solr 1.4 for 1.5 years now for one of the indexes in our application, with a special configuration with maxDocs=1 and maxTime=1. The number of documents is 10,000, with index size around 10MB.

For a few months now, SOLR has shown this strange behavior. Our code did not change; however, documents started disappearing from the index. And it's decreasing constantly, at various speeds.

Our first modification was to raise maxTime to 30sec, which seemed to fix the problem for some time. After a few weeks, the problem started showing again, so we upgraded to SOLR 3.1.

This upgrade did not fix the problem either. Our last try was with maxDocs=500 and maxTime=60sec. After a complete reindex, SOLR shows all 10081 documents, but after a few minutes it suddenly goes down to 10078, after a few hours to 10076, and it's stabilizing around this number.

We have another SOLR index with ~400,000 objects, maxDocs=1000 and maxTime=3 minutes, and it never showed this problem.

Do you have any idea why this is happening? Or how we can identify the problem?

Thanks,
Marius
Re: KeywordTokenizerFactory and stopwords
This seems like it deserves some kind of "collecting" TokenFilter(Factory) that will slurp up all incoming tokens and glue them together with a space (and allow the separator to be configurable). Hmmm, surprised one of those doesn't already exist. With something like that you could have a standard tokenization chain, and put it all back together at the end.

Erik

On Jun 8, 2011, at 10:59, Matt Mitchell wrote:
> Hi,
>
> I have an "autocomplete" fieldType that works really well, but because
> the KeywordTokenizerFactory (if I understand correctly) is emitting a
> single token, the stopword filter will not detect any stopwords.
> Anyone know of a way to strip out stopwords when using
> KeywordTokenizerFactory?
[...]
KeywordTokenizerFactory and stopwords
Hi, I have an "autocomplete" fieldType that works really well, but because the KeywordTokenizerFactory (if I understand correctly) is emitting a single token, the stopword filter will not detect any stopwords. Anyone know of a way to strip out stopwords when using KeywordTokenizerFactory? I did try the reg-exp replace filter, but I'm not sure I want to add a bunch of reg-exps for replacing every stopword. Thanks, Matt Here's the fieldType definition:
Re: wildcard search
Hmmm, have you tried EdgeNGrams? This works for me (at the expense of a somewhat larger index, of course)... and a field of type "edge" named "thomasfield" Now searches like thomasfield:"GOK IA 3" (include quotes!) should work. The various parameters (min/max gram size) I chose arbitrarily, you'll want to tweak them. I include a lowercasefilter for safety's sake if people are actually going to type things in... It's probably instructive to look at the admin/analysis page to see how this all plays out Best Erick On Wed, Jun 8, 2011 at 9:29 AM, Thomas Fischer wrote: > Hi Erick, > > I have a multivalued field "GOK" (local classification scheme) with separate > entries of the sort > IA 300; IC 330; IA 317; IA 318, i.e. 1 to 3 capital characters, space, 3 > digits. > I want to be able to perform a truncated search on that field: > either just the string before the space, or a combination of that string with > 1 or 2 digits, something like: > GOK:IA > or > GOK:IA 3* > or > GOK:IA 31? > My problem is the clash between the phrase (GOK:"IA 317" works) and the > wildcards. > > As a start I tried as type > autoGeneratePhraseQueries="true"> > from the solr 3.2 distribution schema > (apache-solr-3.2.0/example/solr/conf/schema.xml), > the field is just > > > BTW, I have another field "DDC" with entries of the form "t1:086643" with > analogous requirements which yields similar problems due to the colon, also > indexed as text. > Here also > DDC:T1\:086643 > works, but not > DDC:T1\:08664? > > Thanks in advance > Thomas > >> Yes there is, but you haven't provided enough information to >> make a suggestion. What isthe fieldType definition? What is >> the field definition? >> >> Two resources that'll help you greatly are: >> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters >> >> and the admin/analysis page... >> >> Best >> Erick >> >> On Tue, Jun 7, 2011 at 6:23 PM, Thomas Fischer wrote: >>> Hello, >>> >>> I am testing solr 3.2 and have problems with wildcards. >>> I am indexing values like "IA 300; IC 330; IA 317; IA 318" in a field >>> "GOK", and can't find a way to search with wildcards. >>> I want to use a wild card search to match something like "IA 31?" but >>> cannot find a way to do so. >>> GOK:IA\ 38* doesn't work with the contents of GOK indexed as text. >>> Is there a way to index and search that would meet my requirements? >>> >>> Thomas >>> >>> >>> > > Mit freundlichen Grüßen > Thomas Fischer > > >
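The type definition in Erick's mail was stripped by the archive; a plausible reconstruction matching his description (gram sizes arbitrary, as he says, with lowercasing for safety):

  <fieldType name="edge" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <!-- index every prefix of the whole value: "ia", "ia ", "ia 3", ... -->
      <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="10"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

  <field name="thomasfield" type="edge" indexed="true" stored="true" multiValued="true"/>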
Re: Problem with boosting function
try:

q=title:Unicamp&defType=dismax&bf=question_count^5.0

"title:Unicamp" in any search handler will search only in the requested field.

> The queries I am trying to do are
> q=title:Unicamp
> and
> q=title:Unicamp&bf=question_count^5.0
> The boosting factor (5.0) is just to verify if it was really used.
> Thanks
> Alex
Re: Problem with boosting function
The queries I am trying to do are

q=title:Unicamp

and

q=title:Unicamp&bf=question_count^5.0

The boosting factor (5.0) is just to verify if it was really used.

Thanks
Alex

On Wed, Jun 8, 2011 at 10:25 AM, Denis Kuzmenok wrote:
> Show your full request to solr (all params)
[...]
Re: huge shards (300GB each) and load balancing
Re Amazon ELB.

This is not exactly true. The ELB does load balance internal IPs, but the ELB IP address must be external. Still a major issue unless you use authentication. Nginx and others can also do load balancing.

Bill Bell
Sent from mobile

On Jun 8, 2011, at 3:32 AM, "Upayavira" wrote:
> The principal issue with Amazon's load balancers (at least when I was
> using them last year) is that the ports that they balance need to be
> public. You can't use an Amazon load balancer as an internal service
> within a security group. For a service such as Solr, that can be a bit
> of a killer.
[...]
Re: Problem with boosting function
The boost qparser should do the trick if you want a multiplicative boost.
http://lucene.apache.org/solr/api/org/apache/solr/search/BoostQParserPlugin.html

-Yonik
http://www.lucidimagination.com

On Wed, Jun 8, 2011 at 9:22 AM, Alex Grilo wrote:
> Hi,
> I'm trying to use the bf parameter in solr queries but I'm having some
> problems.
>
> The context is: I have some topics and an integer weight of popularity
> (number of users that follow the topic). I'd like to boost the documents
> according to this weight field [...]
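Applied to the example from this thread, a boost-qparser request would look like this (the boost function is illustrative):

  q={!boost b=question_count}title:Unicamp

Unlike bf under dismax, which adds the function value to the score, {!boost} multiplies the relevance score by it.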
Re: Solr Cloud and Range Facets
One last piece of information: regular range queries seem to work fine; it's only date ranges which seem to be intermittent.

On Wed, Jun 8, 2011 at 9:03 AM, Jamie Johnson wrote:
> Some more information. I am currently doing the following:
>
> SolrQuery query = new SolrQuery();
> query.setQuery("test");
[...]
>
> After this has run I get anywhere from 30-40% failures (300-400). If I set
> distrib to false or take off the query it works fine. Any insight would be
> greatly appreciated.
[...]
Re: wildcard search
Hi Erick,

I have a multivalued field "GOK" (local classification scheme) with separate entries of the sort IA 300; IC 330; IA 317; IA 318, i.e. 1 to 3 capital characters, space, 3 digits. I want to be able to perform a truncated search on that field: either just the string before the space, or a combination of that string with 1 or 2 digits, something like:

GOK:IA
or
GOK:IA 3*
or
GOK:IA 31?

My problem is the clash between the phrase (GOK:"IA 317" works) and the wildcards.

As a start I tried as type autoGeneratePhraseQueries="true"> from the solr 3.2 distribution schema (apache-solr-3.2.0/example/solr/conf/schema.xml); the field is just

BTW, I have another field "DDC" with entries of the form "t1:086643" with analogous requirements, which yields similar problems due to the colon, also indexed as text. Here also

DDC:T1\:086643

works, but not

DDC:T1\:08664?

Thanks in advance
Thomas

> Yes there is, but you haven't provided enough information to
> make a suggestion. What is the fieldType definition? What is
> the field definition?
>
> Two resources that'll help you greatly are:
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
>
> and the admin/analysis page...
>
> Best
> Erick
>
> On Tue, Jun 7, 2011 at 6:23 PM, Thomas Fischer wrote:
>> Hello,
>>
>> I am testing solr 3.2 and have problems with wildcards.
>> I am indexing values like "IA 300; IC 330; IA 317; IA 318" in a field
>> "GOK", and can't find a way to search with wildcards.
>> I want to use a wildcard search to match something like "IA 31?" but
>> cannot find a way to do so.
>> GOK:IA\ 38* doesn't work with the contents of GOK indexed as text.
>> Is there a way to index and search that would meet my requirements?
>>
>> Thomas

Best regards
Thomas Fischer
Re: huge shards (300GB each) and load balancing
Hi Upayavira, Thanks for sharing insights and experience on this. As we have 6 shards at the moment, it is pretty hard (=almost impossible) to keep them on a single box, so that's why we decided to shard. On the other hand, we have never tried multicore architecture, so that's a good point, thanks. On the indexing side, we do it rather straightforward, that is, by updating the online shards. This should hopefully be improved with [offline update / http swap] system, as already now, updating online 200GB shards at times produces OOM, freezing and other issues. Does someone have other experience / pointers to load balancer software that was tried with SOLR? Dmitry On Wed, Jun 8, 2011 at 12:32 PM, Upayavira wrote: > > > On Wed, 08 Jun 2011 10:42 +0300, "Dmitry Kan" > wrote: > > Hello list, > > > > Thanks for attending to my previous questions so far, have learnt a lot. > > Here is another one, I hope it will be interesting to answer. > > > > > > > > We run our SOLR shards and front end SOLR on the Amazon high-end > > machines. > > Currently we have 6 shards with around 200GB in each. Currently we have > > only > > one front end SOLR which, given a client query, redirects it to all the > > shards. Our shards are constantly growing, data is at times reindexed (in > > batches, which is done by removing a decent chunk before replacing it > > with > > updated data), constant stream of new data is coming every hour (usually > > hits the latest shard in time, but can also hit other shards, which have > > older data). Since the front end SOLR has started to be a SPOF, we are > > thinking about setting up some sort of load balancer. > > > > 1) do you think ELB from Amazon is a good solution for starters? We don't > > need to maintain sessions between SOLR and client. > > 2) What other load balancers have been used specifically with SOLR? > > > > > > Overall: does SOLR scale to such size (200GB in an index) and what can be > > recommended as next step -- resharding (cutting existing shards to > > smaller > > chunks), replication? > > Really, it is going to be up to you to work out what works in your > situation. You may be reaching the limit of what a Lucene index can > handle, don't know. If your query traffic is low, you might find that > two 100Gb cores in a single instance performs better. But then, maybe > not! Or two 100Gb shards on smaller Amazon hosts. But then, maybe not! > :-) > > The principal issue with Amazon's load balancers (at least when I was > using them last year) is that the ports that they balance need to be > public. You can't use an Amazon load balancer as an internal service > within a security group. For a service such as Solr, that can be a bit > of a killer. > > If they've fixed that issue, then they'd work fine (I used them quite > happily in another scenario). > > When looking at resolving single points of failure, handling search is > pretty easy (as you say, stateless load balancer). You will need to give > more attention though to how you handle it regarding indexing. > > Hope that helps a bit! > > Upayavira > > > > > > --- > Enterprise Search Consultant at Sourcesense UK, > Making Sense of Open Source > >
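On the offline-update-plus-swap idea: a sketch of the swap step via the CoreAdmin API, assuming each shard host runs a live core and a rebuild core (core names illustrative):

  http://localhost:8983/solr/admin/cores?action=SWAP&core=shard1&other=shard1-rebuild

Index into shard1-rebuild offline, warm it, then SWAP exchanges the two names atomically, so queries never see a half-updated index.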
Re: Problem with boosting function
Show your full request to solr (all params).

> Hi,
> I'm trying to use the bf parameter in solr queries but I'm having some
> problems.
[...]
Problem with boosting function
Hi,

I'm trying to use the bf parameter in solr queries but I'm having some problems. The context is: I have some topics and an integer weight of popularity (the number of users that follow the topic). I'd like to boost the documents according to this weight field, and it changes (users may start following or unfollowing a topic). I thought the best way to do that was adding a bf parameter to the query.

First of all I was trying to include it in a query processed by a default SearchHandler. I debugged the results and the scores didn't change. So I tried to change the defType of the SearchHandler to dismax (I didn't add any other field in solrconfig), and queries didn't work anymore.

What is the best way to achieve what I want? Do I really need to use a dismax SearchHandler (I read about it, and I don't want to search in multiple fields - I want to search in one field and boost on another one)?

Thanks in advance
Alex Grilo
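Two details that fit the symptoms above: bf is a dismax parameter, so the standard request handler silently ignores it (hence "the scores didn't change"), and dismax does not use the schema's default field, so qf must be set (hence "queries didn't work anymore"). A minimal sketch of a dismax handler with a popularity boost; the handler name and the name/popularity fields are assumptions, not from the original post:

    <requestHandler name="/topics" class="solr.SearchHandler">
      <lst name="defaults">
        <str name="defType">dismax</str>
        <!-- dismax only searches the fields listed in qf -->
        <str name="qf">name</str>
        <!-- additive boost from an indexed numeric field -->
        <str name="bf">log(popularity)</str>
      </lst>
    </requestHandler>

The equivalent ad-hoc request would be something like q=solr&defType=dismax&qf=name&bf=log(popularity).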
Re: how to Index and Search non-English Text in solr
This page is a handy reference for individual languages:
http://wiki.apache.org/solr/LanguageAnalysis

But the usual approach, especially for Chinese/Japanese/Korean (CJK), is to index the content in different fields with language-specific analyzers, then spread your search across the language-specific fields (e.g. title_en, title_fr, title_ar). Stemming and stopwords in particular give "surprising" results if you put words from different languages in the same field.

Best
Erick

On Wed, Jun 8, 2011 at 8:34 AM, Mohammad Shariq wrote:
> Hi,
> I had set up solr (solr-1.4 on Ubuntu 10.10) for indexing news articles in
> English, but my requirement extends to indexing news in other languages too.
>
> This is how my schema looks:
> [schema XML stripped in archiving; the surviving attribute fragments
> (stopwords.txt, WordDelimiterFilter options, protwords.txt, query-time
> synonyms) match the stock Solr 1.4 "text" type]
>
> My problem is:
> Now I want to index news articles in other languages too, e.g. Chinese,
> Japanese. How can I modify my text field so that I can index news in other
> languages and make it searchable?
>
> Thanks
> Shariq
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/how-to-Index-and-Search-non-Eglish-Text-in-solr-tp3038851p3038851.html
> Sent from the Solr - User mailing list archive at Nabble.com.
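A sketch of what the per-language fields could look like in schema.xml; the field and type names are illustrative:

    <field name="title_en"  type="text"     indexed="true" stored="true"/>
    <field name="title_cjk" type="text_cjk" indexed="true" stored="true"/>

    <fieldType name="text_cjk" class="solr.TextField">
      <analyzer>
        <!-- emits overlapping character bigrams, a common baseline
             for Chinese/Japanese/Korean text in Solr 1.4 -->
        <tokenizer class="solr.CJKTokenizerFactory"/>
      </analyzer>
    </fieldType>

At query time you would then search across title_en and title_cjk together, for example via a dismax qf list.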
Re: Solr Cloud and Range Facets
Some more information. I am currently doing the following:

    SolrQuery query = new SolrQuery();
    query.setQuery("test");
    query.setParam("distrib", true);
    query.setFacet(true);
    query.setParam(FacetParams.FACET_RANGE, "dateTime");
    query.setParam("f.dateTime." + FacetParams.FACET_RANGE_GAP, "+1MONTH");
    query.setParam("f.dateTime." + FacetParams.FACET_RANGE_START, "2011-06-01T00:00:00Z-1YEAR");
    query.setParam("f.dateTime." + FacetParams.FACET_RANGE_END, "2011-07-01T00:00:00Z");
    query.setParam("f.dateTime." + FacetParams.FACET_MINCOUNT, "1");
    System.out.println(query);

    int failure = 0;
    for (int x = 0; x < 1000; x++) {
        QueryResponse response = mainServer.query(query);
        List<RangeFacet> ranges = response.getFacetRanges();
        for (RangeFacet range : ranges) {
            if ("dateTime".equals(range.getName())) {
                if (range.getCounts().size() == 0) {
                    failure++;
                }
            }
        }
    }
    System.out.println("Failed: " + failure);

After this has run I get anywhere from 30-40% failures (300-400). If I set distrib to false or take off the query it works fine. Any insight would be greatly appreciated.

On Tue, Jun 7, 2011 at 2:27 PM, Jamie Johnson wrote:
> I have a solr cloud setup with 2 servers; when executing a query against
> them of the form:
>
> http://localhost:8983/solr/select/?distrib=true&q=*:*&facet=true&facet.mincount=1&facet.range=dateTime&f.dateTime.facet.range.gap=%2B1MONTH&f.dateTime.facet.range.start=2011-06-01T00%3A00%3A00Z-1YEAR&f.dateTime.facet.range.end=2011-07-01T00%3A00%3A00Z&f.dateTime.facet.mincount=1&start=0&rows=0
>
> I am seeing that sometimes the date facet has a count, and other times it
> does not. Specifically I am seeing sometimes:
>
> <lst name="dateTime">
>   <lst name="counts"/>
>   <str name="gap">+1MONTH</str>
>   <date name="start">2010-06-01T00:00:00Z</date>
>   <date name="end">2011-07-01T00:00:00Z</date>
> </lst>
>
> and others:
>
> <lst name="dateTime">
>   <lst name="counts">
>     <int name="...">250</int>
>   </lst>
>   <str name="gap">+1MONTH</str>
>   <date name="start">2010-06-01T00:00:00Z</date>
>   <date name="end">2011-07-01T00:00:00Z</date>
> </lst>
>
> What could be causing this inconsistency?
Re: Getting a query on an "fl" parameter value ?
Hmmm, fl is the list of fields to return; it has nothing to do with what's searched. Are you looking for something like q=keyword AND id:(12 OR 45 OR 32) ?

Best
Erick

On Tue, Jun 7, 2011 at 11:14 AM, duddy67 wrote:
> Hi all,
>
> I'd like to know if it's possible to get a query on an "fl" value.
> For now my url query looks like that:
>
> /solr/select/?q=keyword&version=2.2&start=0&rows=10&indent=on&fl=id+name+title
>
> it works but I need request also on a "fl" parameter value.
> I'd like to add to my initial query a kind of: WHERE the "fl" id value is
> equal to 12 OR 45 OR 32.
>
> How can I do that ?
>
> Thanks for advance.
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Getting-a-query-on-an-fl-parameter-value-tp3034887p3034887.html
> Sent from the Solr - User mailing list archive at Nabble.com.
how to Index and Search non-English Text in solr
Hi,

I had set up solr (solr-1.4 on Ubuntu 10.10) for indexing news articles in English, but my requirement extends to indexing news in other languages too.

This is how my schema looks: [schema XML stripped in archiving]

And the "text" field in schema.xml looks like: [fieldtype XML stripped in archiving]

My problem is: now I want to index news articles in other languages too, e.g. Chinese, Japanese. How can I modify my text field so that I can index news in other languages and make it searchable?

Thanks
Shariq

--
View this message in context:
http://lucene.472066.n3.nabble.com/how-to-Index-and-Search-non-Eglish-Text-in-solr-tp3038851p3038851.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: solr speed issues..
How frequently do you optimize your Solr index? Optimization also helps in reducing search latency.

--
View this message in context:
http://lucene.472066.n3.nabble.com/solr-speed-issues-tp2254823p3038794.html
Sent from the Solr - User mailing list archive at Nabble.com.
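If optimizing is indeed appropriate (it is an expensive, I/O-heavy operation, so "frequently" usually means nightly or less), one way to trigger it is a plain update request; URL and core layout here are assumptions for a default single-core install:

    curl 'http://localhost:8983/solr/update?optimize=true'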
Re: Boosting result on query.
(11/06/08 16:20), Denis Kuzmenok wrote:
> > If you could move to 3.x and your "linked item" boosts could be
> > calculated offline in batch periodically, you could use an external
> > file field to store the doc boost.
> > A few if's though.
>
> I have 3.2, and external file field doesn't work without a solr restart
> (on a multicore instance).

Can you try ReloadCacheRequestHandler, which was introduced in 3.2? When you change the external file, hit /reloadCache instead of restarting solr.

koji
--
http://www.rondhuit.com/en/
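The handler is not registered out of the box; the solrconfig.xml declaration should look roughly like this (the handler name /reloadCache is your choice, and the nested-class path is from the 3.2 javadocs):

    <requestHandler name="/reloadCache"
        class="org.apache.solr.search.function.FileFloatSource$ReloadCacheRequestHandler" />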
Re: Re: Can I update a specific field in solr?
Solr doesn't support partial updates.

On 8 June 2011 16:04, ZiLi wrote:
>
> Thanks very much, I'll re-index a whole document : )
>
> From: Chandan Tamrakar
> Sent: 2011-06-08 18:25:37
> To: solr-user
> Cc:
> Subject: Re: Can I update a specific field in solr?
>
> I think you can do that but you need to re-index the whole document again.
> Note that there is nothing like "update"; it's usually delete and then add.
> thanks
>
> On Wed, Jun 8, 2011 at 4:00 PM, ZiLi wrote:
> > Hi, I try to update a specific field in solr, but I didn't find any way
> > to implement this.
> > Anyone who knows how to?
> > Any suggestions will be appreciated : )
> >
> > 2011-06-08
> >
> > ZiLi
>
> --
> Chandan Tamrakar

--
Thanks and Regards
Mohammad Shariq
Re: Re: Can I update a specific field in solr?
Thanks very much, I'll re-index a whole document : )

From: Chandan Tamrakar
Sent: 2011-06-08 18:25:37
To: solr-user
Cc:
Subject: Re: Can I update a specific field in solr?

I think you can do that but you need to re-index the whole document again. Note that there is nothing like "update"; it's usually delete and then add.
thanks

On Wed, Jun 8, 2011 at 4:00 PM, ZiLi wrote:
> Hi, I try to update a specific field in solr, but I didn't find any way to
> implement this.
> Anyone who knows how to?
> Any suggestions will be appreciated : )
>
> 2011-06-08
>
> ZiLi

--
Chandan Tamrakar
Re: Can I update a specific field in solr?
I think you can do that but you need to re-index the whole document again. Note that there is nothing like "update"; it's usually delete and then add.
thanks

On Wed, Jun 8, 2011 at 4:00 PM, ZiLi wrote:
> Hi, I try to update a specific field in solr, but I didn't find any way to
> implement this.
> Anyone who knows how to?
> Any suggestions will be appreciated : )
>
> 2011-06-08
>
> ZiLi

--
Chandan Tamrakar
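A sketch of the usual client-side workaround in SolrJ: read the old document, change one field, re-add the whole thing. This only works if every field is stored (otherwise data is silently lost); the URL, id and price field are illustrative:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.common.SolrDocument;
    import org.apache.solr.common.SolrInputDocument;

    public class ReindexOneField {
        public static void main(String[] args) throws Exception {
            CommonsHttpSolrServer server =
                new CommonsHttpSolrServer("http://localhost:8983/solr");
            SolrDocument old =
                server.query(new SolrQuery("id:123")).getResults().get(0);
            SolrInputDocument doc = new SolrInputDocument();
            // copy every stored field; beware of stored copyField targets,
            // which this loop would re-submit and thus duplicate
            for (String name : old.getFieldNames()) {
                doc.addField(name, old.getFieldValue(name));
            }
            doc.setField("price", 9.99f);  // the single field to "update"
            server.add(doc);
            server.commit();
        }
    }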
AW: How to deal with many files using solr external file field
Hi,

I could not provide a stack trace, and IMHO it wouldn't provide much useful information. But we've made good progress in the analysis. We took a deeper look at what happens when an "external-file-field" request is sent to SOLR:

* SOLR looks for a file matching the requested query, e.g. "trousers".
* If there is one, SOLR loads the "trousers" file and generates a HashMap entry consisting of a FileFloatSource object and a float array with the size of the number of documents in the SOLR index. Every document matched by the query gets the score value provided in the external score file. For every(!) other document, SOLR writes a zero into that float array.
* If SOLR does not find a file for the query, it still generates a HashMap entry with score zero for every document.

In our case we have about 8.5 million documents in our index, and one of those arrays occupies about 34MB of heap space. With e.g. 100 different queries, using external file fields for sorting the results makes SOLR occupy about 3.4GB of heap space.

The problem might be the use of WeakHashMap [1], which prevents the garbage collector from cleaning up unused keys.

What do you think could be a possible solution for this whole problem? (except "don't use external file fields" ;)

Regards
Sven

[1]: "A hashtable-based Map implementation with weak keys. An entry in a WeakHashMap will automatically be removed when its key is no longer in ordinary use. More precisely, the presence of a mapping for a given key will not prevent the key from being discarded by the garbage collector, that is, made finalizable, finalized, and then reclaimed. When a key has been discarded its entry is effectively removed from the map, so this class behaves somewhat differently than other Map implementations."

-----Original Message-----
From: mtnes...@gmail.com [mailto:mtnes...@gmail.com] On behalf of Simon Rosenthal
Sent: Wednesday, June 8, 2011 03:56
To: solr-user@lucene.apache.org
Subject: Re: How to deal with many files using solr external file field

Can you provide a stack trace for the OOM exception?

On Tue, Jun 7, 2011 at 4:25 PM, Bohnsack, Sven wrote:
> Hi all,
>
> we're using solr 1.4 and external file field ([1]) for sorting our
> search results. We have about 40,000 terms for which we use this sorting
> option. Currently we're running into massive OutOfMemory problems and are
> not quite sure what's the matter. It seems that the garbage collector
> stops working or some processes are going wild. However, solr starts to
> allocate more and more RAM until we experience this OutOfMemory exception.
>
> We noticed the following:
>
> For some terms one can see in the solr log that there appear some
> java.io.FileNotFoundExceptions when solr tries to load an external file
> for a term for which there is no such file, e.g. solr tries to load the
> external score file for "trousers" but there is none in the
> /solr/data folder.
>
> Question: is it possible that those exceptions are responsible for the
> OutOfMemory problem, or could it be due to the large(?) number of 40k
> terms for which we want to sort the results via external file field?
>
> I'm looking forward to your answers, suggestions and ideas :)
>
> Regards
> Sven
>
> [1]:
> http://lucene.apache.org/solr/api/org/apache/solr/schema/ExternalFileField.html
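The arithmetic behind those numbers, for anyone checking along (document count from the mail, sizes approximate):

    public class ExternalFileFieldFootprint {
        public static void main(String[] args) {
            long docs = 8500000L;           // documents in the index
            long perQuery = docs * 4L;      // one float per doc: ~34 MB
            long total = 100L * perQuery;   // 100 distinct queries: ~3.4 GB
            System.out.println(perQuery + " bytes per cached query, "
                + total + " bytes total");
        }
    }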
Can I update a specific field in solr?
Hi, I try to update a specific field in solr, but I didn't find any way to implement this.
Anyone who knows how to?
Any suggestions will be appreciated : )

2011-06-08

ZiLi
Re: Question about tokenizing, searching and retrieving results.
Hello again! Thank you very much for answering. The problem was the defaultOperator, which was set to AND. Damn, I was blind :-/ Thank you again.
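For readers hitting the same thing: the operator is set in schema.xml, and AND makes every term mandatory, which can silently empty result sets. The typical line looks like:

    <solrQueryParser defaultOperator="OR"/>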
Re: huge shards (300GB each) and load balancing
On Wed, 08 Jun 2011 10:42 +0300, "Dmitry Kan" wrote:
> Hello list,
>
> Thanks for attending to my previous questions so far, have learnt a lot.
> Here is another one, I hope it will be interesting to answer.
>
> We run our SOLR shards and front end SOLR on the Amazon high-end machines.
> Currently we have 6 shards with around 200GB in each. Currently we have
> only one front end SOLR which, given a client query, redirects it to all
> the shards. Our shards are constantly growing, data is at times reindexed
> (in batches, which is done by removing a decent chunk before replacing it
> with updated data), constant stream of new data is coming every hour
> (usually hits the latest shard in time, but can also hit other shards,
> which have older data). Since the front end SOLR has started to be a SPOF,
> we are thinking about setting up some sort of load balancer.
>
> 1) do you think ELB from Amazon is a good solution for starters? We don't
> need to maintain sessions between SOLR and client.
> 2) What other load balancers have been used specifically with SOLR?
>
> Overall: does SOLR scale to such size (200GB in an index) and what can be
> recommended as next step -- resharding (cutting existing shards to smaller
> chunks), replication?

Really, it is going to be up to you to work out what works in your situation. You may be reaching the limit of what a Lucene index can handle, don't know. If your query traffic is low, you might find that two 100Gb cores in a single instance performs better. But then, maybe not! Or two 100Gb shards on smaller Amazon hosts. But then, maybe not! :-)

The principal issue with Amazon's load balancers (at least when I was using them last year) is that the ports that they balance need to be public. You can't use an Amazon load balancer as an internal service within a security group. For a service such as Solr, that can be a bit of a killer.

If they've fixed that issue, then they'd work fine (I used them quite happily in another scenario).

When looking at resolving single points of failure, handling search is pretty easy (as you say, stateless load balancer). You will need to give more attention though to how you handle it regarding indexing.

Hope that helps a bit!

Upayavira

---
Enterprise Search Consultant at Sourcesense UK,
Making Sense of Open Source
Re: 400 MB Fields
Otis,

Not sure about Solr, but with Lucene it was certainly doable. I saw fields way bigger than 400MB indexed, sometimes having a large set of unique terms as well (think something like a log file with lots of alphanumeric tokens, a couple of gigs in size). While indexing and querying such things, I/O naturally could easily become a bottleneck.

-Alexander
Re: tika integration exception and other related queries
Naveen,

For indexing Zip files with Tika, take a look at the following thread:
http://lucene.472066.n3.nabble.com/Extracting-contents-of-zipped-files-with-Tika-and-Solr-1-4-1-td2327933.html

I got it to work with the 3.1 source and a couple of patches. Hope this helps.

Regards,
Gary.

On 08/06/2011 04:12, Naveen Gupta wrote:
> Hi
>
> Can somebody answer this ...
>
> 3. can somebody tell me an idea how to do indexing for a zip file ?
>
> 1. while sending docx, we are getting following error.
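For the docx part, the usual way to exercise Tika is Solr Cell's /update/extract handler; a sketch, where the URL, document id and file name are placeholders:

    curl 'http://localhost:8983/solr/update/extract?literal.id=doc1&commit=true' \
         -F 'myfile=@report.docx'

Zip archives need the container handling discussed in the thread above (3.1 plus patches), so the same request with a .zip attachment may not work on a stock install.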
huge shards (300GB each) and load balancing
Hello list,

Thanks for attending to my previous questions so far, have learnt a lot. Here is another one, I hope it will be interesting to answer.

We run our SOLR shards and front end SOLR on the Amazon high-end machines. Currently we have 6 shards with around 200GB in each. Currently we have only one front end SOLR which, given a client query, redirects it to all the shards. Our shards are constantly growing, data is at times reindexed (in batches, which is done by removing a decent chunk before replacing it with updated data), constant stream of new data is coming every hour (usually hits the latest shard in time, but can also hit other shards, which have older data). Since the front end SOLR has started to be a SPOF, we are thinking about setting up some sort of load balancer.

1) do you think ELB from Amazon is a good solution for starters? We don't need to maintain sessions between SOLR and client.
2) What other load balancers have been used specifically with SOLR?

Overall: does SOLR scale to such size (200GB in an index) and what can be recommended as next step -- resharding (cutting existing shards to smaller chunks), replication?

Thanks for reading to this point.

--
Regards,
Dmitry Kan
Re: Getting query fields in a custom SearchHandler
Hi,

I reply to myself :-) The solution is to use the utility class org.apache.solr.search.QueryParsing. Then you can do:

    Query luceneQuery =
        QueryParsing.parseQuery(req.getParams().get("q"), req.getSchema());

Then with luceneQuery you can use the extractTerms method.

Marc.

On Fri, Jun 3, 2011 at 9:15 AM, Marc SCHNEIDER wrote:
> Hi all,
>
> I wrote my own SearchHandler and therefore overrode the handleRequestBody
> method. This method takes two input parameters: SolrQueryRequest and
> SolrQueryResponse objects. The thing I'd like to do is to get the query
> fields that are used in my request. Of course I can use
> req.getParams().get("q") but it returns the complete query (which can be
> very complicated). I'd like to have a simple map with field:value.
> Is there a way to get it? Or do I have to write my own parser for the "q"
> parameter?
>
> Thanks in advance,
> Marc.
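Expanding that snippet a little: extractTerms fills a set of field/text pairs, though only for primitive queries (wildcard and prefix queries must be rewritten against an IndexReader first, otherwise they throw UnsupportedOperationException). A sketch that continues from the luceneQuery above:

    import java.util.HashSet;
    import java.util.Set;
    import org.apache.lucene.index.Term;

    Set<Term> terms = new HashSet<Term>();
    luceneQuery.extractTerms(terms);
    for (Term t : terms) {
        // each Term carries the field name and the searched text
        System.out.println(t.field() + " -> " + t.text());
    }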
Re: Boosting result on query.
> If you could move to 3.x and your "linked item" boosts could be
> calculated offline in batch periodically, you could use an external
> file field to store the doc boost.
> A few if's though.

I have 3.2, and external file field doesn't work without a solr restart (on a multicore instance).
Re: Getting a query on an "fl" parameter value ?
try http://wiki.apache.org/solr/CommonQueryParameters#fq

On 7 June 2011 16:14, duddy67 wrote:
> Hi all,
>
> I'd like to know if it's possible to get a query on an "fl" value.
> For now my url query looks like that:
>
> /solr/select/?q=keyword&version=2.2&start=0&rows=10&indent=on&fl=id+name+title
>
> it works but I need request also on a "fl" parameter value.
> I'd like to add to my initial query a kind of: WHERE the "fl" id value is
> equal to 12 OR 45 OR 32.
>
> How can I do that ?
>
> Thanks for advance.
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Getting-a-query-on-an-fl-parameter-value-tp3034887p3034887.html
> Sent from the Solr - User mailing list archive at Nabble.com.
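Applied to the URL from the question, that would be something like:

    /solr/select/?q=keyword&fq=id:(12 OR 45 OR 32)&start=0&rows=10&fl=id+name+title

fq restricts the result set without influencing relevance scores; remember to URL-encode the spaces and parentheses in a real request.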
Re: Boosting result on query.
If you could move to 3.x and your "linked item" boosts could be calculated offline in batch periodically, you could use an external file field to store the doc boost. A few if's though.

On 8 June 2011 03:23, Jeff Boul wrote:
> Hi,
>
> I am trying to figure out options for the following problem. I am on
> Solr 1.4.1 (Lucene 2.9.1).
>
> I need to perform a boost on a query related to the value of a
> multi-valued field.
>
> Let's say the result returns the following documents:
>
> id  name  linked_items
> 3   doc3  (item1, item33, item55)
> 8   doc8  (item2, item55, item8)
> 0   doc0  (item7)
> 1   doc1  (item1)
>
> I want the result to be boosted according to the following ordered list of
> linked_items values:
>
> item2
> item55
> item1
> ...
>
> So doc8 will receive the highest boost because its 'linked_items' contains
> 'item2', then doc3 will receive a lower boost because its 'linked_items'
> contains 'item55', then doc1 will receive a much lower boost because its
> 'linked_items' contains 'item1', and maybe doc0 will receive some boost if
> 'item7' is somewhere in the list.
>
> The tricky part is that the ordered list is obtained by querying another
> index. So the result of the query on the other index will give me a
> result, and I will use the values of one field of those documents to
> construct the ordered list.
>
> It would be even better if the boost used not only the order but also the
> scores from the query on the other index.
>
> I'm not very used to Solr and Lucene, but from what I've read I think the
> solution turns around a customization of the Query object.
>
> So the questions are:
>
> 1) Am I right with the Query customization assumption? (if so, could
> someone give me advice or point me to an example of something related)
> 2) Does something already exist that I could use to do that?
> 3) Is it a good approach to use a separate index?
>
> Thanks for the help
>
> Jeff
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Boosting-result-on-query-tp3037649p3037649.html
> Sent from the Solr - User mailing list archive at Nabble.com.
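One query-time approximation that needs no Query subclassing: fetch the ordered list from the other index first, make the user query mandatory, and append optional boost clauses whose weights mirror the other index's scores. The weights below are made up for illustration:

    +(original query) linked_items:item2^8 linked_items:item55^4
        linked_items:item1^2 linked_items:item7

Because the linked_items clauses are optional, they only affect scoring, not which documents match; the trade-off is that the query string has to be rebuilt on every request.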
Re: solr 3.1 java.lang.NoClassDEfFoundError org/carrot2/core/ControllerFactory
Hi Bryan, You'll also need to make sure the your ${solr.home}/contrib/clustering/lib directory is in the classpath; that directory contains the Carrot2 JARs that provide the classes you're missing. I think the example solrconfig.xml has the relevant declarations. Cheers, S. On Tue, Jun 7, 2011 at 13:48, bryan rasmussen wrote: > As per the subject I am getting java.lang.NoClassDEfFoundError > org/carrot2/core/ControllerFactory > when I try to run clustering. > > I am using Solr 3.1: > > I get the following error: > > java.lang.NoClassDefFoundError: org/carrot2/core/ControllerFactory >at > org.apache.solr.handler.clustering.carrot2.CarrotClusteringEngine.(CarrotClusteringEngine.java:74) >at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native > Method) >at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown > Source) >at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown > Source) >at java.lang.reflect.Constructor.newInstance(Unknown Source) >at java.lang.Class.newInstance0(Unknown Source) >at java.lang.Class.newInstance(Unknown Source) >at > org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:412) >at > org.apache.solr.handler.clustering.ClusteringComponent.inform(ClusteringComponent.java:203) >at > org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:522) >at org.apache.solr.core.SolrCore.(SolrCore.java:594) >at org.apache.solr.core.CoreContainer.create(CoreContainer.java:458) >at org.apache.solr.core.CoreContainer.load(CoreContainer.java:316) >at org.apache.solr.core.CoreContainer.load(CoreContainer.java:207) >at > org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:130) >at > org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:94) >at > org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:97) >at > org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50) >at > org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:713) >at org.mortbay.jetty.servlet.Context.startContext(Context.java:140) >at > org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1282) >at > org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:518) >at > org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:499) >at > org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50) >at > org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:152) >at > org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:156) >at > org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50) >at > org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:152) >at > org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50) >at > org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:130) >at org.mortbay.jetty.Server.doStart(Server.java:224) >at > org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50) >at org.mortbay.xml.XmlConfiguration.main(XmlConfiguration.java:985) >at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) >at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source) >at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) >at java.lang.reflect.Method.invoke(Unknown Source) >at org.mortbay.start.Main.invokeMain(Main.java:194) >at org.mortbay.start.Main.start(Main.java:534) >at org.mortbay.start.Main.start(Main.java:441) >at 
org.mortbay.start.Main.main(Main.java:119) > Caused by: java.lang.ClassNotFoundException: > org.carrot2.core.ControllerFactory >at java.net.URLClassLoader$1.run(Unknown Source) >at java.security.AccessController.doPrivileged(Native Method) >at java.net.URLClassLoader.findClass(Unknown Source) >at java.lang.ClassLoader.loadClass(Unknown Source) >at java.net.FactoryURLClassLoader.loadClass(Unknown Source) > > using the following configuration > > > class="org.apache.solr.handler.clustering.ClusteringComponent" > name="clustering"> > >default > name="carrot.algorithm">org.carrot2.clustering.lingo.LingoClusteringAlgorithm > > >20 > > > class="org.apache.solr.handler.component.SearchHandler"> > > explicit > > >title >all_text >all_text title > > 150 > > >clustering > > > > > > with the following command to start solr > java -Dsolr.clustering.enabled=true > -Dsolr.solr.home="C:\projects\solrexample\solr" -jar start.jar > > Any idea as to why crusty is not working? > > Thanks, > Bryan Rasmussen >
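The declaration Stanislaw mentions looks roughly like this in the 3.1 example solrconfig.xml (the relative path depends on where your solr home sits, so treat it as a placeholder):

    <lib dir="../../contrib/clustering/lib/" regex=".*\.jar" />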
Getting a query on an "fl" parameter value ?
Hi all,

I'd like to know if it's possible to get a query on an "fl" value. For now my url query looks like that:

/solr/select/?q=keyword&version=2.2&start=0&rows=10&indent=on&fl=id+name+title

it works but I need request also on a "fl" parameter value. I'd like to add to my initial query a kind of: WHERE the "fl" id value is equal to 12 OR 45 OR 32.

How can I do that ?

Thanks for advance.

--
View this message in context:
http://lucene.472066.n3.nabble.com/Getting-a-query-on-an-fl-parameter-value-tp3034887p3034887.html
Sent from the Solr - User mailing list archive at Nabble.com.