Re: Parallelizing queries without Custom Component
Thanks Emir. Looks indeed like what I need.

On Mon, Jan 15, 2018 at 11:33 AM, Emir Arnautović <emir.arnauto...@sematext.com> wrote:
> Hi Max,
> It seems to me that you are looking for the grouping
> https://lucene.apache.org/solr/guide/6_6/result-grouping.html or field
> collapsing https://lucene.apache.org/solr/guide/6_6/collapse-and-expand-results.html feature.
>
> HTH,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
> > On 15 Jan 2018, at 17:27, Max Bridgewater <max.bridgewa...@gmail.com> wrote:
> >
> > Hi,
> >
> > My index is composed of product reviews. Each review contains the id of the
> > product it refers to. But it also contains a rating for this product and
> > the number of negative feedback entries on this product.
> >
> > {
> >   id: solr doc id,
> >   rating: number between 0 and 5,
> >   product_id: the product that is being reviewed,
> >   negative_feedback: how many negative feedback entries on this product
> > }
> >
> > The query below returns the "worst" review for the given product 7453632.
> > Worst is defined as rated 1 to 3 and having the highest number of negative
> > feedback entries.
> >
> > /select?q=product_id:7453632&fq=rating:[1 TO 3]&sort=negative_feedback desc&rows=1
> >
> > The query works as intended. Now the challenging part is to extend this
> > query to support many product_ids. If executed with many product ids, the
> > result should be the list of worst reviews for all the provided products.
> >
> > A query of the following form would return the list of worst reviews for
> > products 7453632, 645454, and 534664:
> >
> > /select?q=product_id:[7453632,645454,534664]&fq=rating:[1 TO 3]&sort=negative_feedback desc
> >
> > Is there a way to do this in Solr without a custom component?
> >
> > Thanks.
> > Max
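For readers of the archive: the field collapsing Emir points to (roughly `fq={!collapse field=product_id max=negative_feedback}` in Solr syntax) performs, server-side, the group-by-and-pick-one selection the original question asks for. A plain-Python sketch of that selection logic, with invented sample data for illustration (this is not Solr code):

```python
# Sketch of what collapsing on product_id does for this use case:
# keep, per product, the review rated 1-3 with the most negative feedback.
# Sample reviews are invented for illustration only.
reviews = [
    {"id": "r1", "product_id": 7453632, "rating": 2, "negative_feedback": 40},
    {"id": "r2", "product_id": 7453632, "rating": 1, "negative_feedback": 75},
    {"id": "r3", "product_id": 7453632, "rating": 5, "negative_feedback": 90},  # excluded: rating > 3
    {"id": "r4", "product_id": 645454,  "rating": 3, "negative_feedback": 12},
]

def worst_reviews(reviews, product_ids):
    # Equivalent of the product filter plus fq=rating:[1 TO 3]
    candidates = [r for r in reviews
                  if r["product_id"] in product_ids and 1 <= r["rating"] <= 3]
    worst = {}
    # Collapse on product_id, keeping the max negative_feedback per group
    for r in candidates:
        pid = r["product_id"]
        if pid not in worst or r["negative_feedback"] > worst[pid]["negative_feedback"]:
            worst[pid] = r
    return worst

result = worst_reviews(reviews, {7453632, 645454})
```

With the sample data above, r2 wins for product 7453632 (75 > 40, and r3 is removed by the rating filter) and r4 wins for 645454.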
Parallelizing queries without Custom Component
Hi,

My index is composed of product reviews. Each review contains the id of the product it refers to. But it also contains a rating for this product and the number of negative feedback entries on this product.

{
  id: solr doc id,
  rating: number between 0 and 5,
  product_id: the product that is being reviewed,
  negative_feedback: how many negative feedback entries on this product
}

The query below returns the "worst" review for the given product 7453632. Worst is defined as rated 1 to 3 and having the highest number of negative feedback entries.

/select?q=product_id:7453632&fq=rating:[1 TO 3]&sort=negative_feedback desc&rows=1

The query works as intended. Now the challenging part is to extend this query to support many product_ids. If executed with many product ids, the result should be the list of worst reviews for all the provided products.

A query of the following form would return the list of worst reviews for products 7453632, 645454, and 534664:

/select?q=product_id:[7453632,645454,534664]&fq=rating:[1 TO 3]&sort=negative_feedback desc

Is there a way to do this in Solr without a custom component?

Thanks.
Max
Do I need to declare TermVectorComponent for best MoreLikeThis results?
Hi,

The MLT documentation says that for best results, the fields should have stored term vectors in schema.xml (termVectors="true").

My question: should I also create the TermVectorComponent and declare it in the search handler? In other terms, do I have to register a tvComponent (with tv=true) in my solrconfig.xml for best results?

I am seeing continuously increasing MLT response times and I am wondering if I am doing something wrong.

Thanks.
Max.
MoreLikeThis Clarifications
I am trying to confirm my understanding of MLT after going through the following page: https://cwiki.apache.org/confluence/display/solr/MoreLikeThis. Three approaches are mentioned:

1) Use it as a request handler and send text to the MoreLikeThis request handler as needed.
2) Use it as a search component, so MLT is performed on every document returned.
3) Use it as a request handler but with externally supplied text.

What are example queries in each case, and what config changes are required for each case?

There is also MLTQParser. When should I use this parser as opposed to any of the three approaches above?

Thanks,
Max.
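All of the MLT variants above rely on the same core idea: extract the highest-scoring "interesting" terms from the source document (cheap when term vectors are stored) and turn them into a disjunction query against the rest of the index. A rough plain-Python sketch of that term-selection step, using a simplistic tf-idf score and a toy corpus invented for illustration (not Lucene's exact formula):

```python
import math

# Toy corpus; in Solr the term statistics come from the index.
docs = {
    "d1": "solr search relevance ranking search",
    "d2": "cooking recipes pasta",
    "d3": "solr faceting search performance",
}

def interesting_terms(doc_id, docs, top_n=3):
    # Score each term of the source doc by tf * idf and keep the top_n.
    words = docs[doc_id].split()
    n_docs = len(docs)
    scores = {}
    for term in set(words):
        tf = words.count(term)
        df = sum(1 for text in docs.values() if term in text.split())
        idf = math.log(n_docs / df) + 1.0  # simplistic idf, for illustration
        scores[term] = tf * idf
    return [t for t, _ in sorted(scores.items(), key=lambda kv: -kv[1])[:top_n]]

# The resulting terms would then be OR'ed into a query to find similar docs.
top = interesting_terms("d1", docs)
```

For "d1", "search" scores highest (it occurs twice), and the rarer terms "relevance" and "ranking" outrank "solr", which also appears in "d3".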
Re: Phrase Exact Match with Margin of Error
Thanks Susheel. The challenge is that if I search for the word "between" alone, I still get plenty of results. In a way, I want the query to match the document title exactly (up to a few characters) and the document title to match the query exactly (up to a few characters). KeywordTokenizer allows that. But complexphrase does not seem to work with KeywordTokenizer.

On Thu, Jun 15, 2017 at 10:23 AM, Susheel Kumar <susheel2...@gmail.com> wrote:
> The ComplexPhraseQuery parser is what you need to look at:
> https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-ComplexPhraseQueryParser
> See below for e.g.
>
> http://localhost:8983/solr/techproducts/select?debugQuery=on&indent=on&q=manu:%22Bridge%20the%20gat~1%20between%20your%20skills%20and%20your%20goals%22&defType=complexphrase
>
> On Thu, Jun 15, 2017 at 5:59 AM, Max Bridgewater <max.bridgewa...@gmail.com> wrote:
> > Hi,
> >
> > I am trying to do phrase exact match. For this, I use
> > KeywordTokenizerFactory. This basically does what I want to do. My field
> > type is defined as follows:
> >
> > positionIncrementGap="100">
> >
> > In addition to this, I want to tolerate typos of two or three letters. I
> > thought fuzzy search could allow me to accept this margin of error. But
> > this doesn't seem to work.
> >
> > A typical query I would have is:
> >
> > q=subjet:"Bridge the gap between your skills and your goals"
> >
> > Now, in this query, if I replace gap with gat, I was hoping I could do
> > something such as:
> >
> > q=subjet:"Bridge the gat between your skills and your goals"~0.8
> >
> > But this doesn't quite do what I am trying to achieve.
> >
> > Any suggestion?
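A note for readers: Lucene's fuzzy matching applies per term with an integer edit distance (at most 2), not as a phrase-level similarity, which is why `gat~1` in the complexphrase example above can match "gap" while a whole-phrase `~0.8` tolerance does not exist. The underlying measure is Levenshtein edit distance; a minimal sketch:

```python
def levenshtein(a, b):
    # Classic dynamic-programming edit distance
    # (counting insertions, deletions, and substitutions).
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

# "gat" is within edit distance 1 of "gap", so gat~1 matches "gap";
# a tolerance of "a few characters" over the whole phrase would need
# per-term fuzzy operators instead.
```

For example, levenshtein("gat", "gap") is 1, which is within the `~1` allowance on that single term.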
Phrase Exact Match with Margin of Error
Hi,

I am trying to do phrase exact match. For this, I use KeywordTokenizerFactory. This basically does what I want to do. My field type is defined as follows:

In addition to this, I want to tolerate typos of two or three letters. I thought fuzzy search could allow me to accept this margin of error. But this doesn't seem to work.

A typical query I would have is:

q=subjet:"Bridge the gap between your skills and your goals"

Now, in this query, if I replace gap with gat, I was hoping I could do something such as:

q=subjet:"Bridge the gat between your skills and your goals"~0.8

But this doesn't quite do what I am trying to achieve.

Any suggestion?
Invoking a SearchHandler inside a Solr Plugin
I am looking for best practices for when a search component in one handler needs to invoke another handler, say /basic. So far, I got this working prototype:

public void process(ResponseBuilder rb) throws IOException {
    SolrQueryResponse response = new SolrQueryResponse();
    ModifiableSolrParams params = new ModifiableSolrParams();
    params.add("defType", "lucene").add("fl", "product_id").add("wt", "json")
          .add("df", "competitor_product_titles").add("echoParams", "explicit")
          .add("q", rb.req.getParams().get("q"));
    SolrQueryRequest request = new LocalSolrQueryRequest(rb.req.getCore(), params);
    SolrRequestHandler hdlr = rb.req.getCore().getRequestHandler("/basic");
    rb.req.getCore().execute(hdlr, request, response);
    DocList docList = ((ResultContext) response.getValues().get("response")).docs;
    // Do some crazy stuff with the result
}

My concerns:

1) What is a clean way to read the /basic handler's default parameters from solrconfig.xml and use them in LocalSolrQueryRequest()?
2) Is there a better way to accomplish this task overall?

Thanks,
Max.
Re: Query.extractTerms disappeared from 5.1.0 to 5.2.0
Perfect. Thanks a lot.

On Wed, Feb 1, 2017 at 2:01 PM, Alan Woodward <a...@flax.co.uk> wrote:
> Hi, extractTerms() is now on Weight rather than on Query.
>
> Alan
>
> > On 1 Feb 2017, at 17:43, Max Bridgewater <max.bridgewa...@gmail.com> wrote:
> >
> > Hi,
> >
> > It seems Query.extractTerms() disappeared from 5.1.0
> > (http://lucene.apache.org/core/5_1_0/core/org/apache/lucene/search/Query.html)
> > to 5.2.0
> > (http://lucene.apache.org/core/5_2_0/core/org/apache/lucene/search/Query.html).
> >
> > However, I cannot find any comment on it in the 5.2.0 release notes. Any
> > recommendation on what I should use in place of that method? I am migrating
> > some legacy code from Solr 4 to Solr 6.
> >
> > Thanks,
> > Max.
Query.extractTerms disappeared from 5.1.0 to 5.2.0
Hi,

It seems Query.extractTerms() disappeared from 5.1.0 (http://lucene.apache.org/core/5_1_0/core/org/apache/lucene/search/Query.html) to 5.2.0 (http://lucene.apache.org/core/5_2_0/core/org/apache/lucene/search/Query.html).

However, I cannot find any comment on it in the 5.2.0 release notes. Any recommendation on what I should use in place of that method? I am migrating some legacy code from Solr 4 to Solr 6.

Thanks,
Max.
Solr 6 Default Core URL
I have one Solr core on my Solr 6 instance and I can query it with:

http://localhost:8983/solr/mycore/search?q=*:*

Is there a way to configure Solr 6 so that I can simply query it with this simpler URL?

http://localhost:8983/search?q=*:*

Thanks.
Max.
Re: Solr 6 Performance Suggestions
Thanks again, folks. I tried each suggestion and none made any difference. I am setting up a lab for performance monitoring using AppDynamics. Hopefully I am able to figure out something.

On Mon, Nov 28, 2016 at 11:20 AM, Erick Erickson <erickerick...@gmail.com> wrote:
> bq: If you know the maximum size you ever will need, setting Xmx is good.
>
> Not quite sure what you're getting at here. I pretty much guarantee that a
> production system will eat up the default heap size, so not setting Xmx will
> cause OOM errors pretty soon. Or did you mean Xms?
>
> As far as setting Xms, there are differing opinions. Mostly, though, since Solr
> likes memory so much, there's a lot of tuning to try to determine Xmx, and
> it's pretty much guaranteed that Java will need close to that amount of memory.
> So setting Xms=Xmx is a minor optimization if that assumption is true. It's
> arguable how much practical difference it makes though.
>
> Best,
> Erick
>
> On Mon, Nov 28, 2016 at 2:14 AM, Florian Gleixner <f...@redflo.de> wrote:
> > On 28.11.2016 at 00:00, Shawn Heisey wrote:
> >>
> >> On 11/27/2016 12:51 PM, Florian Gleixner wrote:
> >>>
> >>> On 22.11.2016 14:54, Max Bridgewater wrote:
> >>>>
> >>>> test cases were exactly the same, the machines were exactly the same
> >>>> and heap settings exactly the same (Xms24g, Xmx24g). Requests were
> >>>> sent with
> >>>
> >>> Setting the heap too large is a common error. Recent Solr uses the
> >>> filesystem cache, so you don't have to set the heap to the size of the
> >>> index. The available RAM has to be able to run the OS, run the JVM and
> >>> hold most of the index data in the filesystem cache. If you have 32GB RAM
> >>> and a 20GB index, then never set -Xms higher than 10GB. I personally
> >>> would set -Xms to 4GB and omit -Xmx.
> >>
> >> In my mind, the Xmx setting is much more important than Xms. Setting
> >> both to the same number avoids any need for Java to detect memory
> >> pressure before increasing the heap size, which can be helpful.
> >
> > From https://cwiki.apache.org/confluence/display/solr/JVM+Settings
> >
> > "The maximum heap size, set with -Xmx, is more critical. If the memory heap
> > grows to this size, object creation may begin to fail and throw
> > OutOfMemoryException. Setting this limit too low can cause spurious errors
> > in your application, but setting it too high can be detrimental as well."
> >
> > You are right, Xmx is more important. But setting Xms to Xmx will waste RAM
> > that the OS can use to cache your index data. Setting Xmx can avoid problems
> > in some situations where Solr can eat up your filesystem cache until the
> > next GC has finished.
> >
> >> Without Xmx, Java is in control of the max heap size, and it may not
> >> make the correct choice. It's important to know what your max heap is,
> >> because chances are excellent that the max heap *will* be reached. Solr
> >> allocates a lot of memory to do its job.
> >
> > If you know the maximum size you ever will need, setting Xmx is good.
Re: Solr 6 Performance Suggestions
Thanks, folks. It looks like the sweet spot where I get comparable results is at 30 concurrent threads. It progressively degrades from there as I increase the number of concurrent threads in the test script. This made me think that something is configured in Tomcat (Solr 4) that is not comparably set in Solr 6. The only thing I found that would make sense is the connector's max number of threads, which we have set at 800 for Tomcat. However, in jetty.xml, maxThreads is set to 5. Not sure if these two maxThreads have the same effect.

I thought about Yonik's suggestion a little bit. Where I am scratching my head is that if specific kinds of queries were more expensive than others, shouldn't this be reflected even at 30 concurrent threads?

Anyway, still digging.

On Wed, Nov 23, 2016 at 9:56 AM, Walter Underwood <wun...@wunderwood.org> wrote:
> I recently ran benchmarks on 4.10.4 and 6.2.1 and found very little
> difference in query performance.
>
> This was with 8 million documents (homework problems) from production. I
> used query logs from production. The load is a constant number of requests
> per minute from 100 threads. CPU usage is under 50% in order to avoid
> congestion. The benchmarks ran for 100 minutes.
>
> Measuring median and 95th percentile, the times were within 10%. I think
> that is within the repeatability of the benchmark. A different number of
> GCs could make that difference.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/ (my blog)
>
> > On Nov 23, 2016, at 8:14 AM, Bram Van Dam <bram.van...@intix.eu> wrote:
> >
> > On 22/11/16 15:34, Prateek Jain J wrote:
> >> I am not sure, but I heard this in one of the discussions: you can't
> >> migrate directly from Solr 4 to Solr 6. It has to be incremental, like
> >> Solr 4 to Solr 5 and then to Solr 6. I might be wrong but it is worth trying.
> >
> > Ideally the index needs to be upgraded using the IndexUpgrader.
> >
> > Something like this should do the trick:
> >
> > java -cp lucene-core-6.0.0.jar:lucene-backward-codecs-6.0.0.jar
> >   org.apache.lucene.index.IndexUpgrader /path/to/index
> >
> > - Bram
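Walter's benchmark methodology above (constant request rate, then compare median and 95th-percentile latency) can be sketched in a few lines of plain Python. The nearest-rank percentile method and the sample latencies here are illustrative, not figures from this thread:

```python
import math

def percentile(samples, pct):
    # Nearest-rank percentile: sort, then take the ceil(p/100 * n)-th value.
    s = sorted(samples)
    k = max(0, math.ceil(pct / 100 * len(s)) - 1)
    return s[k]

# Invented per-request latencies (ms) from a hypothetical benchmark run.
latencies_ms = [280, 300, 250, 430, 310, 265, 275, 295, 305, 900]
median = percentile(latencies_ms, 50)
p95 = percentile(latencies_ms, 95)
```

Comparing medians alone can hide tail effects (the one 900 ms outlier here moves p95 but not the median), which is why both numbers are reported.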
Solr 6 Performance Suggestions
I migrated an application from Solr 4 to Solr 6. solrconfig.xml and schema.xml are essentially the same. The JVM params are also pretty much similar. The indices each have about 2 million documents. No particular tuning was done to Solr 6 beyond the default settings. Solr 4 is running in Tomcat 7.

Early results seem to show Solr 4 outperforming Solr 6. The first shows an average response time of 280 ms while the second averages at 430 ms. The test cases were exactly the same, the machines were exactly the same and the heap settings exactly the same (Xms24g, Xmx24g). Requests were sent with JMeter with 50 concurrent threads for 2h.

I know that this is not enough information to claim that Solr 4 generally outperforms Solr 6. I also know that this pretty much depends on what the application does. So I am not claiming anything general. All I want to do is get some input before I start digging. What are some things I could tune to improve the numbers for Solr 6? Have you guys experienced such discrepancies?

Thanks,
Max.
Re: Edismax query parsing in Solr 4 vs Solr 6
Hi Greg,

Your analysis is SPOT ON. I did some debugging and found out that we had q.op set to AND in the defaults. And when I changed that to OR, things worked exactly as in Solr 4. So it seemed Solr 6 was behaving as it should. What I could not explain was whether Solr 4 was using the q.op that was configured in the defaults or not. But your explanation makes sense now.

Thanks,
Max.

On Sat, Nov 12, 2016 at 4:54 PM, Greg Pendlebury <greg.pendleb...@gmail.com> wrote:
> This has come up a lot on the lists lately. Keep in mind that edismax
> parses your query using additional parameters such as 'mm' and 'q.op'. It is
> the handling of these parameters (and the selection of default values)
> which has changed between versions to address a few functionality gaps.
>
> The most common issue I've seen is where users were not setting those
> values and relying on the defaults. You might now need to set them
> explicitly to return to the desired behaviour.
>
> I can't see all of your configuration, but I'm guessing the important one
> here is 'q.op', which was previously hard coded to 'OR', irrespective of
> either parameters or solrconfig. Try setting that to 'OR' explicitly...
> maybe you have your default operator set to 'AND' in solrconfig and that is
> now being applied? The other option is 'mm', which I suspect should be set
> to '0' unless you have some reason to want it. If it was set to '100%' it
> might insert the additional '+' flags, but it can also show up as a '~'
> operator on the end.
>
> Ta,
> Greg
>
> On 8 November 2016 at 22:13, Max Bridgewater <max.bridgewa...@gmail.com> wrote:
> > I am migrating a Solr-based app from Solr 4 to Solr 6. One of the
> > discrepancies I am noticing is around edismax query parsing. My code makes
> > the following call:
> >
> > userQuery="+(title:shirts isbn:shirts) +(id:20446 id:82876)"
> > Query query=QParser.getParser(userQuery, "edismax", req).getQuery();
> >
> > With Solr 4, query becomes:
> >
> > +(+(title:shirt isbn:shirts) +(id:20446 id:82876))
> >
> > With Solr 6 it however becomes:
> >
> > +(+(+title:shirt +isbn:shirts) +(+id:20446 +id:82876))
> >
> > Digging deeper, it appears that parseOriginalQuery() in
> > ExtendedDismaxQParser is adding those additional + signs.
> >
> > Is there a way to prevent this altering of queries?
> >
> > Thanks,
> > Max.
Edismax query parsing in Solr 4 vs Solr 6
I am migrating a Solr-based app from Solr 4 to Solr 6. One of the discrepancies I am noticing is around edismax query parsing. My code makes the following call:

userQuery="+(title:shirts isbn:shirts) +(id:20446 id:82876)"
Query query=QParser.getParser(userQuery, "edismax", req).getQuery();

With Solr 4, query becomes:

+(+(title:shirt isbn:shirts) +(id:20446 id:82876))

With Solr 6 it however becomes:

+(+(+title:shirt +isbn:shirts) +(+id:20446 +id:82876))

Digging deeper, it appears that parseOriginalQuery() in ExtendedDismaxQParser is adding those additional + signs.

Is there a way to prevent this altering of queries?

Thanks,
Max.
BooleanQuery Migration from Solr 4 to Solr 6
Hi Folks,

I am tasked with migrating a Solr app from Solr 4 to Solr 6. This Solr app is in essence a bunch of Solr components/handlers. One part that challenges me is BooleanQuery immutability in Solr 6. Here is the challenge: in our old code base, we had classes that implemented custom interfaces and extended BooleanQuery. These custom interfaces were essentially markers that told our various components where the user came from. Based on the user's origin, different pieces of logic would apply. Now, in Solr 6, our custom boolean query can no longer extend BooleanQuery since BooleanQuery only has a private constructor.

I am looking for a clean solution to this problem. Here are some ideas I had:

1) Remove the logic that depends on the custom boolean query => Big risk to our search logic.
2) Simply remove BooleanQuery as the super class of the custom boolean query => Major risk. Wherever we do "if (query instanceof BooleanQuery)", we would not catch our custom queries.
3) Remove BooleanQuery as parent of the custom query (e.g. make it extend Query) AND refactor to move all "if (query instanceof BooleanQuery)" checks into a dedicated method, isCustomBooleanQuery. This would return "query instanceof BooleanQuery || query instanceof CustomQuery". We then need to change ALL 20 occurrences of this test and ensure we handle both cases appropriately. => Very invasive.
4) Add a method createCustomQuery() that would return a boolean query wherein a special clause is added that allows us to identify our custom queries. This special clause should not impact search results. => Pretty ugly.

Any other potentially clean, low-risk, and less invasive solution?

Max.
Determine Containing Handler
Hi,

I am implementing a component that needs to redirect calls to the handler that originally called it. Say the call comes to handler /search; the component would then do some processing, alter the query, and then send the query back to /search again. It works great. The only issue is that the handler is not always called /search, forcing me to have people pass the handler name as a parameter to the component, which is not ideal.

The question thus is: is there a way to find out which handler a component was invoked from? I checked SolrCore and SolrQueryRequest but I can't seem to find a method that would do this.

Thanks,
Max.
Re: Function Query Parsing problem in Solr 5.4.1 and Solr 5.5.0
Thank you Mike, that was it.

Max.

On Sat, Apr 2, 2016 at 2:40 AM, Mikhail Khludnev <mkhlud...@griddynamics.com> wrote:
> Hello Max,
>
> Since it reports the first space occurrence pos=32, I advise to nuke all
> spaces between braces in sum().
>
> On Fri, Apr 1, 2016 at 7:40 PM, Max Bridgewater <max.bridgewa...@gmail.com> wrote:
> > Hi,
> >
> > I have the following configuration for the firstSearcher handler in
> > solrconfig.xml:
> >
> > parts
> > score desc, Review1 asc, Rank2 asc
> >
> > make
> > {!func}sum(product(0.01,param1), product(0.20,param2), min(param2,0.4)) desc
> >
> > This works great in Solr 4.10. However, in Solr 5.4.1 and Solr 5.5.0, I get
> > the below error. How do I write this kind of query with Solr 5?
> >
> > Thanks,
> > Max.
> >
> > ERROR org.apache.solr.handler.RequestHandlerBase [ x:productsearch] –
> > org.apache.solr.common.SolrException: Can't determine a Sort Order (asc or
> > desc) in sort spec '{!func}sum(product(0.01,param1), product(0.20,param2),
> > min(param2,0.4)) desc', pos=32
> >   at org.apache.solr.search.SortSpecParsing.parseSortSpec(SortSpecParsing.java:143)
> >   at org.apache.solr.search.QParser.getSort(QParser.java:247)
> >   at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:187)
> >   at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:247)
> >   at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:156)
> >   at org.apache.solr.core.SolrCore.execute(SolrCore.java:2073)
> >   at org.apache.solr.core.QuerySenderListener.newSearcher(QuerySenderListener.java:69)
> >   at org.apache.solr.core.SolrCore$5.call(SolrCore.java:1840)
>
> --
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
Function Query Parsing problem in Solr 5.4.1 and Solr 5.5.0
Hi,

I have the following configuration for the firstSearcher handler in solrconfig.xml:

parts
score desc, Review1 asc, Rank2 asc

make
{!func}sum(product(0.01,param1), product(0.20,param2), min(param2,0.4)) desc

This works great in Solr 4.10. However, in Solr 5.4.1 and Solr 5.5.0, I get the below error. How do I write this kind of query with Solr 5?

Thanks,
Max.

ERROR org.apache.solr.handler.RequestHandlerBase [ x:productsearch] – org.apache.solr.common.SolrException: Can't determine a Sort Order (asc or desc) in sort spec '{!func}sum(product(0.01,param1), product(0.20,param2), min(param2,0.4)) desc', pos=32
  at org.apache.solr.search.SortSpecParsing.parseSortSpec(SortSpecParsing.java:143)
  at org.apache.solr.search.QParser.getSort(QParser.java:247)
  at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:187)
  at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:247)
  at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:156)
  at org.apache.solr.core.SolrCore.execute(SolrCore.java:2073)
  at org.apache.solr.core.QuerySenderListener.newSearcher(QuerySenderListener.java:69)
  at org.apache.solr.core.SolrCore$5.call(SolrCore.java:1840)
Re: Load Resource from within Solr Plugin
Hi Folks,

Thanks for all the great suggestions. I will try and see which one works best.

@Hoss: The WEB-INF folder is just in my dev environment. I have a local Solr instance and I point it to the target/WEB-INF. Simple, convenient setup for development purposes.

Much appreciated.
Max.

On Wed, Mar 30, 2016 at 4:24 PM, Rajesh Hazari <rajeshhaz...@gmail.com> wrote:
> Max,
> Have you looked into the external file field, which is reloaded on every hard
> commit? The only disadvantage is that the file (personal-words.txt) has to be
> placed in all data folders in each Solr core, for which we have a bash script
> to do this job.
>
> https://cwiki.apache.org/confluence/display/solr/Working+with+External+Files+and+Processes
>
> Ignore this if it does not meet your requirement.
>
> *Rajesh*
>
> On Wed, Mar 30, 2016 at 1:21 PM, Chris Hostetter <hossman_luc...@fucit.org> wrote:
> > : regex=".*\.jar" />
> >
> > 1) As a general rule, if you have a declaration which includes
> > "WEB-INF" you are probably doing something wrong.
> >
> > Maybe not in this case -- maybe "search-webapp/target" is a completely
> > distinct java application and you are just re-using its jars. But 9
> > times out of 10, when people have a WEB-INF path they are trying to load
> > jars from, it's because they *first* added their jars to Solr's WEB-INF
> > directory, and then when that didn't work they added the path to the
> > WEB-INF dir as a <lib/> ... but now you've got those classes being loaded
> > twice, and you've multiplied all of your problems.
> >
> > 2) Let's ignore the fact that your path has WEB-INF in it, and just
> > assume it's some path to somewhere on disk that has nothing to do with
> > Solr, and you want to load those jars.
> >
> > Great -- Solr will do that for you, and all of those classes will be
> > available to plugins.
> >
> > Now if you want to explicitly do something classloader related, you do
> > *not* want to be using Thread.currentThread().getContextClassLoader() ...
> > because the threads that execute everything in Solr are a pool of worker
> > threads that is created before Solr ever has a chance to parse your
> > <lib/> directive.
> >
> > You want to ensure anything you do related to a ClassLoader uses the
> > ClassLoader Solr sets up for plugins -- that's available from the
> > SolrResourceLoader.
> >
> > You can always get the SolrResourceLoader via
> > SolrCore.getSolrResourceLoader(). From there you can getClassLoader() if
> > you really need some hairy custom stuff -- or if you are just trying to
> > load a simple resource file as an InputStream, use openResource(String
> > name) ... that will start by checking for it in the conf dir, and will
> > fall back to your jar -- so you can have a default resource file shipped
> > with your plugin, but allow users to override it in their collection
> > configs.
> >
> > -Hoss
> > http://www.lucidworks.com/
Load Resource from within Solr Plugin
Hi,

I am facing the exact issue described here: http://stackoverflow.com/questions/25623797/solr-plugin-classloader. Basically I'm writing a Solr plugin by extending the SearchComponent class. My new class is part of archive a.jar. My class also depends on b.jar. I placed both jars in my own folder and declared it in solrconfig.xml with a <lib/> directive. I also declared my new component in solrconfig.xml.

The component is invoked correctly up to a point where a class ClassFromB from b.jar attempts to load a classpath resource personal-words.txt from the classpath. The piece of code in class ClassFromB looks like this:

Thread.currentThread().getContextClassLoader().getResources("personal-words.txt")

Unfortunately, this returns an empty list. Any recommendation?

Thanks,
Max.
5.4 facet performance thumbs-up
I'm happy to report that we are seeing significant speed-ups in our queries with Json facets on 5.4 vs regular facets on 5.1. Our queries contain mostly terms facets, many of them with exclusion tags and prefix filtering. Nice work!
RE: JSON facets and excluded queries
Good to know, thank you. From an implementation standpoint that makes a lot of sense. We are only using facets of type 'term' for now and for those it works nicely. Our usual searches carry around 8-12 facets so we are covered from that side :-)

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Friday, December 11, 2015 3:12 PM
To: solr-user <solr-user@lucene.apache.org>
Subject: Re: JSON facets and excluded queries

Do note that the number of threads also won't help much, last I knew, unless you are faceting over that many fields too. I.e. setting this to 5 while faceting on only 1 field won't help. And it's not implemented for all facet types, IIRC.

Best,
Erick

On Fri, Dec 11, 2015 at 1:07 PM, Aigner, Max <max.aig...@nordstrom.com> wrote:
> Answering one question myself after doing some testing on 5.3.1:
>
> Yes, facet.threads is still relevant with JSON facets.
>
> We are seeing significant gains as we increase the number of threads
> from 1 up to 4. Beyond that we only observed marginal improvements -- which
> makes sense because the test VM has 4 cores.
>
> -Original Message-
> From: Aigner, Max [mailto:max.aig...@nordstrom.com]
> Sent: Thursday, December 10, 2015 12:33 PM
> To: solr-user@lucene.apache.org
> Subject: RE: JSON facets and excluded queries
>
> Another question popped up around this:
> Is the facet.threads parameter still relevant with JSON facets? I saw that
> the facet prefix bug https://issues.apache.org/jira/browse/SOLR-6686 got
> fixed in 5.3, so I'm looking into re-enabling this parameter for our searches.
>
> On a side note, I've been testing JSON facet performance and I've observed
> that they're generally faster unless facet prefix filtering comes into play;
> then they seem to be slower than standard facets.
> Is that just a fluke, or should I switch to JSON query facets instead of
> using facet prefix filtering?
>
> Thanks again,
> Max
>
> -Original Message-
> From: Aigner, Max [mailto:max.aig...@nordstrom.com]
> Sent: Wednesday, November 25, 2015 11:54 AM
> To: solr-user@lucene.apache.org
> Subject: RE: JSON facets and excluded queries
>
> Yes, just tried that and it works fine.
>
> That just removed a showstopper for me as my queries contain lots of tagged
> FQs and multi-select facets (implemented the 'good way' :).
>
> Thank you for the quick help!
>
> -Original Message-
> From: Yonik Seeley [mailto:ysee...@gmail.com]
> Sent: Wednesday, November 25, 2015 11:38 AM
> To: solr-user@lucene.apache.org
> Subject: Re: JSON facets and excluded queries
>
> On Wed, Nov 25, 2015 at 2:29 PM, Yonik Seeley <ysee...@gmail.com> wrote:
>> On Wed, Nov 25, 2015 at 2:15 PM, Aigner, Max <max.aig...@nordstrom.com> wrote:
>>> Thanks, this is great :=))
>>>
>>> I hadn't seen the domain:{excludeTags:...} syntax yet and it doesn't seem
>>> to be working on 5.3.1, so I'm assuming this is work slated for 5.4 or 6.
>>> Did I get that right?
>>
>> Hmmm, the "domain" keyword was added for 5.3 along with block join
>> faceting: http://yonik.com/solr-nested-objects/
>> That's when I switched "excludeTags" to also be under the "domain" keyword.
>>
>> Let me try it out...
>
> Ah, I messed up that migration...
> OK, for now, instead of
>   domain:{excludeTags:foo}
> just use
>   excludeTags:foo
> and it should work.
>
> -Yonik
RE: JSON facets and excluded queries
Answering one question myself after doing some testing on 5.3.1: Yes, facet.threads is still relevant with Json facets. We are seeing significant gains as we are increasing the number of threads from 1 up to 4. Beyond that we only observed marginal improvements -- which makes sense because the test VM has 4 cores. -Original Message- From: Aigner, Max [mailto:max.aig...@nordstrom.com] Sent: Thursday, December 10, 2015 12:33 PM To: solr-user@lucene.apache.org Subject: RE: JSON facets and excluded queries Another question popped up around this: Is the facet.threads parameter still relevant with Json facets? I saw that the facet prefix bug https://issues.apache.org/jira/browse/SOLR-6686 got fixed in 5.3 so I'm looking into re-enabling this parameter for our searches. On a side note, I've been testing Json facet performance and I've observed that they're generally faster unless facet prefix filtering comes into play, then they seem to be slower than standard facets. Is that just a fluke or should I switch to Json Query Facets instead of using facet prefix filtering? Thanks again, Max -Original Message- From: Aigner, Max [mailto:max.aig...@nordstrom.com] Sent: Wednesday, November 25, 2015 11:54 AM To: solr-user@lucene.apache.org Subject: RE: JSON facets and excluded queries Yes, just tried that and it works fine. That just removed a showstopper for me as my queries contain lots of tagged FQs and multi-select facets (implemented the 'good way' :). Thank you for the quick help! 
-----Original Message-----
From: Yonik Seeley [mailto:ysee...@gmail.com]
Sent: Wednesday, November 25, 2015 11:38 AM
To: solr-user@lucene.apache.org
Subject: Re: JSON facets and excluded queries

On Wed, Nov 25, 2015 at 2:29 PM, Yonik Seeley <ysee...@gmail.com> wrote:
> On Wed, Nov 25, 2015 at 2:15 PM, Aigner, Max <max.aig...@nordstrom.com> wrote:
>> Thanks, this is great :=))
>>
>> I hadn't seen the domain:{excludeTags:...} syntax yet and it doesn't seem to
>> be working on 5.3.1 so I'm assuming this is work slated for 5.4 or 6. Did I
>> get that right?
>
> Hmmm, the "domain" keyword was added for 5.3 along with block join
> faceting: http://yonik.com/solr-nested-objects/
> That's when I switched "excludeTags" to also be under the "domain" keyword.
>
> Let me try it out...

Ah, I messed up that migration...
OK, for now, instead of
  domain:{excludeTags:foo}
just use
  excludeTags:foo
and it should work.

-Yonik
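The facet.threads setting discussed in this thread is an ordinary request parameter. A minimal sketch of a request that sets it (the host, core name, and facet field are assumptions, not from the thread):

```python
# Build a facet request URL that enables facet.threads.
from urllib.parse import urlencode

params = {
    "q": "*:*",
    "facet": "true",
    "facet.field": "color",
    "facet.threads": 4,  # matched the 4-core test VM mentioned above
}
url = "http://localhost:8983/solr/products/select?" + urlencode(params)
print(url)
```

Beyond one thread per core, the thread reports only marginal gains, so 4 is used here purely as an example.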
RE: JSON facets and excluded queries
Another question popped up around this: Is the facet.threads parameter still relevant with JSON facets? I saw that the facet prefix bug https://issues.apache.org/jira/browse/SOLR-6686 got fixed in 5.3, so I'm looking into re-enabling this parameter for our searches.

On a side note, I've been testing JSON facet performance and I've observed that they're generally faster unless facet prefix filtering comes into play; then they seem to be slower than standard facets. Is that just a fluke, or should I switch to JSON query facets instead of using facet prefix filtering?

Thanks again,
Max

-----Original Message-----
From: Aigner, Max [mailto:max.aig...@nordstrom.com]
Sent: Wednesday, November 25, 2015 11:54 AM
To: solr-user@lucene.apache.org
Subject: RE: JSON facets and excluded queries

Yes, just tried that and it works fine.

That just removed a showstopper for me as my queries contain lots of tagged FQs and multi-select facets (implemented the 'good way' :).

Thank you for the quick help!

-----Original Message-----
From: Yonik Seeley [mailto:ysee...@gmail.com]
Sent: Wednesday, November 25, 2015 11:38 AM
To: solr-user@lucene.apache.org
Subject: Re: JSON facets and excluded queries

On Wed, Nov 25, 2015 at 2:29 PM, Yonik Seeley <ysee...@gmail.com> wrote:
> On Wed, Nov 25, 2015 at 2:15 PM, Aigner, Max <max.aig...@nordstrom.com> wrote:
>> Thanks, this is great :=))
>>
>> I hadn't seen the domain:{excludeTags:...} syntax yet and it doesn't seem to
>> be working on 5.3.1 so I'm assuming this is work slated for 5.4 or 6. Did I
>> get that right?
>
> Hmmm, the "domain" keyword was added for 5.3 along with block join
> faceting: http://yonik.com/solr-nested-objects/
> That's when I switched "excludeTags" to also be under the "domain" keyword.
>
> Let me try it out...

Ah, I messed up that migration...
OK, for now, instead of
  domain:{excludeTags:foo}
just use
  excludeTags:foo
and it should work.

-Yonik
RE: JSON facets and excluded queries
Yes, just tried that and it works fine.

That just removed a showstopper for me as my queries contain lots of tagged FQs and multi-select facets (implemented the 'good way' :).

Thank you for the quick help!

-----Original Message-----
From: Yonik Seeley [mailto:ysee...@gmail.com]
Sent: Wednesday, November 25, 2015 11:38 AM
To: solr-user@lucene.apache.org
Subject: Re: JSON facets and excluded queries

On Wed, Nov 25, 2015 at 2:29 PM, Yonik Seeley <ysee...@gmail.com> wrote:
> On Wed, Nov 25, 2015 at 2:15 PM, Aigner, Max <max.aig...@nordstrom.com> wrote:
>> Thanks, this is great :=))
>>
>> I hadn't seen the domain:{excludeTags:...} syntax yet and it doesn't seem to
>> be working on 5.3.1 so I'm assuming this is work slated for 5.4 or 6. Did I
>> get that right?
>
> Hmmm, the "domain" keyword was added for 5.3 along with block join
> faceting: http://yonik.com/solr-nested-objects/
> That's when I switched "excludeTags" to also be under the "domain" keyword.
>
> Let me try it out...

Ah, I messed up that migration...
OK, for now, instead of
  domain:{excludeTags:foo}
just use
  excludeTags:foo
and it should work.

-Yonik
RE: JSON facets and excluded queries
Thanks, this is great :=))

I hadn't seen the domain:{excludeTags:...} syntax yet and it doesn't seem to be working on 5.3.1, so I'm assuming this is work slated for 5.4 or 6. Did I get that right?

Thanks,
Max

-----Original Message-----
From: Yonik Seeley [mailto:ysee...@gmail.com]
Sent: Wednesday, November 25, 2015 9:21 AM
To: solr-user@lucene.apache.org
Subject: Re: JSON facets and excluded queries

Here's a little tutorial on multi-select faceting w/ the JSON Facet API:
http://yonik.com/multi-select-faceting/

-Yonik

On Tue, Nov 24, 2015 at 12:56 PM, Aigner, Max <max.aig...@nordstrom.com> wrote:
> I'm currently evaluating Solr 5.3.1 for performance improvements with faceting.
> However, I'm unable to get the 'exclude-tagged-filters' feature to work. A
> lot of the queries I'm doing are in the format
>
> ...?q=category:123&fq={!tag=fqCol}color:green&facet=true&facet.field={!key=price_all ex=fqCol}price&facet.field={!key=price_nogreen}price...
>
> I couldn't find a way to make this work with JSON facets; the 'ex=' local
> param doesn't seem to have a corresponding new parameter in JSON facets.
> Am I just missing something or is there a new recommended way for calculating
> facets over a subset of filters?
>
> Thanks!
JSON facets and excluded queries
I'm currently evaluating Solr 5.3.1 for performance improvements with faceting. However, I'm unable to get the 'exclude-tagged-filters' feature to work. A lot of the queries I'm doing are in the format

...?q=category:123&fq={!tag=fqCol}color:green&facet=true&facet.field={!key=price_all ex=fqCol}price&facet.field={!key=price_nogreen}price...

I couldn't find a way to make this work with JSON facets; the 'ex=' local param doesn't seem to have a corresponding new parameter in JSON facets. Am I just missing something, or is there a new recommended way for calculating facets over a subset of filters?

Thanks!
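For reference, a sketch of this multi-select request expressed with the JSON Facet API, using the excludeTags workaround Yonik gives earlier in this thread (excludeTags at the facet's top level on 5.3.1; in later versions it moves under domain:{...}). The facet keys and fields follow the thread; the POST endpoint and core name are assumptions.

```python
# Build a JSON Facet API request body: one terms facet that excludes the
# tagged color filter, and one computed within the active filters.
import json

request = {
    "query": "category:123",
    "filter": ["{!tag=fqCol}color:green"],
    "facet": {
        "price_all": {          # counts as if the color filter were absent
            "type": "terms",
            "field": "price",
            "excludeTags": "fqCol",
        },
        "price_nogreen": {      # counts within the active filters
            "type": "terms",
            "field": "price",
        },
    },
}
body = json.dumps(request, indent=2)
print(body)  # POST this body to the core's /query handler
```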
Spellcheck / Suggestions : Append custom dictionary to SOLR default index
Is there a way to append a set of words to the out-of-the-box Solr index when using the spellcheck / suggestions feature?
Solr could replace shards
I am considering using SolrCloud, but I have a use case that I am not sure it covers. I would like to keep an index up to date in real time, but I would also like to sometimes restate the past. The way that I would restate the past is to do batch processing over historical data.

My idea is that I would have the Solr collection sharded by date range. As I move forward in time I would add more shards. For restating historical data I would have a separate process that actually indexes a shard's worth of data. (This keeps the servers that are meant for production search from having to handle the load of indexing historical data.) I would then move the index files to the Solr servers and register the newly created index with the server, replacing the existing shards.

I used to be able to do something similar pre-SolrCloud by using the core admin. But this did not have the benefit of having one search for the entire collection; I had to manually query each of the cores to get the full search index.

Essentially the questions are:
1- Is it possible to shard by date range in this way?
2- Is it possible to swap out the index used by a shard?
3- Is there a different way I should be thinking of this?

Max
Re: Indexed data not searchable
Thanks a lot, so I will make an XSLT. Great community here!

--
View this message in context: http://lucene.472066.n3.nabble.com/Indexed-data-not-searchable-tp4054473p4055258.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Indexed data not searchable
Thanks for this! Now I have another problem. I tried to give the XML file the right format, so I made this:

<?xml version="1.0" encoding="UTF-8"?>
<add><doc>
  <field name="id">455HHS-2232</field>
  <field name="title">T0072-00031-DOWNLOAD - Blatt 12v</field>
  <field name="format">application/pdf</field>
  <field name="created">2012-11-07T11:15:19.887+01:00</field>
  <field name="lastModified">2012-11-07T11:15:19.887+01:00</field>
  <field name="issued">2012-11-07T11:15:19.887+01:00</field>
  <field name="revision">0</field>
  <field name="pid">hdl:11858/00-1734--0008-12C5-2</field>
  <field name="extent">1131033</field>
  <field name="project">St. Matthias Test 07</field>
  <field name="availability">public</field>
  <field name="rightsHolder">Stadtbibliothek und Stadtarchiv Trier</field>
</doc></add>

I also made the changes in the schema.xml. I added these fields:

<field name="identifier" type="text_general" indexed="true" stored="true"/>
<field name="format" type="text_general" indexed="true" stored="true"/>
<field name="created" type="date" indexed="true" stored="true"/>
<field name="issued" type="date" indexed="true" stored="true"/>
<field name="revision" type="int" indexed="true" stored="true"/>
<field name="pid" type="text_general" indexed="true" stored="true"/>
<field name="extent" type="int" indexed="true" stored="true"/>
<field name="dataContributor" type="text_general" indexed="true" stored="true"/>
<field name="project" type="text_general" indexed="true" stored="true"/>
<field name="availability" type="text_general" indexed="true" stored="true"/>
<field name="rightsholder" type="text_general" indexed="true" stored="true"/>

Did I do anything wrong?

--
View this message in context: http://lucene.472066.n3.nabble.com/Indexed-data-not-searchable-tp4054473p4054960.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Indexed data not searchable
Just for information: I noticed that the problem occurs when I try to add the fields created, last_modified, and issued (all three have the type date) and the field rightsholder. Maybe it is helpful!

--
View this message in context: http://lucene.472066.n3.nabble.com/Indexed-data-not-searchable-tp4054473p4054977.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Indexed data not searchable
Thank you. I changed it, and now it works. But is there any possibility to make the given timestamp acceptable to Solr?

--
View this message in context: http://lucene.472066.n3.nabble.com/Indexed-data-not-searchable-tp4054473p4054985.html
Sent from the Solr - User mailing list archive at Nabble.com.
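Solr date fields accept only UTC timestamps in the canonical yyyy-MM-dd'T'HH:mm:ss[.SSS]Z form, so one option is to normalise the producer's offset timestamps before indexing. A minimal sketch (the helper name is illustrative):

```python
# Convert an ISO-8601 timestamp carrying a UTC offset (as in the metadata
# files above) into the UTC "Z" form that Solr date fields accept.
from datetime import datetime, timezone

def to_solr_date(ts: str) -> str:
    dt = datetime.fromisoformat(ts)    # parses "+01:00"-style offsets (Python 3.7+)
    dt = dt.astimezone(timezone.utc)   # shift to UTC
    return dt.strftime("%Y-%m-%dT%H:%M:%S.") + f"{dt.microsecond // 1000:03d}Z"

print(to_solr_date("2012-11-07T11:15:19.887+01:00"))  # → 2012-11-07T10:15:19.887Z
```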
Re: Indexed data not searchable
The XML files are formatted like this. I think there is the problem.

<metadataContainerType>
  <ns3:object>
    <ns3:generic>
      <ns3:provided>
        <ns3:title>T0084-00371-DOWNLOAD - Blatt 184r</ns3:title>
        <ns3:identifier type="METSXMLID">T0084-00371-DOWNLOAD</ns3:identifier>
        <ns3:format>application/pdf</ns3:format>
      </ns3:provided>
      <ns3:generated>
        <ns3:created>2012-11-08T00:09:57.531+01:00</ns3:created>
        <ns3:lastModified>2012-11-08T00:09:57.531+01:00</ns3:lastModified>
        <ns3:issued>2012-11-08T00:09:57.531+01:00</ns3:issued>
..

--
View this message in context: http://lucene.472066.n3.nabble.com/Indexed-data-not-searchable-tp4054473p4054651.html
Sent from the Solr - User mailing list archive at Nabble.com.
Indexed data not searchable
Hello, I'm very new to Solr and I've hit a point I can't explain myself, so I need your help. I have indexed a huge number of XML files with a shell script:

function solringest_rec {
    for SRCFILE in $(find $1 -type f); do
        #DESTFILE=$URL${SRCFILE/$1/}
        echo "ingest $SRCFILE"
        curl $URL -H "Content-type: text/xml" --data-binary @$SRCFILE
    done
}

The response I get every time is:

<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">116</int></lst>
</response>

Because of this I think that everything should be fine, but the queries don't work. For all other operations, such as the post operation, I use the stuff from the example folder. Maybe I have to configure something in the schema.xml or solrconfig.xml?

Hope you can help me!

Kind regards,
Max

--
View this message in context: http://lucene.472066.n3.nabble.com/Indexed-data-not-searchable-tp4054473.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Indexed data not searchable
Thanks for your help. The URL I'm posting to is: http://localhost:8983/solr/update?commit=true

The XML files I've added contain fields like author, so I thought they would be searchable, since author is declared as indexed in the example schema.

--
View this message in context: http://lucene.472066.n3.nabble.com/Indexed-data-not-searchable-tp4054473p4054481.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: How can I check if a more complex query condition matched?
Thanks for your reply. I thought about using the debug mode, too, but the information is not easy to parse and doesn't contain everything I want. Furthermore, I don't want to enable debug mode in production. Is there anything else I could try?

On Tue, Dec 27, 2011 at 12:48 PM, Ahmet Arslan iori...@yahoo.com wrote:

> > I have a more complex query condition like this:
> >
> > (city:15 AND country:60)^4 OR city:15^2 OR country:60^2
> >
> > What I want to achieve with this query is basically: if a document has
> > city = 15 AND country = 60, it is more important than another document
> > which only has city = 15 OR country = 60.
> >
> > Furthermore, I want to show in my results view why a certain document
> > matched, something like "matched city and country", "matched city only",
> > or "matched country only".
> >
> > This is a bit of a simplified example, but the question remains: how can
> > Solr tell me which of the conditions in the query matched? If I match
> > against a simple field only, I can get away with highlight fields, but
> > conditions spanning multiple fields seem much more tricky.
>
> Looks like you can extract this info from the output of debugQuery=on.
> http://wiki.apache.org/solr/CommonQueryParameters#debugQuery
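One workaround that avoids debugQuery in production is to probe each clause separately against an already-matched document with a cheap rows=0 request and treat numFound == 1 as "this clause matched". A sketch of building those probe requests; the host, core, unique-key field, and helper name are illustrative, only the city/country clauses come from the thread:

```python
# For a hit returned by the main query, build one probe request per clause;
# the clause matched the document iff that probe reports numFound == 1.
from urllib.parse import urlencode

CLAUSES = {
    "city": "city:15",
    "country": "country:60",
}

def probe_urls(doc_id, base="http://localhost:8983/solr/core/select"):
    """One request per clause, restricted to the given document by unique key."""
    urls = {}
    for name, clause in CLAUSES.items():
        urls[name] = base + "?" + urlencode({"q": clause, "fq": f"id:{doc_id}", "rows": 0})
    return urls

for name, url in probe_urls("42").items():
    print(name, url)
```

This costs one extra (very cheap, rows=0) request per clause per displayed document, so it suits result pages of modest size.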
How can I check if a more complex query condition matched?
I have a more complex query condition like this:

(city:15 AND country:60)^4 OR city:15^2 OR country:60^2

What I want to achieve with this query is basically: if a document has city = 15 AND country = 60, it is more important than another document which only has city = 15 OR country = 60.

Furthermore, I want to show in my results view why a certain document matched, something like "matched city and country", "matched city only", or "matched country only".

This is a bit of a simplified example, but the question remains: how can Solr tell me which of the conditions in the query matched? If I match against a simple field only, I can get away with highlight fields, but conditions spanning multiple fields seem much more tricky.

Thanks for any ideas on this!
InvalidTokenOffsetsException in conjunction with highlighting and ICU folding and edgeNgrams
Hi there, when highlighting a field with this definition:

<fieldType name="name" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.ICUTransformFilterFactory" id="Any-Latin"/>
    <filter class="solr.ICUFoldingFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15" side="front"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.ICUTransformFilterFactory" id="Any-Latin"/>
    <filter class="solr.ICUFoldingFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15" side="front"/>
  </analyzer>
</fieldType>

containing this string: Mosfellsbær

I get the following exception if that field is in the highlight fields:

SEVERE: org.apache.solr.common.SolrException: org.apache.lucene.search.highlight.InvalidTokenOffsetsException: Token mosfellsbaer exceeds length of provided text sized 11
    at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlightingByHighlighter(DefaultSolrHighlighter.java:497)
    at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultSolrHighlighter.java:401)
    at org.apache.solr.handler.component.HighlightComponent.process(HighlightComponent.java:131)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:194)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1372)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:857)
    at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
    at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
    at java.lang.Thread.run(Thread.java:636)
Caused by: org.apache.lucene.search.highlight.InvalidTokenOffsetsException: Token mosfellsbaer exceeds length of provided text sized 11
    at org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:233)
    at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlightingByHighlighter(DefaultSolrHighlighter.java:490)

I tried with Solr 3.4 and 3.5; same error for both. Removing the char filter didn't fix the problem either.

It seems like there is some weird stuff going on when folding the string; it can be seen in the analysis view, too: http://i.imgur.com/6B2Uh.png

The end offset remains 11 even after folding and transforming æ to ae, which seems wrong to me. I also stumbled upon https://issues.apache.org/jira/browse/LUCENE-1500 which seems like a similar issue.

Is there a workaround for this problem, or is the field configuration wrong?
Re: InvalidTokenOffsetsException in conjunction with highlighting and ICU folding and edgeNgrams
Robert, thank you for creating the issue in JIRA. However, I need ngrams on that field – is there an alternative to the EdgeNGramFilterFactory?

Thanks!

On Mon, Dec 12, 2011 at 1:25 PM, Robert Muir rcm...@gmail.com wrote:

> On Mon, Dec 12, 2011 at 5:18 AM, Max nas...@gmail.com wrote:
>> It seems like there is some weird stuff going on when folding the string,
>> it can be seen in the analysis view, too: http://i.imgur.com/6B2Uh.png
>
> I created a bug here: https://issues.apache.org/jira/browse/LUCENE-3642
> Thanks for the screenshot, makes it easy to do a test case here.
>
> --
> lucidimagination.com
Re: Using Solr Analyzers in Lucene
I guess I missed the init() method. I was looking at the factory and thought I saw config-loading stuff (like getInt), which I assumed meant it needed to have schema.xml available. Thanks!

-Max

On Tue, Oct 5, 2010 at 2:36 PM, Mathias Walter mathias.wal...@gmx.net wrote:

Hi Max,

why don't you use WordDelimiterFilterFactory directly? I'm doing the same stuff inside my own analyzer:

final Map<String, String> args = new HashMap<String, String>();
args.put("generateWordParts", "1");
args.put("generateNumberParts", "1");
args.put("catenateWords", "0");
args.put("catenateNumbers", "0");
args.put("catenateAll", "0");
args.put("splitOnCaseChange", "1");
args.put("splitOnNumerics", "1");
args.put("preserveOriginal", "1");
args.put("stemEnglishPossessive", "0");
args.put("language", "English");

wordDelimiter = new WordDelimiterFilterFactory();
wordDelimiter.init(args);
stream = wordDelimiter.create(stream);

--
Kind regards,
Mathias

-----Original Message-----
From: Max Lynch [mailto:ihas...@gmail.com]
Sent: Tuesday, October 05, 2010 1:03 AM
To: solr-user@lucene.apache.org
Subject: Re: Using Solr Analyzers in Lucene

I have made progress on this by writing my own Analyzer. I basically added the TokenFilters that are under each of the solr factory classes. I had to copy and paste the WordDelimiterFilter because, of course, it was package protected.

On Mon, Oct 4, 2010 at 3:05 PM, Max Lynch ihas...@gmail.com wrote:

Hi,

I asked this question a month ago on lucene-user and was referred here. I have content being analyzed in Solr using these tokenizers and filters:

<fieldType name="text_standard" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
  </analyzer>
</fieldType>

Basically I want to be able to search against this index in Lucene with one of my background searching applications. My main reason for using Lucene over Solr for this is that I use the highlighter to keep track of exactly which terms were found, which I use for my own scoring system, and I always collect the whole set of found documents. I've messed around with using boosts, but it wasn't fine-grained enough and I wasn't able to effectively create a score threshold (would creating my own scorer be a better idea?)

Is it possible to use this analyzer from Lucene, or at least re-create it in code?

Thanks.
Using Solr Analyzers in Lucene
Hi,

I asked this question a month ago on lucene-user and was referred here. I have content being analyzed in Solr using these tokenizers and filters:

<fieldType name="text_standard" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
  </analyzer>
</fieldType>

Basically I want to be able to search against this index in Lucene with one of my background searching applications. My main reason for using Lucene over Solr for this is that I use the highlighter to keep track of exactly which terms were found, which I use for my own scoring system, and I always collect the whole set of found documents. I've messed around with using boosts, but it wasn't fine-grained enough and I wasn't able to effectively create a score threshold (would creating my own scorer be a better idea?)

Is it possible to use this analyzer from Lucene, or at least re-create it in code?

Thanks.
Re: Using Solr Analyzers in Lucene
I have made progress on this by writing my own Analyzer. I basically added the TokenFilters that are under each of the solr factory classes. I had to copy and paste the WordDelimiterFilter because, of course, it was package protected.

On Mon, Oct 4, 2010 at 3:05 PM, Max Lynch ihas...@gmail.com wrote:

> Hi,
>
> I asked this question a month ago on lucene-user and was referred here. I have content being analyzed in Solr using these tokenizers and filters:
>
> <fieldType name="text_standard" class="solr.TextField" positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
>   </analyzer>
> </fieldType>
>
> Basically I want to be able to search against this index in Lucene with one of my background searching applications. My main reason for using Lucene over Solr for this is that I use the highlighter to keep track of exactly which terms were found, which I use for my own scoring system, and I always collect the whole set of found documents. I've messed around with using boosts, but it wasn't fine-grained enough and I wasn't able to effectively create a score threshold (would creating my own scorer be a better idea?)
>
> Is it possible to use this analyzer from Lucene, or at least re-create it in code?
>
> Thanks.
Search a URL
Is there a tokenizer that will allow me to search for parts of a URL? For example, the search "google" would match on the data "http://mail.google.com/dlkjadf".

This tokenizer factory doesn't seem to be sufficient:

<fieldType name="text_standard" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
  </analyzer>
</fieldType>

Thanks.
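With the whitespace tokenizer above, the whole URL survives as a single token, so a query for "google" can't match it. Splitting on non-alphanumeric characters (roughly what a pattern-based tokenizer such as solr.PatternTokenizerFactory configured with a \W+ pattern would produce) yields searchable parts instead. A quick sketch of the difference:

```python
# Compare whitespace tokenization (one big token) with a \W+ pattern split.
import re

url_text = "http://mail.google.com/dlkjadf"

whitespace_tokens = url_text.split()                           # the whole URL is one token
pattern_tokens = [t for t in re.split(r"\W+", url_text) if t]  # split into URL parts

print(whitespace_tokens)  # ['http://mail.google.com/dlkjadf']
print(pattern_tokens)     # ['http', 'mail', 'google', 'com', 'dlkjadf']
```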
Re: Updating document without removing fields
Thanks Lance. I have decided to just put all of my processing on a bigger server along with Solr. It's too bad, but I can manage.

-Max

On Sun, Aug 29, 2010 at 9:59 PM, Lance Norskog goks...@gmail.com wrote:

> No. Document creation is all-or-nothing; fields are not updateable.
>
> I think you have to filter all of your field changes through a join server. That is, all field updates could go to a database and the master would read document updates from that database. Or, you could have one updater feed updates to the other, and that one sends all updates to the master.
>
> Lance
>
> On Sun, Aug 29, 2010 at 6:19 PM, Max Lynch ihas...@gmail.com wrote:
>> Hi,
>> I have a master solr server and two slaves. On each of the slaves I have programs running that read the slave index, do some processing on each document, add a few new fields, and commit the changes back to the master.
>>
>> The problem I'm running into right now is one slave will update one document and the other slave will eventually update the same document, but the changes will overwrite each other. For example, one slave will add a field and commit the document, but the other slave won't have that field yet so it won't duplicate the document when it updates the doc with its own new field. This causes the document to miss one set of fields from one of the slaves.
>>
>> Can I update a document without having to recreate it? Is there a way to update the slave and then have the slave commit the changes to the master (adding new fields in the process?)
>>
>> Thanks.

--
Lance Norskog
goks...@gmail.com
Updating document without removing fields
Hi, I have a master solr server and two slaves. On each of the slaves I have programs running that read the slave index, do some processing on each document, add a few new fields, and commit the changes back to the master. The problem I'm running into right now is one slave will update one document and the other slave will eventually update the same document, but the changes will overwrite each other. For example, one slave will add a field and commit the document, but the other slave won't have that field yet so it won't duplicate the document when it updates the doc with its own new field. This causes the document to miss one set of fields from one of the slaves. Can I update a document without having to recreate it? Is there a way to update the slave and then have the slave commit the changes to the master (adding new fields in the process?) Thanks.
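Since fields can't be updated in place in this version of Solr, the usual workaround is read-modify-write: fetch the full stored document, overlay the newly computed fields, and re-add the whole document (which requires every field you care about to be stored). A dict-level sketch; the field names and the helper are illustrative, not from the thread:

```python
# Read-modify-write merge: keep every stored field, overlay the new ones,
# then re-index the returned document as a whole.
def merge_update(stored_doc, new_fields):
    doc = dict(stored_doc)    # copy so the fetched doc isn't mutated
    doc.update(new_fields)    # overlay newly computed fields
    return doc

stored = {"id": "doc-1", "title": "A review", "rating": 2}
updated = merge_update(stored, {"negative_feedback": 17})
print(updated)
```

Note that with two independent updaters this still races exactly as described above: the merge only works if it happens at a single writer that always sees the latest stored copy.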
Delete by query issue
Hi,

I am trying to delete all documents that have null values for a certain field. To that effect I can see all of the documents I want to delete by doing this query:

-date_added_solr:[* TO *]

This returns about 32,000 documents. However, when I try to put that into a curl call, no documents get deleted:

curl http://localhost:8985/solr/newsblog/update?commit=true -H "Content-Type: text/xml" --data-binary '<delete><query>-date_added_solr:[* TO *]</query></delete>'

Solr responds with:

<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">364</int></lst>
</response>

But nothing happens, even if I explicitly issue a commit afterward. Any ideas?

Thanks.
Re: Delete by query issue
I was trying to filter out all documents that HAVE that field. I was trying to delete any documents where that field had empty values.

I just found a way to do it: I did a range query on a string date in the Lucene DateTools format and it worked, so I'm satisfied. However, I believe it worked because all of my documents have values for that field. Oh well.

-max

On Wed, Aug 25, 2010 at 9:45 PM, scott chu (朱炎詹) scott@udngroup.com wrote:

> Excuse me, what's the hyphen before the field name 'date_added_solr'? Is this some kind of new query format that I didn't know?
>
> <delete><query>-date_added_solr:[* TO *]</query></delete>
>
> ----- Original Message -----
> From: Max Lynch ihas...@gmail.com
> To: solr-user@lucene.apache.org
> Sent: Thursday, August 26, 2010 6:12 AM
> Subject: Delete by query issue
>
>> Hi,
>> I am trying to delete all documents that have null values for a certain field. To that effect I can see all of the documents I want to delete by doing this query:
>>
>> -date_added_solr:[* TO *]
>>
>> This returns about 32,000 documents. However, when I try to put that into a curl call, no documents get deleted:
>>
>> curl http://localhost:8985/solr/newsblog/update?commit=true -H "Content-Type: text/xml" --data-binary '<delete><query>-date_added_solr:[* TO *]</query></delete>'
>>
>> Solr responds with:
>>
>> <response>
>> <lst name="responseHeader"><int name="status">0</int><int name="QTime">364</int></lst>
>> </response>
>>
>> But nothing happens, even if I explicitly issue a commit afterward. Any ideas?
>>
>> Thanks.
Duplicating a Solr Doc
Right now I am doing some processing on my Solr index using Lucene Java. Basically, I loop through the index in Java and do some extra processing of each document (processing that is too intensive to do during indexing).

However, when I try to update the document in Solr with new fields (using SolrJ), the document either loses fields I don't explicitly set, or, if I have Solr-specific fields such as a Solr date field type, I am not able to copy the value as I can't read the value from Java.

Is there a way to add a field to a Solr document without having to re-create the document? If not, how can I read the value of a Solr date in Java? Document.get("date_field") returns null even though the value shows up when I access it through Solr. If I could read this value I could just copy the fields from the Lucene Document to a SolrInputDocument.

Thanks.
Re: Delete by query issue
Thanks Lance. I'll give that a try going forward.

On Wed, Aug 25, 2010 at 9:59 PM, Lance Norskog goks...@gmail.com wrote:

> Here's the problem: the standard Solr parser is a little weird about negative queries. The way to make this work is to say
>
>     *:* AND -field:[* TO *]
>
> This means select everything AND only these documents without a value in the field.
>
> On Wed, Aug 25, 2010 at 7:55 PM, Max Lynch ihas...@gmail.com wrote:
>> I was trying to filter out all documents that HAVE that field. I was trying to delete any documents where that field had empty values.
>>
>> I just found a way to do it, but I did a range query on a string date in the Lucene DateTools format and it worked, so I'm satisfied. However, I believe it worked because all of my documents have values for that field. Oh well.
>>
>> -max
>>
>> On Wed, Aug 25, 2010 at 9:45 PM, scott chu (朱炎詹) scott@udngroup.com wrote:
>>> Excuse me, what's the hyphen before the field name 'date_added_solr'? Is this some kind of new query format that I didn't know?
>>>
>>> <delete><query>-date_added_solr:[* TO *]</query></delete>
>>>
>>> ----- Original Message -----
>>> From: Max Lynch ihas...@gmail.com
>>> To: solr-user@lucene.apache.org
>>> Sent: Thursday, August 26, 2010 6:12 AM
>>> Subject: Delete by query issue
>>>
>>> Hi,
>>> I am trying to delete all documents that have null values for a certain field. To that effect I can see all of the documents I want to delete by doing this query:
>>>
>>> -date_added_solr:[* TO *]
>>>
>>> This returns about 32,000 documents. However, when I try to put that into a curl call, no documents get deleted:
>>>
>>> curl http://localhost:8985/solr/newsblog/update?commit=true -H "Content-Type: text/xml" --data-binary '<delete><query>-date_added_solr:[* TO *]</query></delete>'
>>>
>>> Solr responds with:
>>>
>>> <response>
>>> <lst name="responseHeader"><int name="status">0</int><int name="QTime">364</int></lst>
>>> </response>
>>>
>>> But nothing happens, even if I explicitly issue a commit afterward. Any ideas?
>>>
>>> Thanks.
Re: Duplicating a Solr Doc
It seems like this is a way to accomplish what I was looking for:

CoreContainer coreContainer = new CoreContainer();
File home = new File("/home/max/packages/test/apache-solr-1.4.1/example/solr");
File f = new File(home, "solr.xml");
coreContainer.load("/home/max/packages/test/apache-solr-1.4.1/example/solr", f);
SolrCore core = coreContainer.getCore("newsblog");
IndexSchema schema = core.getSchema();
DocumentBuilder builder = new DocumentBuilder(schema);
// get a Lucene Doc
// Document d = ...
SolrDocument solrDocument = new SolrDocument();
builder.loadStoredFields(solrDocument, d);
logger.debug("Loaded stored date: " + solrDocument.getFieldValue("date_added_solr"));

However, one thing that scares me is the warning message I get from the CoreContainer: [java] Aug 25, 2010 10:25:23 PM org.apache.solr.update.SolrIndexWriter finalize [java] SEVERE: SolrIndexWriter was not closed prior to finalize(), indicates a bug -- POSSIBLE RESOURCE LEAK!!! I'm not sure what exactly triggers that but it's a result of the code I posted above. On Wed, Aug 25, 2010 at 10:49 PM, Max Lynch ihas...@gmail.com wrote: Right now I am doing some processing on my Solr index using Lucene Java. Basically, I loop through the index in Java and do some extra processing of each document (processing that is too intensive to do during indexing). However, when I try to update the document in solr with new fields (using SolrJ), the document either loses fields I don't explicitly set, or if I have Solr-specific fields such as a solr date field type, I am not able to copy the value as I can't read the value from Java. Is there a way to add a field to a solr document without having to re-create the document? If not, how can I read the value of a Solr date in java? Document.get(date_field) returns null even though the value shows up when I access it through solr. If I could read this value I could just copy the fields from the Lucene Document to a SolrInputDocument. Thanks.
Duplicate a core
Is it possible to duplicate a core? I want to have one core contain only documents within a certain date range (ex: 3 days old), and one core with all documents that have ever been in the first core. The small core is then replicated to other servers which do real-time processing on it, but the archive core exists for longer term searching. I understand I could just connect to both cores from my indexer, but I would like to not have to send duplicate documents across the network to save bandwidth. Is this possible? Thanks.
Re: Duplicate a core
What I'm doing now is just adding the documents to the other core each night and deleting old documents from the other core when I'm finished. Is there a better way? On Tue, Aug 3, 2010 at 4:38 PM, Max Lynch ihas...@gmail.com wrote: Is it possible to duplicate a core? I want to have one core contain only documents within a certain date range (ex: 3 days old), and one core with all documents that have ever been in the first core. The small core is then replicated to other servers which do real-time processing on it, but the archive core exists for longer term searching. I understand I could just connect to both cores from my indexer, but I would like to not have to send duplicate documents across the network to save bandwidth. Is this possible? Thanks.
Re: Know which terms are in a document
Yea, I've had mild success with the highlighting approach with lucene, but wasn't sure if there was another method available from solr. Thanks Mike. On Thu, Jul 29, 2010 at 5:17 AM, Michael McCandless luc...@mikemccandless.com wrote: This is a fairly frequently requested and missing feature in Lucene/Solr... Lucene actually knows this information while it's scoring each document; it's just that it in no way tries to record that. If you will only do this on a few documents (eg the one page of results) then piggybacking on the highlighter is an OK approach. If you need it on more docs than that, then probably you should customize how your queries are scored to also tally up which docs had which terms. Mike On Wed, Jul 28, 2010 at 6:53 PM, Max Lynch ihas...@gmail.com wrote: I would like to be search against my index, and then *know* which of a set of given terms were found in each document. For example, let's say I want to show articles with the word pizza or cake in them, but would like to be able to say which of those two was found. I might use this to handle the article differently if it is about pizza, or if it is about cake. I understand I can do multiple queries but I would like to avoid that. One thought I had was to use a highlighter and only return a fragment with the highlighted word, but I'm not sure how to do this with the various highlighting options. Is there a way? Thanks.
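For anyone reproducing the highlighting trick Mike describes: a request along these lines asks Solr to return a highlighted fragment per document, and whichever of the two terms comes back wrapped in the markers is the one that matched that document. The field name body and the marker choice are assumptions, not from the thread:

```text
/select?q=body:(pizza OR cake)&hl=true&hl.fl=body&hl.snippets=1&hl.simple.pre=<em>&hl.simple.post=</em>
```

Each document in the response then carries a highlighting entry such as "...fresh <em>pizza</em> daily...", from which the matching term can be read off per result.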
Know which terms are in a document
I would like to be able to search against my index, and then *know* which of a set of given terms were found in each document. For example, let's say I want to show articles with the word pizza or cake in them, but would like to be able to say which of those two was found. I might use this to handle the article differently if it is about pizza, or if it is about cake. I understand I can do multiple queries but I would like to avoid that. One thought I had was to use a highlighter and only return a fragment with the highlighted word, but I'm not sure how to do this with the various highlighting options. Is there a way? Thanks.
Re: CommonsHttpSolrServer add document hangs
I'm still having trouble with this. My program will run for a while, then hang up at the same place. Here is my add/commit process: I am using StreamingUpdateSolrServer with queue size = 100 and num threads = 3. My indexing process spawns 8 threads to process a subset of RSS feeds which each thread then loops through. Once a thread has processed a new article, it constructs a new SolrInputDocument, creates a temporary Collection<SolrInputDocument> containing just the one new document, then calls server.add(docs). I never call commit() or optimize() from my java code (I did before though, but I took that out). On the server side, I have these related settings:

<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxDocs>300</maxDocs>
    <maxTime>1</maxTime>
  </autoCommit>
</updateHandler>

I also have replication set up, as this is the master; here are the settings:

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="replicateAfter">startup</str>
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
</requestHandler>

Those are the only extra settings I've set. I also have a cron job running every minute executing this command:

curl http://localhost:8985/solr/mycore/update -F stream.body='<commit/>'

Otherwise I don't see the numDocs number increase on the admin statistics page. This process will soon be ONLY for indexing. Is there a better way to optimize it? I replicate from the slaves every 60 seconds, and I want documents to be available to the slaves as soon as possible. Currently I have a search process that has some IndexSearchers on the Solr index (it's a pure Lucene program), could that be causing issues? This process never opens an IndexWriter. Thanks! On Tue, Jul 13, 2010 at 10:52 AM, Max Lynch ihas...@gmail.com wrote: Great, thanks!
On Tue, Jul 13, 2010 at 2:55 AM, Fornoville, Tom tom.fornovi...@truvo.com wrote: If you're only adding documents you can also have a go with StreamingUpdateSolrServer instead of the CommonsHttpSolrServer. Couple that with the suggestion of master/slave so the searches don't interfere with the indexing and you should have a pretty responsive system. -Original Message- From: Robert Petersen [mailto:rober...@buy.com] Sent: maandag 12 juli 2010 22:30 To: solr-user@lucene.apache.org Subject: RE: CommonsHttpSolrServer add document hangs You could try a master slave setup using replication perhaps, then the slave serves searches and indexing commits on the master won't hang up searches at least... Here is the description: http://wiki.apache.org/solr/SolrReplication -Original Message- From: Max Lynch [mailto:ihas...@gmail.com] Sent: Monday, July 12, 2010 11:57 AM To: solr-user@lucene.apache.org Subject: Re: CommonsHttpSolrServer add document hangs Thanks Robert, My script did start going again, but it was waiting for about half an hour which seems a bit excessive to me. Is there some tuning I can do on the solr end to optimize for my use case, which is very heavy on commits and very light on searches (I do most of my searches on the raw Lucene index in the background)? Thanks. On Mon, Jul 12, 2010 at 12:06 PM, Robert Petersen rober...@buy.com wrote: Maybe solr is busy doing a commit or optimize? -Original Message- From: Max Lynch [mailto:ihas...@gmail.com] Sent: Monday, July 12, 2010 9:59 AM To: solr-user@lucene.apache.org Subject: CommonsHttpSolrServer add document hangs Hey guys, I'm using Solr 1.4.1 and I've been having some problems lately with code that adds documents through a CommonsHttpSolrServer. It seems that randomly the call to theserver.add() will hang. I am currently running my code in a single thread, but I noticed this would happen in multi threaded code as well. The jar version of commons-httpclient is 3.1. 
I got a thread dump of the process, and one thread seems to be waiting on the org.apache.commons.httpclient.MultiThreadedHttpConnectionManager as shown below. All other threads are in a RUNNABLE state (besides the Finalizer daemon). [java] Full thread dump Java HotSpot(TM) 64-Bit Server VM (16.3-b01 mixed mode): [java] [java] MultiThreadedHttpConnectionManager cleanup daemon prio=10 tid=0x7f441051c800 nid=0x527c in Object.wait() [0x7f4417e2f000] [java]java.lang.Thread.State: WAITING (on object monitor) [java] at java.lang.Object.wait(Native Method) [java] - waiting on 0x7f443ae5b290 (a java.lang.ref.ReferenceQueue$Lock) [java] at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:118) [java] - locked 0x7f443ae5b290 (a java.lang.ref.ReferenceQueue$Lock) [java] at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:134) [java
Re: CommonsHttpSolrServer add document hangs
Great, thanks! On Tue, Jul 13, 2010 at 2:55 AM, Fornoville, Tom tom.fornovi...@truvo.com wrote: If you're only adding documents you can also have a go with StreamingUpdateSolrServer instead of the CommonsHttpSolrServer. Couple that with the suggestion of master/slave so the searches don't interfere with the indexing and you should have a pretty responsive system. -Original Message- From: Robert Petersen [mailto:rober...@buy.com] Sent: maandag 12 juli 2010 22:30 To: solr-user@lucene.apache.org Subject: RE: CommonsHttpSolrServer add document hangs You could try a master slave setup using replication perhaps, then the slave serves searches and indexing commits on the master won't hang up searches at least... Here is the description: http://wiki.apache.org/solr/SolrReplication -Original Message- From: Max Lynch [mailto:ihas...@gmail.com] Sent: Monday, July 12, 2010 11:57 AM To: solr-user@lucene.apache.org Subject: Re: CommonsHttpSolrServer add document hangs Thanks Robert, My script did start going again, but it was waiting for about half an hour which seems a bit excessive to me. Is there some tuning I can do on the solr end to optimize for my use case, which is very heavy on commits and very light on searches (I do most of my searches on the raw Lucene index in the background)? Thanks. On Mon, Jul 12, 2010 at 12:06 PM, Robert Petersen rober...@buy.com wrote: Maybe solr is busy doing a commit or optimize? -Original Message- From: Max Lynch [mailto:ihas...@gmail.com] Sent: Monday, July 12, 2010 9:59 AM To: solr-user@lucene.apache.org Subject: CommonsHttpSolrServer add document hangs Hey guys, I'm using Solr 1.4.1 and I've been having some problems lately with code that adds documents through a CommonsHttpSolrServer. It seems that randomly the call to server.add() will hang. I am currently running my code in a single thread, but I noticed this would happen in multi threaded code as well. The jar version of commons-httpclient is 3.1. 
I got a thread dump of the process, and one thread seems to be waiting on the org.apache.commons.httpclient.MultiThreadedHttpConnectionManager as shown below. All other threads are in a RUNNABLE state (besides the Finalizer daemon). [java] Full thread dump Java HotSpot(TM) 64-Bit Server VM (16.3-b01 mixed mode): [java] [java] MultiThreadedHttpConnectionManager cleanup daemon prio=10 tid=0x7f441051c800 nid=0x527c in Object.wait() [0x7f4417e2f000] [java]java.lang.Thread.State: WAITING (on object monitor) [java] at java.lang.Object.wait(Native Method) [java] - waiting on 0x7f443ae5b290 (a java.lang.ref.ReferenceQueue$Lock) [java] at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:118) [java] - locked 0x7f443ae5b290 (a java.lang.ref.ReferenceQueue$Lock) [java] at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:134) [java] at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$Referen ceQueueThread.run(MultiThreadedHttpConnectionManager.java:1122) Any ideas? Thanks.
CommonsHttpSolrServer add document hangs
Hey guys, I'm using Solr 1.4.1 and I've been having some problems lately with code that adds documents through a CommonsHttpSolrServer. It seems that randomly the call to server.add() will hang. I am currently running my code in a single thread, but I noticed this would happen in multi-threaded code as well. The jar version of commons-httpclient is 3.1. I got a thread dump of the process, and one thread seems to be waiting on the org.apache.commons.httpclient.MultiThreadedHttpConnectionManager as shown below. All other threads are in a RUNNABLE state (besides the Finalizer daemon).

[java] Full thread dump Java HotSpot(TM) 64-Bit Server VM (16.3-b01 mixed mode):
[java]
[java] "MultiThreadedHttpConnectionManager cleanup" daemon prio=10 tid=0x7f441051c800 nid=0x527c in Object.wait() [0x7f4417e2f000]
[java]    java.lang.Thread.State: WAITING (on object monitor)
[java] at java.lang.Object.wait(Native Method)
[java] - waiting on <0x7f443ae5b290> (a java.lang.ref.ReferenceQueue$Lock)
[java] at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:118)
[java] - locked <0x7f443ae5b290> (a java.lang.ref.ReferenceQueue$Lock)
[java] at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:134)
[java] at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ReferenceQueueThread.run(MultiThreadedHttpConnectionManager.java:1122)

Any ideas? Thanks.
Re: CommonsHttpSolrServer add document hangs
Thanks Robert, My script did start going again, but it was waiting for about half an hour which seems a bit excessive to me. Is there some tuning I can do on the solr end to optimize for my use case, which is very heavy on commits and very light on searches (I do most of my searches on the raw Lucene index in the background)? Thanks. On Mon, Jul 12, 2010 at 12:06 PM, Robert Petersen rober...@buy.com wrote: Maybe solr is busy doing a commit or optimize? -Original Message- From: Max Lynch [mailto:ihas...@gmail.com] Sent: Monday, July 12, 2010 9:59 AM To: solr-user@lucene.apache.org Subject: CommonsHttpSolrServer add document hangs Hey guys, I'm using Solr 1.4.1 and I've been having some problems lately with code that adds documents through a CommonsHttpSolrServer. It seems that randomly the call to theserver.add() will hang. I am currently running my code in a single thread, but I noticed this would happen in multi threaded code as well. The jar version of commons-httpclient is 3.1. I got a thread dump of the process, and one thread seems to be waiting on the org.apache.commons.httpclient.MultiThreadedHttpConnectionManager as shown below. All other threads are in a RUNNABLE state (besides the Finalizer daemon). [java] Full thread dump Java HotSpot(TM) 64-Bit Server VM (16.3-b01 mixed mode): [java] [java] MultiThreadedHttpConnectionManager cleanup daemon prio=10 tid=0x7f441051c800 nid=0x527c in Object.wait() [0x7f4417e2f000] [java]java.lang.Thread.State: WAITING (on object monitor) [java] at java.lang.Object.wait(Native Method) [java] - waiting on 0x7f443ae5b290 (a java.lang.ref.ReferenceQueue$Lock) [java] at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:118) [java] - locked 0x7f443ae5b290 (a java.lang.ref.ReferenceQueue$Lock) [java] at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:134) [java] at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$Referen ceQueueThread.run(MultiThreadedHttpConnectionManager.java:1122) Any ideas? Thanks.
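Not an answer given in the thread, but for a commit-heavy, search-light master the usual first knob is autoCommit in solrconfig.xml, so the client never needs to commit per batch and commits are amortized server-side. The values below are illustrative assumptions, not recommendations from this list:

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- batch commits server-side instead of committing from the client -->
  <autoCommit>
    <maxDocs>10000</maxDocs>  <!-- commit after this many pending docs -->
    <maxTime>60000</maxTime>  <!-- ...or after 60 seconds (value is in ms) -->
  </autoCommit>
</updateHandler>
```

Longer intervals mean fewer searcher re-opens on the master, at the cost of documents becoming visible (and replicable) later.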
MailEntityProcessor class cast exception
With last night's build of solr, I am trying to use the MailEntityProcessor to index an email account. However, when I call my dataimport url, I receive a class cast exception: INFO: [] webapp=/solr path=/dataimport params={command=full-import} status=0 QTime=44 Jun 16, 2010 8:16:03 PM org.apache.solr.handler.dataimport.SolrWriter readIndexerProperties WARNING: Unable to read: dataimport.properties Jun 16, 2010 8:16:03 PM org.apache.solr.update.DirectUpdateHandler2 deleteAll INFO: [] REMOVING ALL DOCUMENTS FROM INDEX Jun 16, 2010 8:16:03 PM org.apache.solr.core.SolrDeletionPolicy onInit INFO: SolrDeletionPolicy.onInit: commits:num=1 commit{dir=/home/m/g/spider/misc/solrindex_nl/index,segFN=segments_1,version=1276738117525,generation=1,filenames=[segments_1] Jun 16, 2010 8:16:03 PM org.apache.solr.core.SolrDeletionPolicy updateCommits INFO: newest commit = 1276738117525 Jun 16, 2010 8:16:03 PM org.apache.solr.handler.dataimport.DataImporter doFullImport SEVERE: Full Import failed org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to load EntityProcessor implementation for entity:99544078513223 Processing Document # 1 at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:72) at org.apache.solr.handler.dataimport.DocBuilder.getEntityProcessor(DocBuilder.java:804) at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:535) at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:260) at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:184) at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:334) at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:392) at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:373) Caused by: java.lang.ClassCastException: org.apache.solr.handler.dataimport.MailEntityProcessor cannot be cast to 
org.apache.solr.handler.dataimport.EntityProcessor at org.apache.solr.handler.dataimport.DocBuilder.getEntityProcessor(DocBuilder.java:801) ... 6 more Jun 16, 2010 8:16:03 PM org.apache.solr.update.DirectUpdateHandler2 rollback INFO: start rollback Jun 16, 2010 8:16:03 PM org.apache.solr.update.DirectUpdateHandler2 rollback INFO: end_rollback

Here is the dataimport part of my solrconfig.xml:

<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">/home/max/packages/apache-solr-4.0-2010-06-16_08-05-33/e/solr/conf/data-config.xml</str>
  </lst>
</requestHandler>

and my data-config.xml:

<dataConfig>
  <document>
    <entity processor="MailEntityProcessor" user="***" password="***" host="***" protocol="imaps" folders="INBOX"/>
  </document>
</dataConfig>

I did try to rebuild the solr nightly, but I still receive the same error. I have all of the required jars (AFAIK) in my application's lib folder. Any ideas? Thanks.
Unwanted clustering of search results after sorting by score
Hallo, We have a website on which you can search through a large number of products from different shops. The information describing the products is provided to us by the shops which sell them. If we sort a search result by score, many products from the same shop are clustered together. The reason for this behavior is that shops tend to use the same 'style' to describe their products. For example: shop 'foo' describes its products with 250 words and uses the searched word once. Shop 'bar' describes its products with only 25 words and also uses the searched word once. The score for shop 'foo' will be much worse than for shop 'bar'. In a search which matches many products of shops 'foo' and 'bar', the products of shop 'bar' are shown before the products of shop 'foo'. We tried to avoid this behavior by not using the term frequency, but after that we got very strange products among the first results. Does anybody have an idea how to avoid the clustering of products (documents) which are from the same shop? Greetings Max
prefix-search ignores the LowerCaseFilter
Hi, I want to perform a prefix-search which ignores case. To do this I created a fieldType called suggest:

<fieldType name="suggest" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Entries (terms) could be 'foo', 'bar'... A request like

http://localhost:8983/solr/select/?rows=0&facet=true&q=*:*&facet.field=suggest&facet.prefix=f

returns things like

<lst name="facet_counts">
  <lst name="facet_queries"/>
  <lst name="facet_fields">
    <lst name="suggest">
      <int name="foo">12</int>
    </lst>
  </lst>
</lst>

But a request like

http://localhost:8983/solr/select/?rows=0&facet=true&q=*:*&facet.field=suggest&facet.prefix=F

returns just:

<lst name="facet_counts">
  <lst name="facet_queries"/>
  <lst name="facet_fields">
    <lst name="suggest"/>
  </lst>
</lst>

That's not what I expected, because the field definition contains a LowerCaseFilter. Is it possible that the prefix-processing ignores the filters? Max
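Not stated in the thread, but the underlying behavior is that facet.prefix is compared against the raw indexed terms and is not run through the field's analyzer chain, so the index-time LowerCaseFilter never touches the prefix itself. A common client-side workaround is to lowercase the prefix before building the request; a minimal sketch (the helper name is made up):

```java
import java.util.Locale;

public class FacetPrefix {
    // facet.prefix matches raw indexed terms, which the index-time
    // LowerCaseFilter has already lowercased; the prefix is NOT analyzed,
    // so lowercase it on the client before building the request URL.
    static String facetPrefixParam(String userInput) {
        return "facet.prefix=" + userInput.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(facetPrefixParam("F"));   // facet.prefix=f
        System.out.println(facetPrefixParam("Foo")); // facet.prefix=foo
    }
}
```

With the prefix lowercased, both 'f' and 'F' user input produce the same facet.prefix=f request and return the same counts.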