Problem with pdf files indexing
Hi! I'm using Solr 3.3 and I have some PDF files which I want to index. I followed the instructions from the wiki page: http://wiki.apache.org/solr/ExtractingRequestHandler The problem is that I can add my documents to Solr but I cannot query them. Here is what I have:

*solrconfig.xml*:

<requestHandler name="/update/extract" startup="lazy" class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="fmap.content">text</str>
    <str name="lowernames">true</str>
    <str name="uprefix">ignored_</str>
    <str name="captureAttr">true</str>
    <str name="fmap.a">links</str>
    <str name="fmap.div">ignored_</str>
  </lst>
</requestHandler>

*schema.xml*:

<field name="title" type="string" indexed="true" stored="true"/>
<field name="author" type="string" indexed="true" stored="true"/>
<field name="text" type="text_general" indexed="true" stored="true" multiValued="true"/>

*data-config.xml*:

...
<dataSource type="BinFileDataSource" name="ds-file"/>
...
<entity processor="TikaEntityProcessor" dataSource="ds-file" url="../${document.filename}">
  <field column="Author" name="author" meta="true"/>
  <field column="title" name="title" meta="true"/>
  <field column="text" name="text"/>
</entity>
...

I use SolrJ to add documents as follows:

SolrServer server = new CommonsHttpSolrServer("http://localhost:8080/solr");
ContentStreamUpdateRequest up = new ContentStreamUpdateRequest("/update/extract");
up.addFile(new File("d:\\test.pdf"));
up.setParam("literal.id", "test");
up.setParam("extractOnly", "true");
server.commit();
NamedList result = server.request(up);
System.out.println("Result: " + result); // can display information about test.pdf
QueryResponse rsp = server.query(new SolrQuery("*:*"));
System.out.println("rsp: " + rsp); // returns nothing

Any suggestion?

-- View this message in context: http://lucene.472066.n3.nabble.com/Problem-with-pdf-files-indexing-tp3527202p3527202.html Sent from the Solr - User mailing list archive at Nabble.com.
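For comparison, here is a minimal SolrJ sketch of the indexing path (the class name is just for illustration). Two details of the snippet above are worth noting: extractOnly=true asks the handler to return the extracted content without adding a document to the index, and commit() is called before the extract request is even sent. The sketch below drops extractOnly and commits as part of the request:

import java.io.File;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.request.AbstractUpdateRequest;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

public class ExtractExample {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8080/solr");
        // send the PDF to the /update/extract handler configured in solrconfig.xml
        ContentStreamUpdateRequest up = new ContentStreamUpdateRequest("/update/extract");
        up.addFile(new File("d:\\test.pdf"));
        up.setParam("literal.id", "test");   // unique key for the new document
        // no extractOnly parameter, so the extracted text is actually indexed
        up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true); // commit with this request
        server.request(up);
    }
}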
Re: wild card search and lower-casing
I guess, I have found your comment, thanks. For our current needs I have just set: setLowercaseExpandedTerms(true); // changed from default false in the SolrQueryParser's constructor and that seem to work so far. In order not to start a separate thread on wildcards. Is it so, that for the trailing wildcard there is a minimum of 2 preceding characters for a search to happen? Dmitry On Mon, Nov 21, 2011 at 2:59 PM, Erick Erickson erickerick...@gmail.comwrote: It may be. The tricky bit is that there is a constant governing the behavior of this that restricts it to 3.6 and above. You'll have to change it after applying the patch for this to work for you. Should be trivial, I'll leave a note in the code about this, look for SOLR-2438 in the 3x code line for the place to change. On Mon, Nov 21, 2011 at 2:14 AM, Dmitry Kan dmitry@gmail.com wrote: Thanks Erick. Do you think the patch you are working on will be applicable as well to 3.4? Best, Dmitry On Mon, Nov 21, 2011 at 5:06 AM, Erick Erickson erickerick...@gmail.com wrote: As it happens I'm working on SOLR-2438 which should address this. This patch will provide two things: The ability to define a new analysis chain in your schema.xml, currently called multiterm that will be applied to queries of various sorts, including wildcard, prefix, range. This will be somewhat of an expert thing to make yourself... In the absence of an explicit definition it'll synthesize a multiterm analyzer out of the query analyzer, taking any char fitlers, and lowercaseFilter (if present), and ASCIIFoldingfilter (if present) and putting them in the multiterm analyzer along with a (hardcoded) WhitespaceTokenizer. As of 3.6 and 4.0, this will be the default behavior, although you can explicitly define a field type parameter to specify the current behavior. The reason it is on 3.6 is that I want it to bake for a while before getting into the wild, so I have no intention of trying to get it into the 3.5 release. The patch is up for review now, I'd like another set of eyeballs or two on it before committing. The patch that's up there now is against trunk but I hope to have a 3x patch that I'll apply to the 3x code line after 3.5 RC1 is cut. Best Erick On Fri, Nov 18, 2011 at 12:05 PM, Ahmet Arslan iori...@yahoo.com wrote: You're right: public SolrQueryParser(IndexSchema schema, String defaultField) { ... setLowercaseExpandedTerms(false); ... } Please note that lowercaseExpandedTerms uses String.toLowercase() (uses default Locale) which is a Locale sensitive operation. In Lucene AnalyzingQueryParser exists for this purposes, but I am not sure if it is ported to solr. http://lucene.apache.org/java/3_0_2/api/contrib-misc/org/apache/lucene/queryParser/analyzing/AnalyzingQueryParser.html
date range in solr 3.1
i try to use range faceting in solr 3.1 using facet.range=date, f.date.facet.range.gap=+1DAY, f.date.facet.range.start=NOW/DAY-5DAYS, and f.date.facet.range.end=NOW/DAY and i get this exception Exception during facet.range of date org.apache.solr.common.SolrException: Can't add gap 1DAYS to value Sun Nov 13 00:00:00 UTC 2011 for field: date at org.apache.solr.request.SimpleFacets$RangeEndpointCalculator.addGap(SimpleFacets.java:1093) at org.apache.solr.request.SimpleFacets.getFacetRangeCounts(SimpleFacets.java:873) at org.apache.solr.request.SimpleFacets.getFacetRangeCounts(SimpleFacets.java:839) at org.apache.solr.request.SimpleFacets.getFacetRangeCounts(SimpleFacets.java:778) at org.apache.solr.request.SimpleFacets.getFacetCounts(SimpleFacets.java:178) at org.apache.solr.handler.component.FacetComponent.process(FacetComponent.java:72) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:194) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1360) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:240) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:164) at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:462) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:164) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:100) at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:563) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:399) at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:317) at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:204) at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:182) at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:311) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) Caused by: java.text.ParseException: Unrecognized command: at org.apache.solr.util.DateMathParser.parseMath(DateMathParser.java:277) at org.apache.solr.request.SimpleFacets$DateRangeEndpointCalculator.parseAndAddGap(SimpleFacets.java:1188) at org.apache.solr.request.SimpleFacets$DateRangeEndpointCalculator.parseAndAddGap(SimpleFacets.java:1160) at org.apache.solr.request.SimpleFacets$RangeEndpointCalculator.addGap(SimpleFacets.java:1091) ... 27 more can you help me plz thanks in advance :) -- View this message in context: http://lucene.472066.n3.nabble.com/date-range-in-solr-3-1-tp3527498p3527498.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Integrating Surround Query Parser
--- On Tue, 11/22/11, Rahul Mehta rahul23134...@gmail.com wrote: From: Rahul Mehta rahul23134...@gmail.com Subject: Integrating Surround Query Parser To: solr-user@lucene.apache.org Date: Tuesday, November 22, 2011, 8:05 AM

Hello, I want to run a surround query.
1. Downloaded the jar from http://www.java2s.com/Code/Jar/JKL/Downloadlucenesurround241jar.htm
2. Moved lucene-surround-2.4.1.jar to /apache-solr-3.1.0/example/lib
3. Edited solrconfig.xml with: <queryParser name="SurroundQParser" class="org.apache.lucene.queryParser.surround.parser.QueryParser"/>
4. Restarted Solr

Got this error:
org.apache.solr.common.SolrException: Error Instantiating QParserPlugin, org.apache.lucene.queryParser.surround.parser.QueryParser is not a org.apache.solr.search.QParserPlugin at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:425)

-- Thanks Regards

Hello Rahul, It is already integrated. Please see: http://wiki.apache.org/solr/SurroundQueryParser
Re: how to use term proximity queries with apache solr
We have used proximity queries, but they only work using a sloppy phrase query (e.g. "catalyst polymer"~5) and do not allow wildcards. We want to use proximity queries between any terms (e.g. (poly* NEAR *lyst)). Is this possible using additional query parsers like Surround? If yes, please suggest how to install Surround; currently we are using Solr 3.1.

Not sure about leading wildcards, but you can use https://issues.apache.org for this.
How to be sure that surround
I have done the following steps for installing surround plugin. 1. Downloading from http://www.java2s.com/Code/Jar/JKL/Downloadlucenesurround241jar.htm 2. Moved the lucene-surround-2.4.1.jar to /apache-solr-3.1.0/example/lib 3. restart solr . But How to be sure that surround plugin is being installed . Means what query i can run. -- Thanks Regards Rahul Mehta
Re: how to make effective search with fq and q params
Usually, Use the 'q' parameter to search for the free text values entered by the users (where you might want to parse the query and/or apply boosting/phrase-sloppy, minimum match,tie etc ) Use the 'fq' to limit the searches to certain criterias like location, date-ranges etc. Also, avoid using the q=*:* as it implicitly translates to matchalldocsquery Regds Pravesh -- View this message in context: http://lucene.472066.n3.nabble.com/how-to-make-effective-search-with-fq-and-q-params-tp3527217p3527535.html Sent from the Solr - User mailing list archive at Nabble.com.
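As a small SolrJ illustration of that split (the field names and query text here are made up):

import org.apache.solr.client.solrj.SolrQuery;

public class FilterQueryExample {
    public static void main(String[] args) {
        // q carries the user's free text and is scored for relevance
        SolrQuery query = new SolrQuery("laptop bag");
        // fq entries only restrict the result set; they are cached separately and do not affect scoring
        query.addFilterQuery("location:london");
        query.addFilterQuery("date:[NOW-7DAYS TO NOW]");
        System.out.println(query); // prints the encoded q and fq parameters
    }
}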
Re: how to use term proximity queries with apache solr
Not sure about leading wildcard but you can use https://issues.apache.org for this. Sorry, link was : https://issues.apache.org/jira/browse/SOLR-1604
Re: How to be sure that surround
I have done the following steps for installing surround plugin. 1. Downloading from http://www.java2s.com/Code/Jar/JKL/Downloadlucenesurround241jar.htm 2. Moved the lucene-surround-2.4.1.jar to /apache-solr-3.1.0/example/lib 3. restart solr . But How to be sure that surround plugin is being installed . Means what query i can run. Rahul, you need to switch to solr-trunk, it is already there http://wiki.apache.org/solr/SurroundQueryParser
Re: How to be sure that surround
I have the solr-trunk , but queries are running on both (on trunk (4.0) and on (3.1) ) . then how i can be sure that what query will run by surround query parser plugin. The query i tried : http://localhost:8983/solr/select?q=abstracts:99n(flat,panel,display) http://localhost:8983/solr/select?q=abstracts:(poly*%20NEAR%20*lyst) The above queries both are running on 3.1 and 4.0 How i can sure that these query are running by Surround Plugin. On Tue, Nov 22, 2011 at 5:51 PM, Ahmet Arslan iori...@yahoo.com wrote: I have done the following steps for installing surround plugin. 1. Downloading from http://www.java2s.com/Code/Jar/JKL/Downloadlucenesurround241jar.htm 2. Moved the lucene-surround-2.4.1.jar to /apache-solr-3.1.0/example/lib 3. restart solr . But How to be sure that surround plugin is being installed . Means what query i can run. Rahul, you need to switch to solr-trunk, it is already there http://wiki.apache.org/solr/SurroundQueryParser -- Thanks Regards Rahul Mehta
Re: how to use term proximity queries with apache solr
do i need to install this seperately or it is integrated in solr 4.0 ? On Tue, Nov 22, 2011 at 5:49 PM, Ahmet Arslan iori...@yahoo.com wrote: Not sure about leading wildcard but you can use https://issues.apache.org for this. Sorry, link was : https://issues.apache.org/jira/browse/SOLR-1604 -- Thanks Regards Rahul Mehta
Solr highlighting isn't working!
Hello!!! I have a problem with Solr highlighting. I have documents with the following fields: TYPE, DBID and others. When I make the following request:

https://localhost:8443/solr/myCore/afts?wt=standard&q=TYPE:cm:content&indent=on&hl=true&hl.fl=DBID&hl.usePhraseHighlighter=true&fl=DBID

the following response is returned:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">3</int>
  </lst>
  <result name="response" numFound="166" start="0">
    <doc>
      <arr name="DBID">
        <str>892</str>
      </arr>
    </doc>
    <doc>
    ...
  </result>
  <lst name="highlighting">
    <lst name="LEAF-892"/>
  </lst>
</response>

What is the problem? Thank you!

-- View this message in context: http://lucene.472066.n3.nabble.com/Solr-highlighting-isn-t-work-tp3527701p3527701.html Sent from the Solr - User mailing list archive at Nabble.com.
Stats per group with StatsComponent?
Hi We need to get minimum and maximum values for a field, within a group in a grouped search-result. Is this possible today, perhaps by using StatsComponent some way? I'll flesh out the example a little, to make the question clearer. We have a number of documents, indexed with a price, date and a hotel. For each hotel, there are a number of documents, each representing a price/date combination. We then group our search result on hotel. We want to show the minimum and maximum price for each hotel. A little googling leads us to look at StatsComponent, as what it does would be what we need, if it could be done for each group. There was a thread on this list in August, Grouping and performing statistics per group that seemed to go into this a bit, but didn't find a solution. Is this possible in Solr 3.4, either with StatsComponent, or some other way? -- Morten We all live in a yellow subroutine.
Re: how to make effective search with fq and q params
Thanks Pravesh for your reply.. I definitely try this.. i hope it will improve solr response time. pravesh wrote Usually, Use the 'q' parameter to search for the free text values entered by the users (where you might want to parse the query and/or apply boosting/phrase-sloppy, minimum match,tie etc ) Use the 'fq' to limit the searches to certain criterias like location, date-ranges etc. Also, avoid using the q=*:* as it implicitly translates to matchalldocsquery Regds Pravesh -- View this message in context: http://lucene.472066.n3.nabble.com/how-to-make-effective-search-with-fq-and-q-params-tp3527217p3527654.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: How to be sure that surround
I have the solr-trunk , but queries are running on both (on trunk (4.0) and on (3.1) ) . then how i can be sure that what query will run by surround query parser plugin. The query i tried : http://localhost:8983/solr/select?q=abstracts:99n(flat,panel,display) http://localhost:8983/solr/select?q=abstracts:(poly*%20NEAR%20*lyst) The above queries both are running on 3.1 and 4.0 How i can sure that these query are running by Surround Plugin. You can use q={!surround df=abstracts}99n(flat,panel,display) If you append debugQuery=on, it should display some info regarding which query parser is used, which Query is constructed etc.
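For example, against the standard example port that would look something like the request below (the braces and spaces of the local-params syntax need URL-escaping when typed into a browser):

http://localhost:8983/solr/select?q={!surround df=abstracts}99n(flat,panel,display)&debugQuery=on

With debugQuery=on, the parsedquery entry in the debug section shows which parser handled the query; a surround proximity query should appear as a span query (e.g. SpanNearQuery) rather than a plain term or phrase query.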
Re: how to use term proximity queries with apache solr
do i need to install this seperately or it is integrated in solr 4.0 ? You need to install SOLR-1604 separately. But this is easy since it is implemented as a solr plugin.
Re: Integrating Surround Query Parser
The surround query parser is fully wired into Solr trunk/4.0, if that helps. See http://wiki.apache.org/solr/SurroundQueryParser and the JIRA issue linked there in case you want to patch it into a different version. Erik

On Jan 21, 2011, at 02:24, Ahson Iqbal wrote:

Hi All, I want to integrate the Surround Query Parser with Solr. To do this I downloaded the jar file from the internet, put that jar file in web-inf/lib and configured the query parser in solrconfig.xml as:

<queryParser name="SurroundQParser" class="org.apache.lucene.queryParser.surround.parser.QueryParser"/>

Now when I load the Solr admin page the following exception comes up:

org.apache.solr.common.SolrException: Error Instantiating QParserPlugin, org.apache.lucene.queryParser.surround.parser.QueryParser is not a org.apache.solr.search.QParserPlugin

I think I didn't get the right plugin. Can anybody guide me on where to get the right plugin for the surround query parser, or how to integrate this plugin with Solr correctly? thanx Ahsan
Re: how to make effective search with fq and q params
If all you're doing is filtering (browsing by facets perhaps), it's perfectly fine to have q=*:*. MatchAllDocsQuery is fast (and would be cached anyway), so use *:* as appropriate without worries. Erik On Nov 22, 2011, at 07:18 , pravesh wrote: Usually, Use the 'q' parameter to search for the free text values entered by the users (where you might want to parse the query and/or apply boosting/phrase-sloppy, minimum match,tie etc ) Use the 'fq' to limit the searches to certain criterias like location, date-ranges etc. Also, avoid using the q=*:* as it implicitly translates to matchalldocsquery Regds Pravesh -- View this message in context: http://lucene.472066.n3.nabble.com/how-to-make-effective-search-with-fq-and-q-params-tp3527217p3527535.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: [ANNOUNCEMENT] Second Edition of the First Book on Solr
Congratulations! Feel free to write a shorter version of the announcement text, suitable as a news teaser on the Solr site, and we'll try to update the site with new thumb and all. -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com On 18. nov. 2011, at 06:17, Smiley, David W. wrote: Fellow Solr users, I am proud to announce that the book Apache Solr 3 Enterprise Search Server is officially published! This is the second edition of the first book on Solr by me, David Smiley, and my co-author Eric Pugh. You can find full details about the book, download a free chapter, and purchase it here: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book It is also available through other channels like Amazon. You can feel good about the purchase knowing that 5% of each sale goes to support the Apache Software Foundation. If you buy directly from the publisher, then the basis of the percentage that goes to the ASF (and to me) is higher than if you buy it through other channels. This book naturally covers the latest features in Solr as of version 3.4 like Result Grouping and Geospatial, but this is not a small update to the first book. We have more experience with Solr and we've listened to reader feedback from the first edition. No chapter was untouched: Faceting gets its own chapter, all search relevancy matters are discussed in one chapter, auto-complete approaches are all discussed together, much of the chapter on integration was rewritten to discuss newer technologies, and the first chapter was greatly streamlined. Furthermore, each chapter has a tip in the introduction that advises readers in a hurry on what parts should be read now or later. Finally, we developed a 2-page parameter quick-reference appendix that you will surely find useful printed on your desk. In summary, we improved the existing content, and added about 25% more by page count. Software, errata, and other information about this book and the previous edition is on our website: http://www.solrenterprisesearchserver.com/ We've been working hard on this book for the last 10 months and we hope it really helps saves you time and improves your search project! Apache Solr 3 Enterprise Search Server In Detail: If you are a developer building an app today then you know how important a good search experience is. Apache Solr, built on Apache Lucene, is a wildly popular open source enterprise search server that easily delivers powerful search and faceted navigation features that are elusive with databases. Solr supports complex search criteria, faceting, result highlighting, query-completion, query spell-check, relevancy tuning, and more. Apache Solr 3 Enterprise Search Server is a comprehensive reference guide for every feature Solr has to offer. It serves the reader right from initiation to development to deployment. It also comes with complete running examples to demonstrate its use and show how to integrate Solr with other languages and frameworks. Through using a large set of metadata about artists, releases, and tracks courtesy of the MusicBrainz.org project, you will have a testing ground for Solr, and will learn how to import this data in various ways. You will then learn how to search this data in different ways, including Solr's rich query syntax and boosting match scores based on record data. 
Finally, we'll cover various deployment considerations to include indexing strategies and performance-oriented configuration that will enable you to scale Solr to meet the needs of a high-volume site. Sincerely, David Smiley (primary author) david.w.smi...@gmail.com Eric Pugh (co-author) ep...@opensourceconnections.com
Re: date range in solr 3.1
Hi, Long shot: Try f.date.facet.range.gap=%2B1DAY instead, in case your + was interpreted as space by your browser... -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com On 22. nov. 2011, at 12:57, do3do3 wrote: i try to use range faceting in solr 3.1 using facet.range=date, f.date.facet.range.gap=+1DAY, f.date.facet.range.start=NOW/DAY-5DAYS, and f.date.facet.range.end=NOW/DAY and i get this exception Exception during facet.range of date org.apache.solr.common.SolrException: Can't add gap 1DAYS to value Sun Nov 13 00:00:00 UTC 2011 for field: date at org.apache.solr.request.SimpleFacets$RangeEndpointCalculator.addGap(SimpleFacets.java:1093) at org.apache.solr.request.SimpleFacets.getFacetRangeCounts(SimpleFacets.java:873) at org.apache.solr.request.SimpleFacets.getFacetRangeCounts(SimpleFacets.java:839) at org.apache.solr.request.SimpleFacets.getFacetRangeCounts(SimpleFacets.java:778) at org.apache.solr.request.SimpleFacets.getFacetCounts(SimpleFacets.java:178) at org.apache.solr.handler.component.FacetComponent.process(FacetComponent.java:72) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:194) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1360) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:240) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:164) at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:462) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:164) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:100) at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:563) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:399) at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:317) at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:204) at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:182) at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:311) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) Caused by: java.text.ParseException: Unrecognized command: at org.apache.solr.util.DateMathParser.parseMath(DateMathParser.java:277) at org.apache.solr.request.SimpleFacets$DateRangeEndpointCalculator.parseAndAddGap(SimpleFacets.java:1188) at org.apache.solr.request.SimpleFacets$DateRangeEndpointCalculator.parseAndAddGap(SimpleFacets.java:1160) at org.apache.solr.request.SimpleFacets$RangeEndpointCalculator.addGap(SimpleFacets.java:1091) ... 
27 more can you help me plz thanks in advance :) -- View this message in context: http://lucene.472066.n3.nabble.com/date-range-in-solr-3-1-tp3527498p3527498.html Sent from the Solr - User mailing list archive at Nabble.com.
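Spelling out Jan's suggestion, the full set of range-facet parameters with the leading '+' of the gap percent-encoded (so it is not decoded as a space) would look roughly like this, appended to the query URL:

&facet=true&facet.range=date&f.date.facet.range.start=NOW/DAY-5DAYS&f.date.facet.range.end=NOW/DAY&f.date.facet.range.gap=%2B1DAY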
Re: Matching + and &
Why do you need spaces in the replacement? Try pattern="\+" replacement="plus" - it will cause the transformed charstream to contain as many tokens as the original and avoid the highlighting crash.

-- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com

On 22. nov. 2011, at 05:40, Tomasz Wegrzanowski wrote:

Hi, I've been trying to match some phrases with + and & (like c++, google+, r&d etc.), but the tokenizer gets rid of them before I can do anything with synonym filters. So I tried using CharFilters like this:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="\+" replacement=" plus "/>
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="&amp;" replacement=" and "/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms_case_sensitive.txt" ignoreCase="false" expand="true"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="query_synonyms.txt" ignoreCase="true" expand="false"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

This mostly works, but for a very small number of documents, mostly those with a large number of pluses in them, the highlighter just crashes (and it is the highlighter, since turning it off and reissuing the query works just fine; if I replace pluses with spaces and reindex, the same query reruns just fine) with an exception like this:

Nov 21, 2011 11:35:11 PM org.apache.solr.common.SolrException log SEVERE: java.lang.StringIndexOutOfBoundsException: String index out of range: -1 at java.lang.String.substring(String.java:1938) at org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:237) at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlightingByHighlighter(DefaultSolrHighlighter.java:462) at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultSolrHighlighter.java:378) at org.apache.solr.handler.component.HighlightComponent.process(HighlightComponent.java:116) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:194) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1360) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:343) at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:244) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286) at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845) at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583) at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447) at
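In schema.xml terms, Jan's suggestion above would look something like the sketch below (the second line assumes the analogous change for the '&' rule); with no spaces in the replacement, the transformed char stream contains as many tokens as the original, which is Jan's point:

<charFilter class="solr.PatternReplaceCharFilterFactory" pattern="\+" replacement="plus"/>
<charFilter class="solr.PatternReplaceCharFilterFactory" pattern="&amp;" replacement="and"/>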
Unexpected CPU load and increased Solr response time
Hi, we currently have 2 servers running on JBoss container (master and slave) with 20mln documents and about 3GB index size. Java was executed with options: *-Xms12G -Xmx12G -XX:NewSize=4G -XX:MaxNewSize=4G -XX:MaxPermSize=256m -Dorg.jboss.resolver.warning=true -Dsun.rmi.dgc.client.gcInterval=360 -Dsun.rmi.dgc.server.gcInterval=360 -XX:+UseCompressedOops -XX:+UseTLAB -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSParallelRemarkEnabled* Commit duration on master is 5 minutes and we use Solr 3.3 (because in 3.4 we have a problem with dataimport https://issues.apache.org/jira/browse/SOLR-2804). We have problem that occurs when server gets about *34* qps. *Do you have any advice how to fix this problem?* I have attached the charts below. The load and threads count increases between 19:00 and 20:00. On 20:10 we reduced by half the number of queries. http://lucene.472066.n3.nabble.com/file/n3527914/solr_users_reqs-day.png http://lucene.472066.n3.nabble.com/file/n3527914/load-day.png http://lucene.472066.n3.nabble.com/file/n3527914/threads-day.png http://lucene.472066.n3.nabble.com/file/n3527914/jboss_threads-day.png http://lucene.472066.n3.nabble.com/file/n3527914/cpu-day.png http://lucene.472066.n3.nabble.com/file/n3527914/_avg_response_query_time-22-11-2011_15_50_17.png -- View this message in context: http://lucene.472066.n3.nabble.com/Unexpected-cpu-load-and-Solr-incrase-response-time-tp3527914p3527914.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: how to make effective search with fq and q params
Hi Erik: When using [e]dismax, does configuring q.alt=*:* and not specifying q affect the performance/caching in any way? As a side note, a while back I configured q.alt=*:*, and the application (via SolrJ) still set q=*:* if no user input was provided (faceting). With both of them set that way, I got zero results. (Solr 3.4.0) Interesting. Thanks, Jeff On Nov 22, 2011, at 7:06 AM, Erik Hatcher wrote: If all you're doing is filtering (browsing by facets perhaps), it's perfectly fine to have q=*:*. MatchAllDocsQuery is fast (and would be cached anyway), so use *:* as appropriate without worries. Erik On Nov 22, 2011, at 07:18 , pravesh wrote: Usually, Use the 'q' parameter to search for the free text values entered by the users (where you might want to parse the query and/or apply boosting/phrase-sloppy, minimum match,tie etc ) Use the 'fq' to limit the searches to certain criterias like location, date-ranges etc. Also, avoid using the q=*:* as it implicitly translates to matchalldocsquery Regds Pravesh -- View this message in context: http://lucene.472066.n3.nabble.com/how-to-make-effective-search-with-fq-and-q-params-tp3527217p3527535.html Sent from the Solr - User mailing list archive at Nabble.com. -- Jeff Schmidt 535 Consulting j...@535consulting.com http://www.535consulting.com (650) 423-1068
Re: how to make effective search with fq and q params
On Nov 22, 2011, at 09:55 , Jeff Schmidt wrote: When using [e]dismax, does configuring q.alt=*:* and not specifying q affect the performance/caching in any way? No different than using q=*:* with the lucene query parser. MatchAllDocsQuery is possibly the fastest query out there! (it simply matches documents in index order, all scores are 1.0) As a side note, a while back I configured q.alt=*:*, and the application (via SolrJ) still set q=*:* if no user input was provided (faceting). With both of them set that way, I got zero results. (Solr 3.4.0) Interesting. Ouch. Really? I don't see in the code (looking at my trunk checkout) where there's any *:* used in the SolrJ library. Can you provide some details on how you used SolrJ? It'd be good to track this down as that seems like a bug to me. Erik Thanks, Jeff On Nov 22, 2011, at 7:06 AM, Erik Hatcher wrote: If all you're doing is filtering (browsing by facets perhaps), it's perfectly fine to have q=*:*. MatchAllDocsQuery is fast (and would be cached anyway), so use *:* as appropriate without worries. Erik On Nov 22, 2011, at 07:18 , pravesh wrote: Usually, Use the 'q' parameter to search for the free text values entered by the users (where you might want to parse the query and/or apply boosting/phrase-sloppy, minimum match,tie etc ) Use the 'fq' to limit the searches to certain criterias like location, date-ranges etc. Also, avoid using the q=*:* as it implicitly translates to matchalldocsquery Regds Pravesh -- View this message in context: http://lucene.472066.n3.nabble.com/how-to-make-effective-search-with-fq-and-q-params-tp3527217p3527535.html Sent from the Solr - User mailing list archive at Nabble.com. -- Jeff Schmidt 535 Consulting j...@535consulting.com http://www.535consulting.com (650) 423-1068
How to select all docs of 'today' ?
Hi, I have a fetch-time (date) field to know when the documents were fetched. I want to make a query to get all documents fetched today. I tried : fetch-time:NOW/DAY but it returns always 0. fetch-time:[NOW/DAY TO NOW/DAY] (it returns 0) fetch-time:[NOW/DAY-1DAY TO NOW/DAY] but it returns documents fetched yesterday. fetch-time:[NOW/DAY-1HOUR TO NOW/DAY] but it's incorrect too. Do you have any idea ? Thanks in advance.
Re: Solr real time update
Yu: To get Near Real Time update in Solr 1.4.1 you will need to use Solr 1.4.1 with RankingAlgorithm. This allows you to update documents in near real time. You can download and give this a try from here: http://solr-ra.tgels.org/ Regards, - Nagendra Nagarajayya http://solr-ra.tgels.org/ http://rankingalgorithm.tgels.org/ On 11/21/2011 9:47 PM, yu shen wrote: Hi All, After some study, I used below snippet. Seems the documents is updated, while still takes a long time. Feels like the parameter does not take effect. Any comments? UpdateRequest req = new UpdateRequest(); req.add(solrDocs); req.setCommitWithin(5000); req.setParam(commitWithin, 5000); req.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true); req.process(SOLR_SERVER); 2011/11/22 yu shenshenyu...@gmail.com Hi All, I try to do a 'nearly real time update' to solr. My solr version is 1.4.1. I read this solr CommentWithinhttp://wiki.apache.org/solr/CommitWithinwiki, and a related threadhttp://lucene.472066.n3.nabble.com/Solr-real-time-update-taking-time-td3472709.htmlmostly on the difficulty to do this. My issue is I tried the code snippet in the wiki: UpdateRequest req = new UpdateRequest(); req.add(mySolrInputDocument); req.setCommitWithin(1); req.process(server); But my index did not get updated, unless I call SOLR_SERVER.commit(); explicitly. The latter call will take more than 1 minute on average to return. Can I do a real time update on solr 1.4.1? Would someone help to show a workable code snippet? Spark
Re: wild card search and lower-casing
No, no, no That's something buried in Lucene, it has nothing to do with the patch! The patch has NOT yet been applied to any released code. You could pull the patch from the JIRA and apply it to trunk locally if you wanted. But there's no patch for 3.x, I'll probably put that up over the holiday. But things have changed a bit (one of the things I'll have to do is create some documentation). You *should* be able to specify just legacyMultiTerm=true in your fieldType if you want to apply the 3.x patch to pre 3.6 code. It would be a good field test if that worked for you. But you can't do any of this until the JIRA (SOLR-2438) is marked Resolution: Fixed. Don't be fooled by Fix Version. Fix Version simply says that those are the earliest versions it *could* go in. Best Erick Best Erick On Tue, Nov 22, 2011 at 6:32 AM, Dmitry Kan dmitry@gmail.com wrote: I guess, I have found your comment, thanks. For our current needs I have just set: setLowercaseExpandedTerms(true); // changed from default false in the SolrQueryParser's constructor and that seem to work so far. In order not to start a separate thread on wildcards. Is it so, that for the trailing wildcard there is a minimum of 2 preceding characters for a search to happen? Dmitry On Mon, Nov 21, 2011 at 2:59 PM, Erick Erickson erickerick...@gmail.comwrote: It may be. The tricky bit is that there is a constant governing the behavior of this that restricts it to 3.6 and above. You'll have to change it after applying the patch for this to work for you. Should be trivial, I'll leave a note in the code about this, look for SOLR-2438 in the 3x code line for the place to change. On Mon, Nov 21, 2011 at 2:14 AM, Dmitry Kan dmitry@gmail.com wrote: Thanks Erick. Do you think the patch you are working on will be applicable as well to 3.4? Best, Dmitry On Mon, Nov 21, 2011 at 5:06 AM, Erick Erickson erickerick...@gmail.com wrote: As it happens I'm working on SOLR-2438 which should address this. This patch will provide two things: The ability to define a new analysis chain in your schema.xml, currently called multiterm that will be applied to queries of various sorts, including wildcard, prefix, range. This will be somewhat of an expert thing to make yourself... In the absence of an explicit definition it'll synthesize a multiterm analyzer out of the query analyzer, taking any char fitlers, and lowercaseFilter (if present), and ASCIIFoldingfilter (if present) and putting them in the multiterm analyzer along with a (hardcoded) WhitespaceTokenizer. As of 3.6 and 4.0, this will be the default behavior, although you can explicitly define a field type parameter to specify the current behavior. The reason it is on 3.6 is that I want it to bake for a while before getting into the wild, so I have no intention of trying to get it into the 3.5 release. The patch is up for review now, I'd like another set of eyeballs or two on it before committing. The patch that's up there now is against trunk but I hope to have a 3x patch that I'll apply to the 3x code line after 3.5 RC1 is cut. Best Erick On Fri, Nov 18, 2011 at 12:05 PM, Ahmet Arslan iori...@yahoo.com wrote: You're right: public SolrQueryParser(IndexSchema schema, String defaultField) { ... setLowercaseExpandedTerms(false); ... } Please note that lowercaseExpandedTerms uses String.toLowercase() (uses default Locale) which is a Locale sensitive operation. In Lucene AnalyzingQueryParser exists for this purposes, but I am not sure if it is ported to solr. 
http://lucene.apache.org/java/3_0_2/api/contrib-misc/org/apache/lucene/queryParser/analyzing/AnalyzingQueryParser.html
Re: wild card search and lower-casing
Thanks, Erick. I was in fact reading the patch (the one attached as a file to the aforementioned jira) you updated sometime yesterday. I'll watch the issue, but as said the change of a hard-coded boolean to its opposite worked just fine for me. Best, Dmitry On 11/22/11, Erick Erickson erickerick...@gmail.com wrote: No, no, no That's something buried in Lucene, it has nothing to do with the patch! The patch has NOT yet been applied to any released code. You could pull the patch from the JIRA and apply it to trunk locally if you wanted. But there's no patch for 3.x, I'll probably put that up over the holiday. But things have changed a bit (one of the things I'll have to do is create some documentation). You *should* be able to specify just legacyMultiTerm=true in your fieldType if you want to apply the 3.x patch to pre 3.6 code. It would be a good field test if that worked for you. But you can't do any of this until the JIRA (SOLR-2438) is marked Resolution: Fixed. Don't be fooled by Fix Version. Fix Version simply says that those are the earliest versions it *could* go in. Best Erick Best Erick On Tue, Nov 22, 2011 at 6:32 AM, Dmitry Kan dmitry@gmail.com wrote: I guess, I have found your comment, thanks. For our current needs I have just set: setLowercaseExpandedTerms(true); // changed from default false in the SolrQueryParser's constructor and that seem to work so far. In order not to start a separate thread on wildcards. Is it so, that for the trailing wildcard there is a minimum of 2 preceding characters for a search to happen? Dmitry On Mon, Nov 21, 2011 at 2:59 PM, Erick Erickson erickerick...@gmail.comwrote: It may be. The tricky bit is that there is a constant governing the behavior of this that restricts it to 3.6 and above. You'll have to change it after applying the patch for this to work for you. Should be trivial, I'll leave a note in the code about this, look for SOLR-2438 in the 3x code line for the place to change. On Mon, Nov 21, 2011 at 2:14 AM, Dmitry Kan dmitry@gmail.com wrote: Thanks Erick. Do you think the patch you are working on will be applicable as well to 3.4? Best, Dmitry On Mon, Nov 21, 2011 at 5:06 AM, Erick Erickson erickerick...@gmail.com wrote: As it happens I'm working on SOLR-2438 which should address this. This patch will provide two things: The ability to define a new analysis chain in your schema.xml, currently called multiterm that will be applied to queries of various sorts, including wildcard, prefix, range. This will be somewhat of an expert thing to make yourself... In the absence of an explicit definition it'll synthesize a multiterm analyzer out of the query analyzer, taking any char fitlers, and lowercaseFilter (if present), and ASCIIFoldingfilter (if present) and putting them in the multiterm analyzer along with a (hardcoded) WhitespaceTokenizer. As of 3.6 and 4.0, this will be the default behavior, although you can explicitly define a field type parameter to specify the current behavior. The reason it is on 3.6 is that I want it to bake for a while before getting into the wild, so I have no intention of trying to get it into the 3.5 release. The patch is up for review now, I'd like another set of eyeballs or two on it before committing. The patch that's up there now is against trunk but I hope to have a 3x patch that I'll apply to the 3x code line after 3.5 RC1 is cut. Best Erick On Fri, Nov 18, 2011 at 12:05 PM, Ahmet Arslan iori...@yahoo.com wrote: You're right: public SolrQueryParser(IndexSchema schema, String defaultField) { ... 
setLowercaseExpandedTerms(false); ... } Please note that lowercaseExpandedTerms uses String.toLowercase() (uses default Locale) which is a Locale sensitive operation. In Lucene AnalyzingQueryParser exists for this purposes, but I am not sure if it is ported to solr. http://lucene.apache.org/java/3_0_2/api/contrib-misc/org/apache/lucene/queryParser/analyzing/AnalyzingQueryParser.html -- Regards, Dmitry Kan
AW: How to select all docs of 'today' ?
Hi, fetch-time:[NOW/DAY TO NOW] should do it. Best Sebastian

-----Original Message----- From: Danicela nutch [mailto:danicela-nu...@mail.com] Sent: Tuesday, 22 November 2011 16:08 To: solr-user@lucene.apache.org Subject: How to select all docs of 'today' ?

Hi, I have a fetch-time (date) field to know when the documents were fetched. I want to make a query to get all documents fetched today. I tried: fetch-time:NOW/DAY but it always returns 0. fetch-time:[NOW/DAY TO NOW/DAY] (it returns 0). fetch-time:[NOW/DAY-1DAY TO NOW/DAY] but it returns documents fetched yesterday. fetch-time:[NOW/DAY-1HOUR TO NOW/DAY] but it's incorrect too. Do you have any idea? Thanks in advance.
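If the query is built through SolrJ, the client handles escaping the space inside the range; a tiny sketch of the range suggested above:

SolrQuery q = new SolrQuery("fetch-time:[NOW/DAY TO NOW]");

(NOW/DAY rounds down to midnight, so this covers everything fetched since the start of today.)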
Re: Autocomplete(terms) performance problem
You should try out the autocomplete component using Solr with RankingAlgorithm. The performance is less than 3ms for a 1 million Wikipedia titles index with very low deviation. You can get more information about the performance with different indexes of size 3k, 390k, 1m, 10m docs from here: http://solr-ra.tgels.org/solr-ra-autocomplete.jsp - Nagendra Nagarajayya -- View this message in context: http://lucene.472066.n3.nabble.com/Autocomplete-terms-performance-problem-tp3351352p3528112.html Sent from the Solr - User mailing list archive at Nabble.com.
weird issue with solr and CentOS 5.7
Hi all, I'm facing a real weird issue here with solr (lucene 3.3) and CentOS 5.7. I've two servers, one running CentOS 5.5 and the other running CentOS 5.7. Both servers has the same solr, java and tomcat versions, the only difference between them is OS version. I added a custom field to schema.xml: field name=stream_isPrivate type=boolean indexed=true stored=true required=false/. When that type is boolean, on CentOS 5.5 works OK indexing Chinese characters, but on CentOS 5.7 I got this exception: Nov 22, 2011 11:27:11 PM org.apache.solr.core.SolrCore execute INFO: [] webapp=/solr path=/select/ params={indent=onstart=0q=我们从右上角讲起rows=10version=2.2} hits=1 status=0 QTime=8 Nov 22, 2011 11:27:11 PM org.apache.solr.common.SolrException log SEVERE: java.lang.StringIndexOutOfBoundsException: String index out of range: 0 at java.lang.String.charAt(String.java:694) at org.apache.solr.schema.BoolField.write(BoolField.java:129) at org.apache.solr.schema.SchemaField.write(SchemaField.java:124) at org.apache.solr.response.XMLWriter.writeDoc(XMLWriter.java:369) at org.apache.solr.response.XMLWriter$3.writeDocs(XMLWriter.java:545) at org.apache.solr.response.XMLWriter.writeDocuments(XMLWriter.java:482) at org.apache.solr.response.XMLWriter.writeDocList(XMLWriter.java:519) at org.apache.solr.response.XMLWriter.writeVal(XMLWriter.java:582) at org.apache.solr.response.XMLWriter.writeResponse(XMLWriter.java:131) at org.apache.solr.response.XMLResponseWriter.write(XMLResponseWriter.java:35) at org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:343) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:265) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:215) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:188) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:210) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:172) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:117) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:108) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:151) at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:875) at org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:665) at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:528) at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:81) at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:685) at java.lang.Thread.run(Thread.java:636) That only happens on CentOS 5.7. I also tested on Ubuntu Server, and also works OK. solrconfig.xml and everything else is the same on both servers. Any idea what could be happening? Should it be a CentOS bug? Regards. -- Boris Quiroz boris.qui...@menco.it
NullPointerException with distributed facets
Hi, When doing a distributed query in solr 4.0 (4.0.0.2011.06.25.15.36.22) with facet.missing=true and facet.limit=20 I get a NullPointerException. By increasing the facet limit to 200 or setting facet missing to false it seems to fix it. The shards both contain the field but one shard always has a value and one never has a value. Single shard queries work fine on each shard. Does anyone know the cause or a fix? java.lang.NullPointerException at org.apache.solr.handler.component.FacetComponent.refineFacets(FacetComponent.java:489) at org.apache.solr.handler.component.FacetComponent.handleResponses(FacetComponent.java:278) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:292) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1452) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:353) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:248) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1157) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:388) at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182) at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:765) at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:418) at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230) at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114) at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152) at org.mortbay.jetty.Server.handle(Server.java:326) at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542) at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:926) at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549) at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212) at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404) at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410) at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582) Phil
Re: how to make effective search with fq and q params
Hi Erik: It's not in the SolrJ library, but rather my use of it: In my application code: protected static final String SOLR_ALL_DOCS_QUERY = *:*; /* * If no search terms provided, then return all neighbors. * Results are to be returned in neighbor symbol alphabetical order. */ if (searchTerms == null) { searchTerms = SOLR_ALL_DOCS_QUERY; nodeQuery.addSortField(n_name, SolrQuery.ORDER.asc); } So, if no user search terms are provided, I search all documents (there are other fqs in effect) and return them in name order. That worked just fine. Then I read more about [e]dismax, and went and configured: str name=q.alt*:*/str Then I would get zero results. It's not a SolrJ issue though, as this request in my browser also resulted in zero results: http://localhost:8091/solr/ing-content/select/?qt=partner-tmofq=type%3Anodefq=n_neighborof_id%3AING\:afaq=*:*rows=5facet=truefacet.mincount=1facet.field=n_neighborof_processExactfacet.field=n_neighborof_edge_type That was due to the q=*:*. Once I set, say, q=cancer, I got results. So I guess this is a [e]dismax thing? (partner-tmo is the name of my request handler). I solved my problem by net setting *:* in my application, and left q.alt=*:* in place. Hope this helps. Again, this is stock Solr 3.4.0, running the Apache war under Tomcat 6. Jeff On Nov 22, 2011, at 8:05 AM, Erik Hatcher wrote: On Nov 22, 2011, at 09:55 , Jeff Schmidt wrote: When using [e]dismax, does configuring q.alt=*:* and not specifying q affect the performance/caching in any way? No different than using q=*:* with the lucene query parser. MatchAllDocsQuery is possibly the fastest query out there! (it simply matches documents in index order, all scores are 1.0) As a side note, a while back I configured q.alt=*:*, and the application (via SolrJ) still set q=*:* if no user input was provided (faceting). With both of them set that way, I got zero results. (Solr 3.4.0) Interesting. Ouch. Really? I don't see in the code (looking at my trunk checkout) where there's any *:* used in the SolrJ library. Can you provide some details on how you used SolrJ? It'd be good to track this down as that seems like a bug to me. Erik Thanks, Jeff On Nov 22, 2011, at 7:06 AM, Erik Hatcher wrote: If all you're doing is filtering (browsing by facets perhaps), it's perfectly fine to have q=*:*. MatchAllDocsQuery is fast (and would be cached anyway), so use *:* as appropriate without worries. Erik On Nov 22, 2011, at 07:18 , pravesh wrote: Usually, Use the 'q' parameter to search for the free text values entered by the users (where you might want to parse the query and/or apply boosting/phrase-sloppy, minimum match,tie etc ) Use the 'fq' to limit the searches to certain criterias like location, date-ranges etc. Also, avoid using the q=*:* as it implicitly translates to matchalldocsquery Regds Pravesh -- View this message in context: http://lucene.472066.n3.nabble.com/how-to-make-effective-search-with-fq-and-q-params-tp3527217p3527535.html Sent from the Solr - User mailing list archive at Nabble.com. -- Jeff Schmidt 535 Consulting j...@535consulting.com http://www.535consulting.com (650) 423-1068 -- Jeff Schmidt 535 Consulting j...@535consulting.com http://www.535consulting.com (650) 423-1068
Re : AW: How to select all docs of 'today' ?
Thanks, it works. All this is based on the fact that NOW/DAY means the beginning of the day.

----- Original Message ----- From: sebastian.pet...@tib.uni-hannover.de Sent: 22.11.11 16:46 To: solr-user@lucene.apache.org Subject: AW: How to select all docs of 'today' ?

Hi, fetch-time:[NOW/DAY TO NOW] should do it. Best Sebastian

-----Original Message----- From: Danicela nutch [mailto:danicela-nu...@mail.com] Sent: Tuesday, 22 November 2011 16:08 To: solr-user@lucene.apache.org Subject: How to select all docs of 'today' ?

Hi, I have a fetch-time (date) field to know when the documents were fetched. I want to make a query to get all documents fetched today. I tried: fetch-time:NOW/DAY but it always returns 0. fetch-time:[NOW/DAY TO NOW/DAY] (it returns 0). fetch-time:[NOW/DAY-1DAY TO NOW/DAY] but it returns documents fetched yesterday. fetch-time:[NOW/DAY-1HOUR TO NOW/DAY] but it's incorrect too. Do you have any idea? Thanks in advance.
Re: FunctionQuery score=0
Can this be fixed somehow? I also need the real score. On Sun, Nov 20, 2011 at 10:44 AM, John fatmanc...@gmail.com wrote: After playing some more with this I managed to get what I want, almost. My query now looks like: q={!frange l=0 incl=false}query({!type=edismax qf=abstract^0.02 title^0.08 categorysearch^0.05 boost='eqsim(alltokens,xyz)' v='+tokens5:xyz '}) With the above query, I am getting only the results that I want, the ones whose score after my FucntionQuery are above 0, but the problem now is that the final score for all results is changed to 1, which affects the sorting. How can I keep the original score that is calculated by the edismax query? Cheers, John On Fri, Nov 18, 2011 at 10:50 AM, Andre Bois-Crettez andre.b...@kelkoo.com wrote: Definitely worked for me, with a classic full text search on ipod and such. Changing the lower bound changed the number of results. Follow Chris advice, and give more details. John wrote: Doesn't seem to work. I though that FilterQueries work before the search is performed and not after... no? Debug doesn't include filter query only the below (changed a bit): BoostedQuery(boost(+fieldName:**,boostedFunction(ord(** fieldName),query))) On Thu, Nov 17, 2011 at 5:04 PM, Andre Bois-Crettez andre.b...@kelkoo.comwrote: John wrote: Some of the results are receiving score=0 in my function and I would like them not to appear in the search results. you can use frange, and filter by score: q=ipodfq={!frange l=0 incl=false}query($q) -- André Bois-Crettez Search technology, Kelkoo http://www.kelkoo.com/ -- André Bois-Crettez Search technology, Kelkoo http://www.kelkoo.com/
Faceting is not Using Field Value Cache . . ?
Seeing something odd going on with faceting... we execute facets with every query and yet the fieldValueCache is not being used:

name: fieldValueCache
class: org.apache.solr.search.FastLRUCache
version: 1.0
description: Concurrent LRU Cache(maxSize=1, initialSize=10, minSize=9000, acceptableSize=9500, cleanupThread=false)
stats:
lookups : 0
hits : 0
hitratio : 0.00
inserts : 0
evictions : 0
size : 0
warmupTime : 0
cumulative_lookups : 0
cumulative_hits : 0
cumulative_hitratio : 0.00
cumulative_inserts : 0
cumulative_evictions : 0

I was under the impression the fieldValueCache was an implicit cache (if you don't define it, it will still exist). We are running Solr v3.3 (and NOT using {!cache=false}). Thoughts?
Re: Problems with AutoSuggest feature(Terms Components)
Hi Erick, Thanks for your reply. I would like to know all the options that can be given under the defaults section and how they can be overridden. Is there any documentation available in the Solr forum? We tried searching but weren't able to find any.

My exact scenario is that I have one master core with many underlying shard cores (distributed architecture). I want terms.limit to default to 10 in the underlying shard cores. When I hit the master core, it will in turn hit the underlying shard cores. At that point, the terms.limit that was passed to the master core has to be passed to these underlying shard cores, overriding the default value. Can you please suggest the definition of the terms component for the underlying shard cores?

Regards, Sivaganesh
--
View this message in context: http://lucene.472066.n3.nabble.com/Problems-with-AutoSuggest-feature-Terms-Components-tp3512734p3528597.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: To push the terms.limit parameter from the master core to all the shard cores.
Hi Mark, Thanks for your suggestion. My exact scenario is that I have one master core with many underlying shard cores (distributed architecture). I want terms.limit to default to 10 in the underlying shard cores. When I hit the master core, it will in turn hit the underlying shard cores. At that point, the terms.limit that was passed to the master core has to be passed dynamically to these underlying shard cores, overriding the default value. Can you please suggest the definition of the terms component for the underlying shard cores?

I would also like to know all the options that can be given under the defaults section and how they can be overridden. Is there any documentation available in the Solr forum? We tried searching but weren't able to find any.

Regards, Sivaganesh
--
View this message in context: http://lucene.472066.n3.nabble.com/To-push-the-terms-limit-parameter-from-the-master-core-to-all-the-shard-cores-tp3520609p3528608.html
Sent from the Solr - User mailing list archive at Nabble.com.
Highlight with multi word synonyms
I'm trying to use multi-word synonyms. For example, in my synonyms file I have: nhl, national hockey league. If I apply this at index time only, a search for nhl returns a correct match, but highlights only the first word, national. Ideally, it would highlight national hockey league or not highlight at all. If I apply the synonyms at both index and query time, it finds the match and does the correct highlighting, but I understand it is not ideal to do synonyms at both index and query time. I am expanding synonyms and using edismax. Thoughts?
Re: how to make effective search with fq and q params
I think you're using dismax, not edismax. edismax will take q=*:* just fine as it handles all Lucene syntax queries also. dismax does not.

So, if you're using dismax and there is no actual query (but you want to get facets), you set q.alt=*:* and omit q - that's entirely by design. If there's a non-empty q parameter, q.alt is not considered, so there shouldn't be any issues with always having q.alt set if that's what you want.

Erik

On Nov 22, 2011, at 11:15 , Jeff Schmidt wrote:

Hi Erik: It's not in the SolrJ library, but rather my use of it. In my application code:

protected static final String SOLR_ALL_DOCS_QUERY = "*:*";

/*
 * If no search terms provided, then return all neighbors.
 * Results are to be returned in neighbor symbol alphabetical order.
 */
if (searchTerms == null) {
    searchTerms = SOLR_ALL_DOCS_QUERY;
    nodeQuery.addSortField("n_name", SolrQuery.ORDER.asc);
}

So, if no user search terms are provided, I search all documents (there are other fqs in effect) and return them in name order. That worked just fine. Then I read more about [e]dismax, and went and configured:

<str name="q.alt">*:*</str>

Then I would get zero results. It's not a SolrJ issue though, as this request in my browser also resulted in zero results:

http://localhost:8091/solr/ing-content/select/?qt=partner-tmo&fq=type%3Anode&fq=n_neighborof_id%3AING\:afa&q=*:*&rows=5&facet=true&facet.mincount=1&facet.field=n_neighborof_processExact&facet.field=n_neighborof_edge_type

That was due to the q=*:*. Once I set, say, q=cancer, I got results. So I guess this is an [e]dismax thing? (partner-tmo is the name of my request handler.) I solved my problem by not setting *:* in my application, and left q.alt=*:* in place.

Hope this helps. Again, this is stock Solr 3.4.0, running the Apache war under Tomcat 6.

Jeff

On Nov 22, 2011, at 8:05 AM, Erik Hatcher wrote:

On Nov 22, 2011, at 09:55 , Jeff Schmidt wrote:
When using [e]dismax, does configuring q.alt=*:* and not specifying q affect the performance/caching in any way?

No different than using q=*:* with the lucene query parser. MatchAllDocsQuery is possibly the fastest query out there! (it simply matches documents in index order, all scores are 1.0)

As a side note, a while back I configured q.alt=*:*, and the application (via SolrJ) still set q=*:* if no user input was provided (faceting). With both of them set that way, I got zero results. (Solr 3.4.0)

Interesting. Ouch. Really? I don't see in the code (looking at my trunk checkout) where there's any *:* used in the SolrJ library. Can you provide some details on how you used SolrJ? It'd be good to track this down as that seems like a bug to me.

Erik

Thanks, Jeff

On Nov 22, 2011, at 7:06 AM, Erik Hatcher wrote:
If all you're doing is filtering (browsing by facets perhaps), it's perfectly fine to have q=*:*. MatchAllDocsQuery is fast (and would be cached anyway), so use *:* as appropriate without worries.

Erik

On Nov 22, 2011, at 07:18 , pravesh wrote:
Usually, use the 'q' parameter to search for the free text values entered by the users (where you might want to parse the query and/or apply boosting, phrase slop, minimum match, tie etc.). Use 'fq' to limit the searches to certain criteria like location, date ranges etc. Also, avoid using q=*:* as it implicitly translates to MatchAllDocsQuery.

Regds
Pravesh
--
View this message in context: http://lucene.472066.n3.nabble.com/how-to-make-effective-search-with-fq-and-q-params-tp3527217p3527535.html
Sent from the Solr - User mailing list archive at Nabble.com.
--
Jeff Schmidt
535 Consulting
j...@535consulting.com
http://www.535consulting.com
(650) 423-1068
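As an illustration of that pattern, here is a rough SolrJ sketch (the handler, core URL and field names follow the examples above but are only assumptions): the client sends q only when the user typed something, and otherwise omits it so the handler's q.alt=*:* takes over.

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class FacetBrowse {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8091/solr/ing-content");

        SolrQuery query = new SolrQuery();
        query.setQueryType("partner-tmo");    // qt=partner-tmo, a dismax handler with q.alt=*:* in its defaults
        String userInput = null;              // e.g. nothing entered in the search box

        if (userInput != null && userInput.length() > 0) {
            query.setQuery(userInput);        // q is only sent when the user actually typed something
        } else {
            // q is omitted entirely; q.alt=*:* matches everything, and we just sort by name
            query.addSortField("n_name", SolrQuery.ORDER.asc);
        }

        query.addFilterQuery("type:node");
        query.setFacet(true);
        query.addFacetField("n_neighborof_edge_type");

        QueryResponse rsp = server.query(query);
        System.out.println("Found: " + rsp.getResults().getNumFound());
    }
}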
Re: FunctionQuery score=0
: q={!frange l=0 incl=false}query({!type=edismax qf=abstract^0.02
: title^0.08 categorysearch^0.05 boost='eqsim(alltokens,xyz)'
: v='+tokens5:xyz '})
:
: With the above query, I am getting only the results that I want, the ones
: whose score after my FunctionQuery is above 0, but the problem now is that
: the final score for all results is changed to 1, which affects the sorting.
:
: How can I keep the original score that is calculated by the edismax query?

a) Like I said: details matter. In your earlier messages you mentioned that you were wrapping a function around a query and wanted the function not to match anything where the result was 0 -- the suggestions provided have done that. This is the first time you mentioned that you needed the values returned by the function as the scores of the documents (had you mentioned that, you might have gotten different answers).

b) If you look closely at the suggestion from André, you'll see that his specific suggestion will actually do what you want if you follow it -- express the query you want in the q param (so you get the scores from it) and then express an fq that refers to the q query as a variable...

: q=ipod&fq={!frange l=0 incl=false}query($q)

c) Based on the concrete example you've given above, it's not clear to me that you actually need any of this -- if the above query is giving you the results you want, but you want the scores from the edismax query to be used as the final scores of the documents, then there is no need to wrap the query in any sort of function at all, or to exclude any 0 values. This should be exactly what you want...

q={!type=edismax qf=abstract^0.02 title^0.08 categorysearch^0.05 boost='eqsim(alltokens,xyz)' v='+tokens5:xyz '}

...why exactly did you think you needed to wrap that query in a function?

https://people.apache.org/~hossman/#xyproblem
XY Problem

Your question appears to be an XY Problem ... that is: you are dealing with X, you are assuming Y will help you, and you are asking about Y without giving more details about the X so that we can understand the full issue. Perhaps the best solution doesn't involve Y at all? See Also: http://www.perlmonks.org/index.pl?node_id=542341

-Hoss
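To make suggestion (b) concrete for the query above, a rough SolrJ sketch (it assumes the same fields and the custom eqsim function from the earlier mails, so treat the names as placeholders):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class FrangeFilter {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");

        // The boosted edismax query stays in q, so its scores are kept for sorting...
        SolrQuery query = new SolrQuery("{!type=edismax qf='abstract^0.02 title^0.08 categorysearch^0.05' "
                + "boost='eqsim(alltokens,xyz)' v='+tokens5:xyz'}");

        // ...while frange over query($q) only filters out documents whose score is <= 0.
        query.addFilterQuery("{!frange l=0 incl=false}query($q)");

        System.out.println(server.query(query).getResults().getNumFound());
    }
}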
spellcheck in dismax
I put the following into the dismax requestHandler, but no suggestion field is returned.

<lst name="defaults">
  <str name="spellcheck.onlyMorePopular">true</str>
  <str name="spellcheck.extendedResults">false</str>
  <str name="spellcheck.count">1</str>
</lst>
<arr name="last-components">
  <str>spellcheck</str>
</arr>

But everything works if I put it as a separate requestHandler. Did I miss something? Thanks Richard
Re: spellcheck in dismax
It seems you forgot this:

<str name="spellcheck">true</str>

-----Original Message-----
From: Ruixiang Zhang rxzh...@gmail.com
To: solr-user solr-user@lucene.apache.org
Sent: Tue, Nov 22, 2011 11:54 am
Subject: spellcheck in dismax

I put the following into the dismax requestHandler, but no suggestion field is returned.

<lst name="defaults">
  <str name="spellcheck.onlyMorePopular">true</str>
  <str name="spellcheck.extendedResults">false</str>
  <str name="spellcheck.count">1</str>
</lst>
<arr name="last-components">
  <str>spellcheck</str>
</arr>

But everything works if I put it as a separate requestHandler. Did I miss something? Thanks Richard
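If you would rather not change the handler defaults, the same switch can also be passed per request; a small SolrJ sketch (the handler name and the misspelled term are only placeholders):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.client.solrj.response.SpellCheckResponse;

public class SpellcheckRequest {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");

        SolrQuery query = new SolrQuery("beleive");   // a deliberately misspelled term
        query.setQueryType("dismax");                 // handler that has spellcheck in last-components
        query.set("spellcheck", "true");              // same effect as <str name="spellcheck">true</str> in defaults
        query.set("spellcheck.count", "1");

        QueryResponse rsp = server.query(query);
        SpellCheckResponse spell = rsp.getSpellCheckResponse();
        if (spell != null && !spell.getSuggestions().isEmpty()) {
            System.out.println("Did you mean: " + spell.getSuggestions().get(0).getAlternatives());
        }
    }
}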
Re: Faceting is not Using Field Value Cache . . ?
AFAIK, fieldValueCache is only used for faceting on tokenized fields. Maybe you are confusing it with the FieldCache (http://lucene.apache.org/java/3_0_2/api/all/org/apache/lucene/search/FieldCache.html)? That one is used for common facets (facet.method=fc on non-tokenized fields). Does this make sense to you?

On Tue, Nov 22, 2011 at 7:21 PM, CRB sub.scripti...@metaheuristica.com wrote:

Seeing something odd going on with faceting... we execute facets with every query and yet the fieldValueCache is not being used:

name: fieldValueCache
class: org.apache.solr.search.FastLRUCache
version: 1.0
description: Concurrent LRU Cache(maxSize=1, initialSize=10, minSize=9000, acceptableSize=9500, cleanupThread=false)
stats:
lookups : 0
hits : 0
hitratio : 0.00
inserts : 0
evictions : 0
size : 0
warmupTime : 0
cumulative_lookups : 0
cumulative_hits : 0
cumulative_hitratio : 0.00
cumulative_inserts : 0
cumulative_evictions : 0

I was under the impression the fieldValueCache was an implicit cache (if you don't define it, it will still exist). We are running Solr v3.3 (and NOT using {!cache=false}). Thoughts?

--
Regards,
Samuel García.
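A rough way to see the difference from the client side; the field names here are placeholders and the comments describe the usual 3.x behaviour as explained above:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class FacetMethodCheck {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");

        SolrQuery query = new SolrQuery("*:*");
        query.setFacet(true);
        query.addFacetField("category");   // single-valued string field: served from the Lucene FieldCache
        query.addFacetField("keywords");   // multivalued/tokenized field: this is what fills fieldValueCache
        query.set("facet.method", "fc");
        query.setFacetMinCount(1);

        server.query(query);
        // After a few of these requests, the stats page should show lookups/inserts on
        // fieldValueCache only if at least one faceted field is tokenized or multivalued.
    }
}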
Re: Solr highlighting isn't working!
(11/11/22 22:30), VladislavLysov wrote:

Hello!!! I have trouble with Solr highlighting. I have a document with the following fields: TYPE, DBID and others. When I make the following request:

https://localhost:8443/solr/myCore/afts?wt=standard&q=TYPE:cm:content&indent=on&hl=true&hl.fl=DBID&hl.usePhraseHighlighter=true&fl=DBID

it returns the following text:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">3</int>
  </lst>
  <result name="response" numFound="166" start="0">
    <doc>
      <arr name="DBID">
        <str>892</str>
      </arr>
    </doc>
  </result>
  <lst name="highlighting">
    <lst name="LEAF-892"/>
  </lst>
</response>

What is the problem? Thank you!

What term are you trying to highlight? You queried cm:content on the TYPE field and asked the highlighter to highlight on the DBID field. But since the DBID field seems to contain only 892, the highlighter cannot create any highlighted snippets.

With Solr 3.5 (RC2 is now available) or the trunk version of Solr, you can use the hl.q parameter to give the highlighter its own query.

http://wiki.apache.org/solr/HighlightingParameters#hl.q

koji
--
Check out Query Log Visualizer for Apache Solr
http://www.rondhuit-demo.com/loganalyzer/loganalyzer.html
http://www.rondhuit.com/en/
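For completeness, a small SolrJ sketch of the hl.q idea on Solr 3.5+ (the core URL, query and values follow the question above and are only assumptions):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class HighlightQuery {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("https://localhost:8443/solr/myCore");

        // Match on TYPE, but tell the highlighter to look for a different term in DBID.
        SolrQuery query = new SolrQuery("TYPE:\"cm:content\"");
        query.setHighlight(true);
        query.addHighlightField("DBID");
        query.set("hl.q", "DBID:892");   // hl.q is available from Solr 3.5 on

        System.out.println(server.query(query).getHighlighting());
    }
}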
Re: Solr real time update
Hi Nagarajayya, Thanks for your information. Do I need to change any configuration of my current solr server to integrate your plugin? Spark 2011/11/22 Nagendra Nagarajayya nnagaraja...@transaxtions.com Yu: To get Near Real Time update in Solr 1.4.1 you will need to use Solr 1.4.1 with RankingAlgorithm. This allows you to update documents in near real time. You can download and give this a try from here: http://solr-ra.tgels.org/ Regards, - Nagendra Nagarajayya http://solr-ra.tgels.org/ http://rankingalgorithm.tgels.**org/ http://rankingalgorithm.tgels.org/ On 11/21/2011 9:47 PM, yu shen wrote: Hi All, After some study, I used below snippet. Seems the documents is updated, while still takes a long time. Feels like the parameter does not take effect. Any comments? UpdateRequest req = new UpdateRequest(); req.add(solrDocs); req.setCommitWithin(5000); req.setParam(commitWithin, 5000); req.setAction(**AbstractUpdateRequest.ACTION.**COMMIT, true, true); req.process(SOLR_SERVER); 2011/11/22 yu shenshenyu...@gmail.com Hi All, I try to do a 'nearly real time update' to solr. My solr version is 1.4.1. I read this solr CommentWithinhttp://wiki.** apache.org/solr/CommitWithin http://wiki.apache.org/solr/CommitWithin **wiki, and a related threadhttp://lucene.472066.**n3.nabble.com/Solr-real-time-** update-taking-time-td3472709.**htmlhttp://lucene.472066.n3.nabble.com/Solr-real-time-update-taking-time-td3472709.htmlmostly on the difficulty to do this. My issue is I tried the code snippet in the wiki: UpdateRequest req = new UpdateRequest(); req.add(mySolrInputDocument); req.setCommitWithin(1); req.process(server); But my index did not get updated, unless I call SOLR_SERVER.commit(); explicitly. The latter call will take more than 1 minute on average to return. Can I do a real time update on solr 1.4.1? Would someone help to show a workable code snippet? Spark
If search matches index in the middle of filter chain, will result return?
Hi all I am using Solr 3.4 with Win7 and Jetty. When I do a search on a field, according to the Analysis from Solr, the search string matches the index in the middle of the chain. Here is the schema: fieldType name=substring_search class=solr.TextField positionIncrementGap=100 analyzer type=index charFilter class=solr.MappingCharFilterFactory mapping=../../filters/filter-mappings.txt/ charFilter class=solr.HTMLStripCharFilterFactory / tokenizer class=solr.KeywordTokenizerFactory/ filter class=solr.ASCIIFoldingFilterFactory/ filter class=solr.TrimFilterFactory / filter class=solr.LowerCaseFilterFactory / filter class=solr.CommonGramsFilterFactory words=../../filters/stopwords.txt ignoreCase=true/ filter class=solr.NGramFilterFactory minGramSize=1 maxGramSize=20/ filter class=solr.RemoveDuplicatesTokenFilterFactory / /analyzer analyzer type=query charFilter class=solr.MappingCharFilterFactory mapping=../../filters/filter-mappings.txt/ charFilter class=solr.HTMLStripCharFilterFactory / tokenizer class=solr.KeywordTokenizerFactory/ filter class=solr.ASCIIFoldingFilterFactory/ filter class=solr.TrimFilterFactory / filter class=solr.LowerCaseFilterFactory / filter class=solr.RemoveDuplicatesTokenFilterFactory / /analyzer /fieldType I am searching for an email called: off...@officeofficeoffice.com. If I search any text under 20 characters, result will be returned. But when I search the whole string: off...@officeofficeoffice.com, no result return. As you all see in the schema in index part, when I search the whole string, it will match the index chain before NGramFilterFactory. But after NGram, no result found. Here are my questions: - Is this behavior normal? - In order to get off...@officeofficeoffice.com, does it mean that I have to make the maxGramSize larger (like 70)? Thank you in advance for all your support. This is a great community.
Separate ACL and document index
Hi there, Is it possible to separate the ACL index from the document index and still search by user role in Solr? Currently my implementation indexes the ACL together with the document, but the documents' ACLs change frequently, so I have to rebuild the index every time an ACL changes. That is heavy for the whole system because there are many documents and their content is huge. Do you have any solution to this problem? I've been reading the mailing list for a while, and there doesn't seem to be a suitable solution for me. I want each user's search results to contain only the documents his role allows, but I don't want to re-index a document every time its ACL changes. Is it possible to perform a join, like in a database, to achieve this? How? Thanks Floyd
Re: If search matches index in the middle of filter chain, will result return?
On 11/22/2011 7:54 PM, Ellery Leung wrote: I am searching for an email called: off...@officeofficeoffice.com. If I search any text under 20 characters, result will be returned. But when I search the whole string: off...@officeofficeoffice.com, no result return. As you all see in the schema in index part, when I search the whole string, it will match the index chain before NGramFilterFactory. But after NGram, no result found. Here are my questions: - Is this behavior normal? I'm pretty sure that your query must match after the entire analyzer chain is done. I would expect that behavior to be normal. - In order to get off...@officeofficeoffice.com, does it mean that I have to make the maxGramSize larger (like 70)? If you were to increase the maxGramSize to 70, you would get a match in this case, but your index might get a lot larger, depending on what's in your source data. That's probably not the right approach, though. In general, you want to have your index and query analyzer chains exactly the same. There are some exceptions, but I don't think the NGram filter is one of them. The synonym filter and WordDelimiterFilter are examples where it is expected that your index and query analyzer chains will be different. Add the NGram and CommonGram filters to the query chain, and everything should start working. If you were to go with a single analyzer for both like the following, I think it would start working. You wouldn't even need to reindex, since you wouldn't be changing the index analyzer. fieldType name=substring_search class=solr.TextField positionIncrementGap=100 analyzer charFilter class=solr.MappingCharFilterFactory mapping=../../filters/filter-mappings.txt/ charFilter class=solr.HTMLStripCharFilterFactory / tokenizer class=solr.KeywordTokenizerFactory/ filter class=solr.ASCIIFoldingFilterFactory/ filter class=solr.TrimFilterFactory / filter class=solr.LowerCaseFilterFactory / filter class=solr.CommonGramsFilterFactory words=../../filters/stopwords.txt ignoreCase=true/ filter class=solr.NGramFilterFactory minGramSize=1 maxGramSize=20/ filter class=solr.RemoveDuplicatesTokenFilterFactory / /analyzer /fieldType Regarding your NGram filter, I would actually increase the minGramSize to at least 2 and decrease the maxGramSize to something like 10 or 15, then reindex. An additional note: CommonGrams may not be all that useful unless you are indexing large numbers of huge documents, like entire books. This particular fieldType is not suitable for full text anyway, since it uses KeywordTokenizer. Consider removing CommonGrams from this fieldType and reindexing. Unless you are dealing with large amounts of text, consider removing it from the entire schema. If you do remove it, it's usually not a good idea to replace it with a StopFilter. The index size reduction found in stopword removal is not usually worth the potential loss of recall. Be prepared to test all reasonable analyzer combinations, rather than taking my word for it. After reading the Hathi Trust blog, I tried CommonGrams on my own index. It actually made things slower, not faster. My typical document is only a few thousand bytes of metadata. The Hathi Trust is indexing millions of full-length books. Thanks, Shawn
Re: Solr real time update
Spark: Solr with RankingAlgorithm is not a plugin but a change of search library from Lucene to RankingAlgorithm. Here is more info on the changes you will need to make to your solrconfig.xml: http://solr-ra.tgels.org/wiki/en/Near_Real_Time_Search Regards, - Nagendra Nagrajayya http://solr-ra.tgels.org/ http://rankingalgorithm.tgels.org/ On 11/22/2011 5:40 PM, yu shen wrote: Hi Nagarajayya, Thanks for your information. Do I need to change any configuration of my current solr server to integrate your plugin? Spark 2011/11/22 Nagendra Nagarajayyannagaraja...@transaxtions.com Yu: To get Near Real Time update in Solr 1.4.1 you will need to use Solr 1.4.1 with RankingAlgorithm. This allows you to update documents in near real time. You can download and give this a try from here: http://solr-ra.tgels.org/ Regards, - Nagendra Nagarajayya http://solr-ra.tgels.org/ http://rankingalgorithm.tgels.**org/http://rankingalgorithm.tgels.org/ On 11/21/2011 9:47 PM, yu shen wrote: Hi All, After some study, I used below snippet. Seems the documents is updated, while still takes a long time. Feels like the parameter does not take effect. Any comments? UpdateRequest req = new UpdateRequest(); req.add(solrDocs); req.setCommitWithin(5000); req.setParam(commitWithin, 5000); req.setAction(**AbstractUpdateRequest.ACTION.**COMMIT, true, true); req.process(SOLR_SERVER); 2011/11/22 yu shenshenyu...@gmail.com Hi All, I try to do a 'nearly real time update' to solr. My solr version is 1.4.1. I read this solr CommentWithinhttp://wiki.** apache.org/solr/CommitWithinhttp://wiki.apache.org/solr/CommitWithin **wiki, and a related threadhttp://lucene.472066.**n3.nabble.com/Solr-real-time-** update-taking-time-td3472709.**htmlhttp://lucene.472066.n3.nabble.com/Solr-real-time-update-taking-time-td3472709.htmlmostly on the difficulty to do this. My issue is I tried the code snippet in the wiki: UpdateRequest req = new UpdateRequest(); req.add(mySolrInputDocument); req.setCommitWithin(1); req.process(server); But my index did not get updated, unless I call SOLR_SERVER.commit(); explicitly. The latter call will take more than 1 minute on average to return. Can I do a real time update on solr 1.4.1? Would someone help to show a workable code snippet? Spark
RE: If search matches index in the middle of filter chain, will result return?
Thanks Shawn. So to recap: - Every match must be found after entire chain, not in the middle of the chain. - Suggested: index and query chain should be the same. In my situation, if I make both of them the same, the result may be misleading because it will also match other records that have the same partial string. But your suggestion is wonderful. Thank you very much. -Original Message- From: Shawn Heisey [mailto:s...@elyograg.org] Sent: 2011年11月23日 12:04 下午 To: solr-user@lucene.apache.org Subject: Re: If search matches index in the middle of filter chain, will result return? On 11/22/2011 7:54 PM, Ellery Leung wrote: I am searching for an email called: off...@officeofficeoffice.com. If I search any text under 20 characters, result will be returned. But when I search the whole string: off...@officeofficeoffice.com, no result return. As you all see in the schema in index part, when I search the whole string, it will match the index chain before NGramFilterFactory. But after NGram, no result found. Here are my questions: - Is this behavior normal? I'm pretty sure that your query must match after the entire analyzer chain is done. I would expect that behavior to be normal. - In order to get off...@officeofficeoffice.com, does it mean that I have to make the maxGramSize larger (like 70)? If you were to increase the maxGramSize to 70, you would get a match in this case, but your index might get a lot larger, depending on what's in your source data. That's probably not the right approach, though. In general, you want to have your index and query analyzer chains exactly the same. There are some exceptions, but I don't think the NGram filter is one of them. The synonym filter and WordDelimiterFilter are examples where it is expected that your index and query analyzer chains will be different. Add the NGram and CommonGram filters to the query chain, and everything should start working. If you were to go with a single analyzer for both like the following, I think it would start working. You wouldn't even need to reindex, since you wouldn't be changing the index analyzer. fieldType name=substring_search class=solr.TextField positionIncrementGap=100 analyzer charFilter class=solr.MappingCharFilterFactory mapping=../../filters/filter-mappings.txt/ charFilter class=solr.HTMLStripCharFilterFactory / tokenizer class=solr.KeywordTokenizerFactory/ filter class=solr.ASCIIFoldingFilterFactory/ filter class=solr.TrimFilterFactory / filter class=solr.LowerCaseFilterFactory / filter class=solr.CommonGramsFilterFactory words=../../filters/stopwords.txt ignoreCase=true/ filter class=solr.NGramFilterFactory minGramSize=1 maxGramSize=20/ filter class=solr.RemoveDuplicatesTokenFilterFactory / /analyzer /fieldType Regarding your NGram filter, I would actually increase the minGramSize to at least 2 and decrease the maxGramSize to something like 10 or 15, then reindex. An additional note: CommonGrams may not be all that useful unless you are indexing large numbers of huge documents, like entire books. This particular fieldType is not suitable for full text anyway, since it uses KeywordTokenizer. Consider removing CommonGrams from this fieldType and reindexing. Unless you are dealing with large amounts of text, consider removing it from the entire schema. If you do remove it, it's usually not a good idea to replace it with a StopFilter. The index size reduction found in stopword removal is not usually worth the potential loss of recall. 
Be prepared to test all reasonable analyzer combinations, rather than taking my word for it. After reading the Hathi Trust blog, I tried CommonGrams on my own index. It actually made things slower, not faster. My typical document is only a few thousand bytes of metadata. The Hathi Trust is indexing millions of full-length books. Thanks, Shawn
Re: Integrating Surround Query Parser
How do I apply this patch https://issues.apache.org/jira/browse/SOLR-2703 to Solr 3.1 to install surround as a plugin?

On Tue, Nov 22, 2011 at 7:34 PM, Erik Hatcher erik.hatc...@gmail.com wrote:

The surround query parser is fully wired into Solr trunk/4.0, if that helps. See http://wiki.apache.org/solr/SurroundQueryParser and the JIRA issue linked there in case you want to patch it into a different version.

Erik

On Jan 21, 2011, at 02:24 , Ahson Iqbal wrote:

Hi All

I want to integrate the Surround Query Parser with Solr. To do this I downloaded a jar file from the internet, pasted that jar file into web-inf/lib, and configured the query parser in solrconfig.xml as

<queryParser name="SurroundQParser" class="org.apache.lucene.queryParser.surround.parser.QueryParser"/>

Now when I load the Solr admin page the following exception comes up:

org.apache.solr.common.SolrException: Error Instantiating QParserPlugin, org.apache.lucene.queryParser.surround.parser.QueryParser is not a org.apache.solr.search.QParserPlugin

What I think is that I didn't get the right plugin. Can anybody guide me on where to get the right plugin for the surround query parser, or how to correctly integrate this plugin with Solr?

thanx
Ahsan

--
Thanks Regards
Rahul Mehta
Re: FunctionQuery score=0
Hi Hoss, Thanks for the detailed response. My XY problem is: 1) I am trying to search for a complex query: q={!type=edismax qf=abstract^0.02 title^0.08 categorysearch^0.05 boost='eqsim(alltokens,xyz)' v='+tokens5:xyz '} Which answers my query needs. BUT, my boost function actually changes some of the results to be of score 0, which I want to be excluded from the result set. 2) This is why I used the frange query to solve the issue with the score 0: q={!frange l=0 incl=false}query({!type=edismax qf=abstract^0.02 title^0.08 categorysearch^0.05 boost='eqsim(alltokens,xyz)' v='+tokens5:xyz '}) But this time, the remaining results lost their *boosted* scores, and therefore the sort by score got all mixed up. 3) I assume I can use filter queries, but from my understanding FQs actually perform another query before the main one and these queries are expensive in time and I would like to avoid it if possible. Hope this explains a bit more. Thanks, Lev On Tue, Nov 22, 2011 at 9:15 PM, Chris Hostetter hossman_luc...@fucit.orgwrote: : q={!frange l=0 incl=false}query({!type=edismax qf=abstract^0.02 : title^0.08 categorysearch^0.05 boost='eqsim(alltokens,xyz)' : v='+tokens5:xyz '}) : : : With the above query, I am getting only the results that I want, the ones : whose score after my FucntionQuery are above 0, but the problem now is that : the final score for all results is changed to 1, which affects the sorting. : : How can I keep the original score that is calculated by the edismax query? a) Like i said. details matter. In your earlier messages you mentioned that you were wrapping a function arround a query and wanted to not have the function match anythign where the result was 0 -- the suggestions provided have done that. this is the first time you mentioned that you needed the values returned by the function as the scores of the documents (had you mentioned that you might have gotten differnet answers) b) if you look closely at the suggestion from André, you'll see that his specific suggestion will actually do what you want if you follow it -- express the query you want in the q param (so you get the scores from it) and then express an fq that refers to the q query as a variable... : q=ipodfq={!frange l=0 incl=false}query($q) c) Based on the concrete example you've given above, i'ts not clear to me that you actually need any of this -- if the above query is giving you the results you want, but you want the scores from the edismax query to be used as the final scores of the function, then there is no need to wrap the query in any sort of function at all, or exclude any 0 values this should be exactly what you want... q={!type=edismax qf=abstract^0.02 title^0.08 categorysearch^0.05 boost='eqsim(alltokens,xyz)' v='+tokens5:xyz '} ...why exactly did you think you needed to wrap that query in a function? https://people.apache.org/~hossman/#xyproblem XY Problem Your question appears to be an XY Problem ... that is: you are dealing with X, you are assuming Y will help you, and you are asking about Y without giving more details about the X so that we can understand the full issue. Perhaps the best solution doesn't involve Y at all? See Also: http://www.perlmonks.org/index.pl?node_id=542341 -Hoss
Re: Integrating Surround Query Parser
This is what I tried:
- Went to the Solr 3.1 directory, downloaded from here: http://www.trieuvan.com/apache//lucene/solr/3.1.0/apache-solr-3.1.0.tgz
- wget https://issues.apache.org/jira/secure/attachment/12493167/SOLR-2703.patch
- Ran: patch -p0 -i SOLR-2703.patch --dry-run
- Got an error:
  - patching file core/src/test/org/apache/solr/search/TestSurroundQueryParser.java
  - patching file core/src/test-files/solr/conf/schemasurround.xml
  - patching file core/src/test-files/solr/conf/solrconfigsurround.xml
  - patching file core/src/java/org/apache/solr/search/SurroundQParserPlugin.java
  - patching file example/solr/conf/solrconfig.xml
  - Hunk #1 FAILED at 1538.
  - 1 out of 1 hunk FAILED -- saving rejects to file example/solr/conf/solrconfig.xml.rej
- Our solrconfig.xml ends at line 1508.
- Tried sudo find / -name TestSurroundQueryParser.java, but the file is not found anywhere in the directory.
- And when I run svn up, it just says Skipped '.'

Please suggest what I should do now.

On Wed, Nov 23, 2011 at 10:39 AM, Rahul Mehta rahul23134...@gmail.com wrote:

How do I apply this patch https://issues.apache.org/jira/browse/SOLR-2703 to Solr 3.1 to install surround as a plugin?

On Tue, Nov 22, 2011 at 7:34 PM, Erik Hatcher erik.hatc...@gmail.com wrote:

The surround query parser is fully wired into Solr trunk/4.0, if that helps. See http://wiki.apache.org/solr/SurroundQueryParser and the JIRA issue linked there in case you want to patch it into a different version.

Erik

On Jan 21, 2011, at 02:24 , Ahson Iqbal wrote:

Hi All

I want to integrate the Surround Query Parser with Solr. To do this I downloaded a jar file from the internet, pasted that jar file into web-inf/lib, and configured the query parser in solrconfig.xml as

<queryParser name="SurroundQParser" class="org.apache.lucene.queryParser.surround.parser.QueryParser"/>

Now when I load the Solr admin page the following exception comes up:

org.apache.solr.common.SolrException: Error Instantiating QParserPlugin, org.apache.lucene.queryParser.surround.parser.QueryParser is not a org.apache.solr.search.QParserPlugin

What I think is that I didn't get the right plugin. Can anybody guide me on where to get the right plugin for the surround query parser, or how to correctly integrate this plugin with Solr?

thanx
Ahsan

--
Thanks Regards
Rahul Mehta
Re: Can files be faceted based on their size ?
Thanks for replying. I tried using Trie types for faceting in Solr, but that did not solve the problem. If I use a Trie type (e.g. I used tlong), it shows a schema mismatch error, because in the FileListEntityProcessor API fileSize is defined as a string. That means we cannot apply facet.range on fileSize. Am I right? Thanks
--
View this message in context: http://lucene.472066.n3.nabble.com/Can-files-be-faceted-based-on-their-size-tp3518393p3529923.html
Sent from the Solr - User mailing list archive at Nabble.com.
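Just as an illustration of what the request would look like once a numeric copy of the size exists, a rough SolrJ sketch (the fileSizeNum field and the bucket sizes are pure assumptions, not part of the original setup):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class FileSizeFacets {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");

        SolrQuery query = new SolrQuery("*:*");
        query.setFacet(true);
        // Assumes a numeric field, e.g. <field name="fileSizeNum" type="tlong" .../>,
        // populated from the string fileSize value during import.
        query.set("facet.range", "fileSizeNum");
        query.set("facet.range.start", "0");
        query.set("facet.range.end", String.valueOf(100L * 1024 * 1024)); // 0..100 MB
        query.set("facet.range.gap", String.valueOf(10L * 1024 * 1024));  // 10 MB buckets

        QueryResponse rsp = server.query(query);
        System.out.println(rsp);
    }
}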
RE: Solr Performance/Architecture
Hi Shawn That was so great of you to explain the architecture in such a detail. I enjoyed reading it multiple times. I have a question here: You mentioned that we can use crc32(DocumentId)% NumServers. Now actually I am using that in my data-config.xml in the sql query itself, something like: For Documents to be indexed on Server 1: select DocumentId,PNum,... from Sample where crc32(DocumentId)%2=0; For Documents to be indexed on Server 2: select DocumentId,PNum,... from Sample where crc32(DocumentId)%2=1; Will that be a right way? Will it not be a slow query? Thanks once again. -Original Message- From: Shawn Heisey [mailto:s...@elyograg.org] Sent: Monday, November 21, 2011 7:47 PM To: solr-user@lucene.apache.org Subject: Re: Solr Performance/Architecture On 11/21/2011 12:41 AM, Husain, Yavar wrote: Number of rows in SQL Table (Indexed till now using Solr): 1 million Total Size of Data in the table: 4GB Total Index Size: 3.5 GB Total Number of Rows that I have to index: 20 Million (approximately 100 GB Data) and growing What is the best practices with respect to distributing the index? What I mean to say here is when should I distribute and what is the magic number that I can have for index size per instance? For 1 million itself Solr instance running on a VM is taking roughly 2.5 hrs to index for me. So for 20 million roughly it would take 60 -70 hrs. That would be too much. What would be the best distributed architecture for my case? It will be great if people may share their best practices and experience. I have a MySQL database with 66 million rows at the moment, always growing. My Solr index is split into six large shards and a small shard with the newest data. The small shard (incremental) is calculated by looking at counts of data in hourly increments between 7 and 3.5 days old, and either choosing a boundary that results in less than 500,000 documents or the 3.5 day boundary. This index is usually about 1GB in size. The rest of the documents are split between the other six shards using crc32(did) % 6. The did field is a mysql bigint autoincrement field. These large shards are very close to 11 million records and 20GB each. By indexing all six at once, I can complete a full index rebuild in about 3.5 hours. Each full index chain lives on two 64GB Dell servers with dual quad-core processors. Each server contains a Solr instance with 8GB of heap, running three large shards. One server contains the incremental index, the other server runs the load balancer. Both servers run an index-free Solr core that we call the broker. Its search handlers have the shards parameter in solrconfig.xml, pointed at the appropriate cores for that index chain. To keep index size down and search speed up, it's important that your index only contain the fields needed for two purposes: Searching (indexed fields) and displaying a results grid (stored fields). Any other information should be excluded from your schema.xml and/or DIH config. Full item details should be populated from the database or other information store (possibly a filesystem), using the unique identifier from the search results. If you are aggregating data from more than one table, see if you can have your database get the information into one SELECT statement with JOINs, rather than having more than one entity in your DIH config. Alternatively, if your secondary tables are small, try using the CachedSQLEntityProcessor on them so they are loaded entirely into RAM for the import. 
Your database software is usually much better at combining tables than Solr, so take advantage of it. If you have multivalued search fields from secondary entities in DIH, you can often get your database software to CONCAT them together into a single field, then use an appropriate tokenizer to split them into separate terms. I have one such field that is semicolon separated by a database JOIN that's specified in a view, then I use a pattern tokenizer that splits it at index time. I hope this is helpful. Thanks, Shawn ** This message may contain confidential or proprietary information intended only for the use of the addressee(s) named above or may contain information that is legally privileged. If you are not the intended addressee, or the person responsible for delivering it to the intended addressee, you are hereby notified that reading, disseminating, distributing or copying this message is strictly prohibited. If you have received this message by mistake, please immediately notify us by replying to the message and delete the original message and any copies immediately thereafter. Thank you.- ** FAFLD
Solr Search for misspelled search term
Hi all, I need to find a way for Solr to check for and return results for misspelled search terms. Does anybody have any idea? Thank you!! Meghana
--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-Search-for-misspelled-search-term-tp3529961p3529961.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr highlighting isn't working!
Thank you for the reply! I made the following request and everything is OK:

https://localhost:8443/solr/alfresco/select?wt=standard&q=TYPE:%22{http://www.test.com/test/test/model/content/0.1}field%22&indent=on&hl=true&hl.fl=TYPE

But now I have another problem. If I have a field with the name @{http://www.test.com/test/eln/model/content/0.1}label.__ and the value {en}label1, and I make this request:

https://localhost:8443/solr/alfresco/select?q=@{http://www.test.com/test/test/model/content/0.1}label.__:{en}label1&wt=xml

it returns an exception:

HTTP Status 400 - org.apache.lucene.queryParser.ParseException: Cannot parse '@{http://www.agilent.com/openlab/eln/model/content/0.1}label.__:{en}label1': Encountered } } at line 1, column 54. Was expecting one of: TO ... RANGEEX_QUOTED ... RANGEEX_GOOP ...
--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-highlighting-isn-t-work-tp3527701p3530016.html
Sent from the Solr - User mailing list archive at Nabble.com.