Regarding Response Builder
The ResponseBuilder class exposes a SolrQueryRequest as a public field. Using SolrQueryRequest we can get the SolrParams, like SolrParams params = req.getParams(); Now I want to get the values of those params. What should be the approach, since SolrParams is an abstract class and its get(String) method is abstract? Best regards, Amandeep Singh
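A minimal sketch of how a component might read those values: the object returned by req.getParams() is always a concrete SolrParams subclass, so the abstract get(String) can simply be called on it. (Method names below are the standard SolrParams accessors; the surrounding class is hypothetical.)

```java
import org.apache.solr.common.params.SolrParams;
import org.apache.solr.request.SolrQueryRequest;

public class ParamReader {
    // Sketch only: req is whatever SolrQueryRequest your component receives.
    public static void readParams(SolrQueryRequest req) {
        SolrParams params = req.getParams();
        String q = params.get("q");               // single value, or null if absent
        String rows = params.get("rows", "10");   // single value with a default
        String[] fqs = params.getParams("fq");    // all values of a multi-valued param
        System.out.println("q=" + q + " rows=" + rows
                + " fq count=" + (fqs == null ? 0 : fqs.length));
    }
}
```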
Re: Boosting certain documents dynamically at query-time
On Sat, Jul 11, 2009 at 11:25 PM, Michael Lugassy mlu...@gmail.com wrote: Hi guys -- Using solr 1.4 functions at query-time, can I dynamically boost certain documents which are: a) not on the same range, i.e. have very different document ids, Yes. b) have different boost values, Yes. c) part of a long list (can be around 1,000 different document ids with 50 different boost values)? That will be one big query. You may run into maxBooleanClauses limit. I believe the default is 1024 clauses. Although the limit can be increased in solrconfig.xml, your queries may become too slow if you add so many clauses. -- Regards, Shalin Shekhar Mangar.
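For reference, the clause limit Shalin mentions is configured in solrconfig.xml; a fragment showing the usual default (raise it only if large OR-lists of ids are unavoidable, and expect slower queries):

```xml
<!-- solrconfig.xml: cap on the number of boolean clauses per query -->
<query>
  <maxBooleanClauses>1024</maxBooleanClauses>
</query>
```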
Does semi-colon still work as a special character for sorting?
I read somewhere that it is deprecated
Re: Does semi-colon still work as a special character for sorting?
Gargate, Siddharth wrote: I read somewhere that it is deprecated Yeah, as long as you explicitly use the 'lucenePlusSort' parser via the defType parameter: q=*:*;id desc&defType=lucenePlusSort Koji
Re: Deleting index containg a perticular pattern in 'url' field
On Mon, Jul 13, 2009 at 6:34 AM, Beats tarun_agrawal...@yahoo.com wrote: Hi, I'm using Nutch to crawl and Solr to index the documents. I want to delete the indexed documents containing a particular word or pattern in the url field. Is there something like the Prune Index tool in Solr? Thanks in advance, Beats be...@yahoo.com You can delete by query, and the query can contain wildcards. -- -- - Mark http://www.lucidimagination.com
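A sketch of what such a wildcard delete might look like, posted to the /update handler (field name taken from the question; the pattern itself is a placeholder, and note that leading wildcards may not be supported by the default query parser):

```xml
<!-- POST to /update, then send a <commit/> to make the deletion visible -->
<delete>
  <query>url:somepattern*</query>
</delete>
```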
Behaviour when we get more than 1 million hits
Hi, While using Solr, what would the behaviour be if we perform a search and get more than one million hits? Regards, Raakhi
Re: Does semi-colon still work as a special character for sorting?
On Jul 13, 2009, at 4:58 AM, Gargate, Siddharth wrote: I read somewhere that it is deprecated see the 2nd paragraph in CHANGES.txt: http://svn.apache.org/repos/asf/lucene/solr/trunk/CHANGES.txt Erik
Re: Deleting index containg a perticular pattern in 'url' field
You can delete by query - <delete><query>url:some-word</query></delete> Erik On Jul 13, 2009, at 6:34 AM, Beats wrote: Hi, I'm using Nutch to crawl and Solr to index the documents. I want to delete the indexed documents containing a particular word or pattern in the url field. Is there something like the Prune Index tool in Solr? Thanks in advance, Beats be...@yahoo.com
Solrj, tomcat and a proxy
Hello, I'm using SolrJ on a Tomcat environment with a proxy configured in the catalina.properties http.proxySet=true http.proxyPort=8080 http.proxyHost=XX.XX.XX.XX My CommonsHttpSolrServer does not seem to use the configured proxy, this results in a java.net.ConnectException: Connection refused error. How can I configure Java (jdk1.5.0_09), Tomcat (apache-tomcat-5.5.25) or SolrJ (apache-solr-solrj-1.3.0.jar) to use the proxy? Regards, Rene Schilperoort
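One likely explanation, offered as an assumption: CommonsHttpSolrServer wraps Jakarta Commons HttpClient 3.x, which does not honor the standard JVM http.proxy* properties on its own. A hedged sketch of setting the proxy on the underlying client (host and port copied from the question):

```java
import java.net.MalformedURLException;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class ProxiedSolrClient {
    // Sketch, untested: configure the proxy directly on the HttpClient
    // instance that CommonsHttpSolrServer uses internally.
    public static CommonsHttpSolrServer create(String solrUrl)
            throws MalformedURLException {
        CommonsHttpSolrServer server = new CommonsHttpSolrServer(solrUrl);
        server.getHttpClient().getHostConfiguration()
              .setProxy("XX.XX.XX.XX", 8080);
        return server;
    }
}
```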
Re: Behaviour when we get more than 1 million hits
It depends (tm) on what you try to do with the results. You really need to give us some more details on what you want to *do* with 1,000,000 hits before any meaningful response is possible. Best Erick On Mon, Jul 13, 2009 at 8:47 AM, Rakhi Khatwani rkhatw...@gmail.com wrote: Hi, While using Solr, what would the behaviour be if we perform a search and get more than one million hits? Regards, Raakhi
Faceting
Hi, I'm in the process of making a javascript-less web interface to Solr (the nice ajax version will be built on top of it unobtrusively). Our database has a lot of fields and so I've grouped those with similar characteristics to make several different 'widgets' (like a numerical type which gets a min-max selector or an enumerated type with checkboxes), but I've run into a slight problem with fields which contain a lot of terms. One of those fields is country; what I'd like to do is display the top X countries, which is easily done with facet.field=country&f.country.facet.limit=X, and display a more link which will redirect to a new page with all countries (and other query parameters in hidden fields) which posts back to the search page. All this is no problem, but once a person has selected some countries which are not in the top X (say 'Narnia' and 'Guilder') I want to list that country below the X top countries with a checked checkbox. Is there a good way to select the top X facets and include some terms you want to include as well, something like facet.field=country&f.country.facet.limit=X&f.country.facet.includeterms=Narnia,Guilder, or is there some other way to achieve this? Regards, Gijs Kunze
Re: Aggregating/Grouping Document Search Results on a Field
Thanks for this -- we're also trying out bobo-browse for Lucene, and early results look pretty enticing. They greatly sped up how fast you read in documents from disk, among other things: http://bobo-browse.wiki.sourceforge.net/ On Sat, Jul 11, 2009 at 12:10 AM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: On Sat, Jul 11, 2009 at 12:01 AM, Bradford Stephens bradfordsteph...@gmail.com wrote: Does the facet aggregation take place on the Solr search server, or the Solr client? It's pretty slow for me -- on a machine with 8 cores/8 GB RAM, 50 million document index (about 36M unique values in the author field), a query that returns 131,000 hits takes about 20 seconds to calculate the top 50 authors. The query I'm running is this: http://dttest10:8983/solr/select/select?q=java&facet=true&facet.field=authorname : Is the author field tokenized? Is it multi-valued? It is best to have untokenized fields. Solr 1.4 has huge improvements in faceting performance so you can try that and see if it helps. See Yonik's blog post about this - http://yonik.wordpress.com/2008/11/25/solr-faceted-search-performance-improvements/ -- Regards, Shalin Shekhar Mangar. -- http://www.roadtofailure.com -- The Fringes of Scalability, Social Media, and Computer Science
Re: Aggregating/Grouping Document Search Results on a Field
SOLR 1.4 has a new feature https://issues.apache.org/jira/browse/SOLR-475 that speeds up faceting on fields with many terms by adding an UnInvertedField. Bobo uses a custom field cache as well. It may be useful to benchmark the 3 different approaches (bitsets, SOLR-475, Bobo). This could be a good wiki page explaining the differences between them? On Mon, Jul 13, 2009 at 9:49 AM, Bradford Stephens bradfordsteph...@gmail.com wrote: Thanks for this -- we're also trying out bobo-browse for Lucene, and early results look pretty enticing. They greatly sped up how fast you read in documents from disk, among other things: http://bobo-browse.wiki.sourceforge.net/ On Sat, Jul 11, 2009 at 12:10 AM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: On Sat, Jul 11, 2009 at 12:01 AM, Bradford Stephens bradfordsteph...@gmail.com wrote: Does the facet aggregation take place on the Solr search server, or the Solr client? It's pretty slow for me -- on a machine with 8 cores/8 GB RAM, 50 million document index (about 36M unique values in the author field), a query that returns 131,000 hits takes about 20 seconds to calculate the top 50 authors. The query I'm running is this: http://dttest10:8983/solr/select/select?q=java&facet=true&facet.field=authorname : Is the author field tokenized? Is it multi-valued? It is best to have untokenized fields. Solr 1.4 has huge improvements in faceting performance so you can try that and see if it helps. See Yonik's blog post about this - http://yonik.wordpress.com/2008/11/25/solr-faceted-search-performance-improvements/ -- Regards, Shalin Shekhar Mangar. -- http://www.roadtofailure.com -- The Fringes of Scalability, Social Media, and Computer Science
Get TermVectors for query hits only
Hi all, When I'm using the TermVectorComponent I receive term vectors with all tokens in the documents that meet my search criteria. I would be interested in getting the offsets for just those terms in the documents that meet the search criteria. My documents are about 200 K and are in XML. If I have just the offsets for the hits, I can easily implement my own highlighting on the client side. Does anyone know how to go about doing this?
Are subqueries possible in Solr? If so, are they performant?
Does Solr have the ability to do subqueries, like this one (in SQL): SELECT id, first_name FROM student_details WHERE first_name IN (SELECT first_name FROM student_details WHERE subject = 'Science'); If so, how performant are such queries?
Improve indexing time
Hi, We have a Solr index of size 626 MB and the number of documents indexed is 141810. We have configured an index-based spellchecker with the buildOnCommit option set to true. The spellcheck index is 8.67 MB. We use the data import handler to create the index from scratch and also to update the index periodically. We have a job that runs a full import once every week and a delta import every 20 mins. The full import takes about 38 mins to complete and the delta import about 12 mins. The index also serves search queries (even while the delta import is running). The number of documents changed during each delta import is on average 25 to 30. Is there a way to reduce the amount of time the delta import takes to update the index? The system specs are MS Windows Server 2003 R2 Standard x64 Edition, 8 GB RAM. Solr is set up on Tomcat 6.0. The CPU utilization of tomcat.exe during a delta import is 60%. In the data-config.xml file there are 6 root entities for 6 database tables under the document element. The first root entity gets the rows from table1, the 2nd root entity gets the rows from table2, and so on. The root entities have several child entities to get the fields from associated tables. The mergeFactor is set to 10 and ramBufferSizeMB is set to 32. The following is the cache setting: <filterCache class="solr.LRUCache" size="16384" initialSize="4096" autowarmCount="4096"/> <queryResultCache class="solr.LRUCache" size="16384" initialSize="4096" autowarmCount="4096"/> <documentCache class="solr.LRUCache" size="16384" initialSize="16384" autowarmCount="0"/> <enableLazyFieldLoading>true</enableLazyFieldLoading> Is it advisable to use a master-slave configuration? Does the index size of 626 MB justify the change from the existing single Solr core (on which a delta import runs every 20 mins and which also serves search queries) to a master-slave configuration, considering that the index size will keep increasing over time?
Is there any other way to improve the indexing time? Thanks, Gurjot
Re: Faceting
On Mon, Jul 13, 2009 at 7:56 PM, gwk g...@eyefi.nl wrote: Is there a good way to select the top X facets and include some terms you want to include as well, something like facet.field=country&f.country.facet.limit=X&f.country.facet.includeterms=Narnia,Guilder or is there some other way to achieve this? You can use facet.query for each of the terms you want to include. You may need to remove such terms from appearing in the facet.field=country results in the client. e.g. facet.field=country&f.country.facet.limit=X&facet.query=country:Narnia&facet.query=country:Guilder -- Regards, Shalin Shekhar Mangar.
Re: Select tika output for extract-only?
Ok, thanks. I played with it enough to get plain text out at least, but I'll wait for the resolution of SOLR-284 -Peter On Sun, Jul 12, 2009 at 9:20 AM, Yonik Seeley yo...@lucidimagination.com wrote: Peter, I'm hacking up solr cell right now, trying to simplify the parameters and fix some bugs (see SOLR-284) A quick patch to specify the output format should make it into 1.4 - but you may want to wait until I finish. -Yonik http://www.lucidimagination.com On Sat, Jul 11, 2009 at 5:39 PM, Peter Wolanin peter.wola...@acquia.com wrote: I had been assuming that I could choose among possible tika output formats when using the extracting request handler in extract-only mode, as if from the CLI with the tika jar: -x or --xml Output XHTML content (default) -h or --html Output HTML content -t or --text Output plain text content -m or --metadata Output only metadata However, looking at the docs and source, it seems that only the xml option is available (hard-coded) in ExtractingDocumentLoader: serializer = new XMLSerializer(writer, new OutputFormat("XML", "UTF-8", true)); In addition, it seems that the metadata is always appended to the response. Are there any open issues relating to this, or opinions on whether adding additional flexibility to the response format would be of interest for 1.4? Thanks, Peter -- Peter M. Wolanin, Ph.D. Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com -- Peter M. Wolanin, Ph.D. Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com
Re: Get TermVectors for query hits only
I seem to recall that the Highlighter in Solr is pluggable, so you may want to work at that level instead of the client side. Otherwise, you likely would have to implement your own TermVectorMapper and add that to the TermVectorComponent capability which then feeds your client. For an example of using TermVectorMapper, but not solving exactly your problem (but close), see http://www.lucidimagination.com/blog/2009/05/26/accessing-words-around-a-positional-match-in-lucene/ but note that is at the Lucene level. On Jul 13, 2009, at 2:37 PM, Walter Ravenek wrote: Hi all, When I'm using the TermVectorComponent I receive term vectors with all tokens in the documents that meet my search criteria. I would be interested in getting the offsets for just those terms in the documents that meet the search criteria. My documents are about 200 K and are in XML. If I have just the offsets for the hits, I can easily implement my own highlighting on the client side. Does anyone know how to go about doing this? -- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene: http://www.lucidimagination.com/search
lucene or Solr bug with dismax?
I have been getting exceptions thrown when users try to send boolean queries into the dismax handler. In particular, with a leading 'OR'. I'm really not sure why this happens - I thought the dismax parser ignored AND/OR? I'm using rev 779609 in case there were recent changes to this. Is this a known issue? Jul 13, 2009 1:47:06 PM org.apache.solr.common.SolrException log SEVERE: org.apache.solr.common.SolrException: org.apache.lucene.queryParser.ParseException: Cannot parse 'OR vti OR bin OR vti OR aut OR author OR dll': Encountered "OR OR" at line 1, column 0. Was expecting one of: NOT ... + ... - ... ( ... * ... QUOTED ... TERM ... PREFIXTERM ... WILDTERM ... [ ... { ... NUMBER ... TERM ... * ... at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:110) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:174) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) -- Peter M. Wolanin, Ph.D. Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com
Merge Policy
SolrIndexConfig accepts a mergePolicy class name, however how does one inject properties into it?
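As far as I can tell, in this version the element carries only a bare class name, with no mechanism for injecting properties; a small custom subclass that hard-codes its settings is one workaround. A hedged solrconfig.xml sketch (the subclass name is hypothetical):

```xml
<!-- solrconfig.xml fragment: only a class name is accepted here, so tunables
     would have to be set inside a custom MergePolicy subclass. -->
<mainIndex>
  <mergePolicy>com.example.TunedLogByteSizeMergePolicy</mergePolicy>
</mainIndex>
```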
Implementing Solr for the first time
I am new to Solr and trying to get it set up to index files from a directory structure on a server. I have a few questions. 1.) Is there an application that will return the search results in a user friendly format? 2.) How do I move Solr from the example environment into a production environment? 3.) Will Solr search through multiple folders when indexing and if so can I specify which folders to index from? I have looked through the tutorial, the Docs, and the FAQ and am still having problems making sense of it. Kevin Miller Oklahoma Tax Commission Web Services
Re: lucene or Solr bug with dismax?
It doesn't ignore OR and AND, though it probably should. I think there is a JIRA issue for it somewhere. On Mon, Jul 13, 2009 at 4:10 PM, Peter Wolanin peter.wola...@acquia.com wrote: I can still generate this error with Solr built from svn trunk just now. http://localhost:8983/solr/select/?qt=dismax&q=OR+vti+OR+foo I'm doubly perplexed by this since 'or' is in the stopwords file. -Peter On Mon, Jul 13, 2009 at 3:15 PM, Peter Wolanin peter.wola...@acquia.com wrote: I have been getting exceptions thrown when users try to send boolean queries into the dismax handler. In particular, with a leading 'OR'. I'm really not sure why this happens - I thought the dismax parser ignored AND/OR? I'm using rev 779609 in case there were recent changes to this. Is this a known issue? Jul 13, 2009 1:47:06 PM org.apache.solr.common.SolrException log SEVERE: org.apache.solr.common.SolrException: org.apache.lucene.queryParser.ParseException: Cannot parse 'OR vti OR bin OR vti OR aut OR author OR dll': Encountered "OR OR" at line 1, column 0. Was expecting one of: NOT ... + ... - ... ( ... * ... QUOTED ... TERM ... PREFIXTERM ... WILDTERM ... [ ... { ... NUMBER ... TERM ... * ... at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:110) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:174) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) -- Peter M. Wolanin, Ph.D. Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com -- Peter M. Wolanin, Ph.D. Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com -- -- - Mark http://www.lucidimagination.com
Re: Aggregating/Grouping Document Search Results on a Field
Hi Brad: We have since (Bobo) added some perf tests which allow you to do some benchmarking very quickly: http://code.google.com/p/bobo-browse/wiki/BoboPerformance Let me know if you need help setting up. -John On Mon, Jul 13, 2009 at 10:41 AM, Jason Rutherglen jason.rutherg...@gmail.com wrote: SOLR 1.4 has a new feature https://issues.apache.org/jira/browse/SOLR-475 that speeds up faceting on fields with many terms by adding an UnInvertedField. Bobo uses a custom field cache as well. It may be useful to benchmark the 3 different approaches (bitsets, SOLR-475, Bobo). This could be a good wiki page explaining the differences between them? On Mon, Jul 13, 2009 at 9:49 AM, Bradford Stephens bradfordsteph...@gmail.com wrote: Thanks for this -- we're also trying out bobo-browse for Lucene, and early results look pretty enticing. They greatly sped up how fast you read in documents from disk, among other things: http://bobo-browse.wiki.sourceforge.net/ On Sat, Jul 11, 2009 at 12:10 AM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: On Sat, Jul 11, 2009 at 12:01 AM, Bradford Stephens bradfordsteph...@gmail.com wrote: Does the facet aggregation take place on the Solr search server, or the Solr client? It's pretty slow for me -- on a machine with 8 cores/8 GB RAM, 50 million document index (about 36M unique values in the author field), a query that returns 131,000 hits takes about 20 seconds to calculate the top 50 authors. The query I'm running is this: http://dttest10:8983/solr/select/select?q=java&facet=true&facet.field=authorname : Is the author field tokenized? Is it multi-valued? It is best to have untokenized fields. Solr 1.4 has huge improvements in faceting performance so you can try that and see if it helps. See Yonik's blog post about this - http://yonik.wordpress.com/2008/11/25/solr-faceted-search-performance-improvements/ -- Regards, Shalin Shekhar Mangar. -- http://www.roadtofailure.com -- The Fringes of Scalability, Social Media, and Computer Science
Re: lucene or Solr bug with dismax?
Indeed - I assumed that only the + and - characters had any special meaning when parsing dismax queries and that all other content would be treated just as keywords. That seems to be how it's described in the dismax documentation? Looks like this is a relevant issue (is there another)? https://issues.apache.org/jira/browse/SOLR-874 -Peter On Mon, Jul 13, 2009 at 4:12 PM, Mark Miller markrmil...@gmail.com wrote: It doesn't ignore OR and AND, though it probably should. I think there is a JIRA issue for it somewhere. On Mon, Jul 13, 2009 at 4:10 PM, Peter Wolanin peter.wola...@acquia.com wrote: I can still generate this error with Solr built from svn trunk just now. http://localhost:8983/solr/select/?qt=dismax&q=OR+vti+OR+foo I'm doubly perplexed by this since 'or' is in the stopwords file. -Peter On Mon, Jul 13, 2009 at 3:15 PM, Peter Wolanin peter.wola...@acquia.com wrote: I have been getting exceptions thrown when users try to send boolean queries into the dismax handler. In particular, with a leading 'OR'. I'm really not sure why this happens - I thought the dismax parser ignored AND/OR? I'm using rev 779609 in case there were recent changes to this. Is this a known issue? Jul 13, 2009 1:47:06 PM org.apache.solr.common.SolrException log SEVERE: org.apache.solr.common.SolrException: org.apache.lucene.queryParser.ParseException: Cannot parse 'OR vti OR bin OR vti OR aut OR author OR dll': Encountered "OR OR" at line 1, column 0. Was expecting one of: NOT ... + ... - ... ( ... * ... QUOTED ... TERM ... PREFIXTERM ... WILDTERM ... [ ... { ... NUMBER ... TERM ... * ... at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:110) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:174) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) -- Peter M. Wolanin, Ph.D. Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com -- Peter M. Wolanin, Ph.D. Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com -- -- - Mark http://www.lucidimagination.com -- Peter M. Wolanin, Ph.D. Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com
Sharded Index Creation Magic?
Hello! I'm working with Solr-1.3.0 using a sharded index for distributed, aggregated search. I've successfully run through the example described in the DistributedSearch wiki page. I have built an index from a corpus of some 50mil documents in an HBase table and created 7 shards using the org.apache.hadoop.hbase.mapred.BuildTableIndex. I can deploy any one of these shards to a single Solr instance and happily search the index after tweaking the schema appropriately. However, when I search across all deployed shards using the shards= query parameter ( http://host00:8080/solr/select?shards=host00:8080/solr,host01:8080/solr&q=body\%3A%3Aterm), I get a NullPointerException: java.lang.NullPointerException at org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:421) at org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:265) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:264) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232) Debugging into the QueryComponent.mergeIds() method reveals the instance sreq.responses (line 356) contains one response for each shard specified, each with the number of results received by the independent queries. The problems begin down at line 370 because the SolrDocument instance has only a score field -- which proves problematic in the following line where the id is requested. The SolrDocument, only containing a score, lacks the designated ID field (from my schema) and thus the document cannot be added to the results queue.
Because the example on the wiki works by loading the documents directly into Solr for indexing, I have come to the conclusion that there is some extra magic happening in this index generation process which my process lacks. Thanks for the help!
Re: Trying to run embedded server from unit test...but getting configuration error
I believe that constructor expects to find an alternate-format solr config that specifies the cores, e.g. like the one you can find in example/multicore/solr.xml http://svn.apache.org/repos/asf/lucene/solr/trunk/example/multicore/solr.xml Looks like that error is from not finding the root solr node, so likely you're trying to use a regular solrconfig.xml format? -- -- - Mark http://www.lucidimagination.com On Mon, Jul 13, 2009 at 8:53 PM, Reuben Firmin reub...@benetech.org wrote: Hi, I'm setting up an embedded solr server from a unit test (the non-bolded lines are just moving test resources to a tmp directory which is acting as solr.home.) final File dir = FileUtils.createTmpSubdir(); *System.setProperty("solr.solr.home", dir.getAbsolutePath());* final File conf = new File(dir, "conf"); conf.mkdir(); final PathMatchingResourcePatternResolver pmrpr = new PathMatchingResourcePatternResolver(); final File c1 = pmrpr.getResource("classpath:schema.xml").getFile(); final File c2 = pmrpr.getResource("classpath:solrconfig.xml").getFile(); final File c3 = pmrpr.getResource("classpath:test_protwords.txt").getFile(); final File c4 = pmrpr.getResource("classpath:test_stopwords.txt").getFile(); final File c5 = pmrpr.getResource("classpath:test_synonyms.txt").getFile(); FileUtils.copyFileToDirectory(c1, conf); // NOTE!
this lives in the top level dir FileUtils.copyFileToDirectory(c2, dir); copyAndRenameTestFile(c3, dir, "protwords.txt", conf); copyAndRenameTestFile(c4, dir, "stopwords.txt", conf); copyAndRenameTestFile(c5, dir, "synonyms.txt", conf); *final CoreContainer.Initializer initializer = new CoreContainer.Initializer(); initializer.setSolrConfigFilename("solrconfig.xml"); final CoreContainer coreContainer = initializer.initialize(); final EmbeddedSolrServer server = new EmbeddedSolrServer(coreContainer, ""); engine.setServer(server);* The problem with this is that CoreContainer trips over and dumps an exception to the log: javax.xml.transform.TransformerException: Unable to evaluate expression using this context at com.sun.org.apache.xpath.internal.XPath.execute(XPath.java:363) at com.sun.org.apache.xpath.internal.jaxp.XPathImpl.eval(XPathImpl.java:213) at com.sun.org.apache.xpath.internal.jaxp.XPathImpl.evaluate(XPathImpl.java:275) at org.apache.solr.core.CoreContainer.readProperties(CoreContainer.java:241) at org.apache.solr.core.CoreContainer.load(CoreContainer.java:189) at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:104) at org.bookshare.search.solr.SolrSearchEngineTest.setup(SolrSearchEngineTest.java:44) It appears to be trying to evaluate a property element, which doesn't exist in solrconfig.xml (which is pretty much the same as http://svn.apache.org/repos/asf/lucene/solr/trunk/example/solr/conf/solrconfig.xml ). Anybody see anything obviously wrong? If not, what else can I give you to help debug this? Thanks Reuben
Availability during merge
The wiki page for merging solr cores (http://wiki.apache.org/solr/MergingSolrIndexes) mentions that the cores being merged cannot be indexed to during the merge. What about the core being merged *to*? In terms of the example on the wiki page, I'm asking if core0 can add docs while core1 and core2 are being merged into it. Thanks, - Charlie
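For context, the merge described on that wiki page is triggered through the CoreAdmin handler; a hedged sketch of the request (host and index paths are placeholders):

```
http://localhost:8983/solr/admin/cores?action=mergeindexes&core=core0&indexDir=/path/to/core1/data/index&indexDir=/path/to/core2/data/index
```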
Re: Get TermVectors for query hits only
Thanks Grant, I think I get the idea. Grant Ingersoll wrote: I seem to recall that the Highlighter in Solr is pluggable, so you may want to work at that level instead of the client side. Otherwise, you likely would have to implement your own TermVectorMapper and add that to the TermVectorComponent capability which then feeds your client. For an example of using TermVectorMapper, but not solving exactly your problem (but close), see http://www.lucidimagination.com/blog/2009/05/26/accessing-words-around-a-positional-match-in-lucene/ but note that is at the Lucene level. On Jul 13, 2009, at 2:37 PM, Walter Ravenek wrote: Hi all, When I'm using the TermVectorComponent I receive term vectors with all tokens in the documents that meet my search criteria. I would be interested in getting the offsets for just those terms in the documents that meet the search criteria. My documents are about 200 K and are in XML. If I have just the offsets for the hits, I can easily implement my own highlighting on the client side. Does anyone know how to go about doing this? -- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene: http://www.lucidimagination.com/search
Re: Caching per segmentReader?
Shall we create an issue for this so we can list out desirable features? On Sun, Jul 12, 2009 at 7:01 AM, Yonik Seeley ysee...@gmail.com wrote: On Sat, Jul 11, 2009 at 7:38 PM, Jason Rutherglenjason.rutherg...@gmail.com wrote: Are we planning on implementing caching (docsets, documents, results) per segment reader or is this something that's going to be in 1.4? Yes, I've been thinking about docsets and documents (perhaps not results) per segment. It won't make it in for 1.4 though. -Yonik http://www.lucidimagination.com
Re: Trying to run embedded server from unit test...but getting configuration error
Thanks. I should have googled first. I came across: http://www.nabble.com/EmbeddedSolrServer-API-usage-td19778623.html For reference, my code is now: final File dir = FileUtils.createTmpSubdir(); System.setProperty(solr.solr.home, dir.getAbsolutePath()); final File conf = new File(dir, conf); conf.mkdir(); final PathMatchingResourcePatternResolver pmrpr = new PathMatchingResourcePatternResolver(); final File c1 = pmrpr.getResource(classpath:test_protwords.txt).getFile(); final File c2 = pmrpr.getResource(classpath:test_stopwords.txt).getFile(); final File c3 = pmrpr.getResource(classpath:test_synonyms.txt).getFile(); final File c4 = pmrpr.getResource(classpath:test_elevate.xml).getFile(); copyAndRenameTestFile(c1, dir, protwords.txt, conf); copyAndRenameTestFile(c2, dir, stopwords.txt, conf); copyAndRenameTestFile(c3, dir, synonyms.txt, conf); copyAndRenameTestFile(c4, dir, elevate.xml, conf); final File config = pmrpr.getResource(classpath:solrconfig.xml).getFile(); final CoreContainer cc = new CoreContainer(); final SolrConfig sc = new SolrConfig(config.getAbsolutePath()); final CoreDescriptor cd = new CoreDescriptor(cc, core0, dir.getAbsolutePath()); final SolrCore core0 = cc.create(cd); cc.register(core0, core0, false); final EmbeddedSolrServer server = new EmbeddedSolrServer(cc, core0); Reuben On Mon, Jul 13, 2009 at 5:00 PM, Mark Miller markrmil...@gmail.com wrote: I believe that constructor expects to find an alternate format solr config that specifies the cores, eg like the one you can find in example/multicore/solr.xml http://svn.apache.org/repos/asf/lucene/solr/trunk/example/multicore/solr.xml Looks like that error is not finding the root solr node, so likely your trying to use a regular solrconfig.xml format? 
-- Mark http://www.lucidimagination.com

On Mon, Jul 13, 2009 at 8:53 PM, Reuben Firmin reub...@benetech.org wrote: Hi, I'm setting up an embedded solr server from a unit test (the non-bolded lines are just moving test resources to a tmp directory which is acting as solr.home.)

final File dir = FileUtils.createTmpSubdir();
System.setProperty("solr.solr.home", dir.getAbsolutePath());
final File conf = new File(dir, "conf");
conf.mkdir();
final PathMatchingResourcePatternResolver pmrpr = new PathMatchingResourcePatternResolver();
final File c1 = pmrpr.getResource("classpath:schema.xml").getFile();
final File c2 = pmrpr.getResource("classpath:solrconfig.xml").getFile();
final File c3 = pmrpr.getResource("classpath:test_protwords.txt").getFile();
final File c4 = pmrpr.getResource("classpath:test_stopwords.txt").getFile();
final File c5 = pmrpr.getResource("classpath:test_synonyms.txt").getFile();
FileUtils.copyFileToDirectory(c1, conf);
// NOTE! this lives in the top level dir
FileUtils.copyFileToDirectory(c2, dir);
copyAndRenameTestFile(c3, dir, "protwords.txt", conf);
copyAndRenameTestFile(c4, dir, "stopwords.txt", conf);
copyAndRenameTestFile(c5, dir, "synonyms.txt", conf);
final CoreContainer.Initializer initializer = new CoreContainer.Initializer();
initializer.setSolrConfigFilename("solrconfig.xml");
final CoreContainer coreContainer = initializer.initialize();
final EmbeddedSolrServer server = new EmbeddedSolrServer(coreContainer, "");
engine.setServer(server);

The problem with this is that CoreContainer trips over and dumps an exception to the log:

javax.xml.transform.TransformerException: Unable to evaluate expression using this context
    at com.sun.org.apache.xpath.internal.XPath.execute(XPath.java:363)
    at com.sun.org.apache.xpath.internal.jaxp.XPathImpl.eval(XPathImpl.java:213)
    at com.sun.org.apache.xpath.internal.jaxp.XPathImpl.evaluate(XPathImpl.java:275)
    at org.apache.solr.core.CoreContainer.readProperties(CoreContainer.java:241)
    at org.apache.solr.core.CoreContainer.load(CoreContainer.java:189)
    at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:104)
    at org.bookshare.search.solr.SolrSearchEngineTest.setup(SolrSearchEngineTest.java:44)

It appears to be trying to evaluate "property", which doesn't exist in solrconfig.xml (which is pretty much the same as http://svn.apache.org/repos/asf/lucene/solr/trunk/example/solr/conf/solrconfig.xml). Anybody see anything obviously wrong? If not, what else can I give you to help debug this?
Thanks
Reuben
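For reference, the multicore solr.xml format that CoreContainer.Initializer expects looks roughly like the following sketch (core names and instanceDirs here are illustrative; see the linked example/multicore/solr.xml for the exact file):

```xml
<!-- solr.xml: lives in solr.home; lists the cores instead of being a
     regular solrconfig.xml. Each core's instanceDir holds its own conf/ -->
<solr persistent="false">
  <cores adminPath="/admin/cores">
    <core name="core0" instanceDir="core0" />
    <core name="core1" instanceDir="core1" />
  </cores>
</solr>
```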
allowDocsOutOfOrder support?
Is there a way to set this in Solr 1.3 using solrconfig.xml? Otherwise one needs to get a class loaded that statically calls BooleanQuery.setAllowDocsOutOfOrder?
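A minimal sketch of the static-call pattern being described, assuming no solrconfig.xml option exists in 1.3 (the BooleanQuery class below is a stand-in so the example is self-contained; in real code you would import org.apache.lucene.search.BooleanQuery and call its setAllowDocsOutOfOrder(true)):

```java
// Stand-in for org.apache.lucene.search.BooleanQuery, included only so
// this sketch compiles without the Lucene jar on the classpath.
class BooleanQuery {
    private static boolean allowDocsOutOfOrder = false;
    static void setAllowDocsOutOfOrder(boolean b) { allowDocsOutOfOrder = b; }
    static boolean getAllowDocsOutOfOrder() { return allowDocsOutOfOrder; }
}

public class OutOfOrderInit {
    // Runs once when this class is loaded, e.g. when Solr instantiates
    // whatever plugin (listener, component, etc.) this block lives in.
    static {
        BooleanQuery.setAllowDocsOutOfOrder(true);
    }

    public static boolean isEnabled() {
        return BooleanQuery.getAllowDocsOutOfOrder();
    }

    public static void main(String[] args) {
        System.out.println(OutOfOrderInit.isEnabled()); // prints true
    }
}
```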
Spell checking: Is there a way to exclude words known to be wrong?
We're building a spell index from a field in our main index with the following configuration:

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">textSpell</str>
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">spell</str>
    <str name="spellcheckIndexDir">./spellchecker</str>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>

This works great and re-builds the spelling index on commits as expected. However, we know there are misspellings in the spell field of our main index. We could remove these from the spelling index using Luke, however they will be added again on commits. What we need is something similar to how the protwords.txt file is used, so that when we notice misspelled words such as "beginnning" being pulled from our main index we could add them to an exclusion file and they would not be added to the spelling index again. Any tricks to make this possible?
-Jay
Re: Spell checking: Is there a way to exclude words known to be wrong?
I don't think there is a way currently, but it might make a nice patch. Or you could just implement a custom SolrSpellChecker - both FileBasedSpellChecker and IndexBasedSpellChecker are actually like maybe 50 lines of code or less. It would be fairly quick to just plug a custom version in as a plugin.
-- Mark http://www.lucidimagination.com

On Mon, Jul 13, 2009 at 8:27 PM, Jay Hill jayallenh...@gmail.com wrote: [quoted text clipped]
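To illustrate the exclusion idea independently of the Solr APIs, here is a self-contained sketch of the filtering step such a custom spell checker could apply (class and method names are hypothetical; a real plugin would extend SolrSpellChecker and do this while building the spelling index or returning suggestions):

```java
import java.util.*;

// Hypothetical helper: holds a protwords.txt-style exclusion list and
// drops any candidate term that appears on it.
public class SpellExclusionFilter {
    private final Set<String> excluded;

    public SpellExclusionFilter(Collection<String> exclusionWords) {
        this.excluded = new HashSet<String>();
        for (String w : exclusionWords) {
            excluded.add(w.toLowerCase(Locale.ROOT));
        }
    }

    // Keep only terms that are not on the exclusion list, mirroring how
    // protwords.txt shields terms from stemming.
    public List<String> filter(List<String> candidateTerms) {
        List<String> kept = new ArrayList<String>();
        for (String term : candidateTerms) {
            if (!excluded.contains(term.toLowerCase(Locale.ROOT))) {
                kept.add(term);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        SpellExclusionFilter f =
            new SpellExclusionFilter(Arrays.asList("beginnning"));
        System.out.println(f.filter(Arrays.asList("beginning", "beginnning")));
        // prints [beginning]
    }
}
```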
Re: Improve indexing time
Considering that only 20 to 30 docs are changed, the indexing itself is not the bottleneck. The bottleneck is probably the DB and the time taken for the query to run. Are there deltaQueries in the sub-entities? If you can create a VIEW in the DB to identify the delta, it could be faster.

On Tue, Jul 14, 2009 at 12:13 AM, Gurjot Singh gurjot...@gmail.com wrote: Hi, We have a solr index of size 626 MB and the number of documents indexed is 141810. We have configured an index-based spellchecker with the buildOnCommit option set to true. The spellcheck index is 8.67 MB. We use the data import handler to create the index from scratch and also to update the index periodically. We have created a job to run a full import once every week and a delta import every 20 mins. The full import takes about 38 mins to complete and the delta import about 12 mins. The index also serves search queries (even while the delta import is running). The number of documents changed during each delta import is on average 25 to 30. Is there a way to reduce the amount of time the delta import takes to update the index? The system specs are: MS Windows Server 2003 R2 Standard x64 Edition, 8 GB RAM. Solr is set up on Tomcat 6.0. The CPU utilization of tomcat.exe at the time of delta import is 60%. In the data-config.xml file there are 6 root entities for 6 database tables under the document element. The first root entity gets the rows from table1, the 2nd root entity gets the rows from table2, and so on. The root entities have several child entities to get the fields from associated tables. The mergeFactor is set to 10 and ramBufferSizeMB is set to 32.
The following are the cache settings:

<filterCache class="solr.LRUCache" size="16384" initialSize="4096" autowarmCount="4096"/>
<queryResultCache class="solr.LRUCache" size="16384" initialSize="4096" autowarmCount="4096"/>
<documentCache class="solr.LRUCache" size="16384" initialSize="16384" autowarmCount="0"/>
<enableLazyFieldLoading>true</enableLazyFieldLoading>

Is it advisable to use a master/slave configuration? Does the index size of 626 MB justify the change from the existing single Solr core (on which a delta import runs every 20 mins and which also serves search queries) to a master/slave configuration, keeping in mind that the index size will keep increasing over time? Is there any other way to improve the indexing time?
Thanks,
Gurjot
--
Noble Paul | Principal Engineer | AOL | http://aol.com
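To make Noble's suggestion concrete, a hedged sketch of a data-config.xml entity that uses a deltaQuery plus a DB view to identify and fetch changed rows (all table, view, and column names here are made up for illustration; parameter syntax per the DataImportHandler wiki):

```xml
<!-- deltaQuery returns only the keys changed since the last run; the
     hypothetical view v_table1_delta pre-joins the child tables so the
     sub-entity queries are avoided on the delta path. -->
<entity name="item" pk="id"
        query="SELECT * FROM table1"
        deltaQuery="SELECT id FROM table1
                    WHERE last_modified &gt; '${dataimporter.last_index_time}'"
        deltaImportQuery="SELECT * FROM v_table1_delta
                          WHERE id = '${dataimporter.delta.id}'">
</entity>
```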
Re: Solr 1.4 Release Date
Any updates on this? Cheers.

Gurjot Singh wrote: Hi, I am curious to know the scheduled/tentative release date of Solr 1.4. Thanks, Gurjot