Re: Indexing HTML document
Thank you! That's even more than I wanted to know. ;) Georg On Tue, Mar 2, 2010 at 10:05 PM, Walter Underwood wun...@wunderwood.org wrote: You are in luck, because Avi Rappoport has just written a tutorial about how to do this. It is available from Lucid Imagination: http://www.lucidimagination.com/solutions/whitepapers/Indexing-Text-and-HTML-Files-with-Solr I've just started reviewing it, but knowing Avi, I expect it to be very helpful. wunder On Mar 2, 2010, at 8:28 AM, Siddhant Goel wrote: There is an HTML filter documented here, which might be of some help - http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.HTMLStripCharFilterFactory Control characters can be eliminated using code like this - http://bitbucket.org/cogtree/python-solr/src/tip/pythonsolr/pysolr.py#cl-449 On Tue, Mar 2, 2010 at 9:37 PM, György Frivolt gyorgy.friv...@gmail.com wrote: Hi, How do I properly index HTML documents? All the documents are HTML, some containing characters encoded like &#x17E;&#xED; ... Is there a character filter for filtering these codes? Is there a way to strip the HTML tags out? Does Solr weight the terms in the document based on where they appear? Words in headers (H1, H2, ...) would be expected to describe the document more than words in paragraphs. Thanks for help, Georg -- - Siddhant
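For reference, the HTMLStripCharFilterFactory mentioned above is wired into schema.xml as a char filter on the field type's analyzer; a minimal sketch (the type name and the surrounding tokenizer/filters are only examples):

  <fieldType name="text_html" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <charFilter class="solr.HTMLStripCharFilterFactory"/>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

The char filter strips tags and also decodes numeric character entities such as &#x17E; before the text reaches the tokenizer. Weighting H1/H2 terms more heavily is not automatic; one common approach is to copy heading text into a separate field and boost that field at query time.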
Clustering from analyzed text instead of raw input
I'm trying to use Carrot2 (for now I started with the workbench) and I can cluster any field, but the text used for clustering is the original raw text, the one that was submitted for indexing, without any of the processing performed by the tokenizer or filters. So I get stop words. I also built shingles (after filtering by POS) and I cannot cluster using these multiword tokens. So my question is how to get the indexed (analyzed) text back in a query response instead of the original one, because if I set stored to false, then the search does not return the content of the field. Thanks in advance Joan -- View this message in context: http://old.nabble.com/Clustering-from-anlayzed-text-instead-of-raw-input-tp27765780p27765780.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: 2 Cores, 1 Table, 2 DataImporter --> Import at the same time ?
pleeease help me somebody =( :P stocki wrote: Hello again ;) I installed tomcat5.5 on my debian server ... I use 2 cores and two different DIHs with separate indexes, one for the normal search feature and the other core for the suggest feature. But I cannot start both DIHs with an import command at the same time. How is this possible? thx -- View this message in context: http://old.nabble.com/2-Cores%2C-1-Table%2C-2-DataImporter---%3E-Import-at-the-same-time---tp27756255p27765825.html Sent from the Solr - User mailing list archive at Nabble.com.
error in sum function
The sum function or the map one is not parsed correctly. This sort works like a charm... sort=score+desc,sum(Num,map(Num,0,2000,42000))+asc but sort=score+desc,sum(map(Num,0,2000,42000),Num)+asc gives the following exception SEVERE: org.apache.solr.common.SolrException: Must declare sort field or function at org.apache.solr.search.QueryParsing.processSort(QueryParsing.java:376) at org.apache.solr.search.QueryParsing.parseSort(QueryParsing.java:281) at org.apache.solr.search.QParser.getSort(QParser.java:217) at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:86) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:174) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) You can test it here using these two URLs http://varoitus.barcelonamedia.org:8180/fbm-ijec/select?indent=on&fl=id,Num,score&sort=score%20asc,sum%28map%28Num,0,5000,42000%29,Num%29+asc&q=+entities_org:%28%22Amena%22%29 http://varoitus.barcelonamedia.org:8180/fbm-ijec/select?indent=on&wt=php&fl=id,Num,score&rows=50&sort=score+desc,sum%28Num,map%28Num,0,2000,42000%29%29+asc&q=+entities_org:Amena -- View this message in context: http://old.nabble.com/error-in-sum-function-tp27765881p27765881.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Implementing hierarchical facet
you could always define 1 dynamicField and encode the hierarchy level in the field name:

<dynamicField name="_loc_hier_*" type="string" stored="false" indexed="true" omitNorms="true"/>

using: facet=on&facet.field={!key=Location}_loc_hier_city&fq=_loc_hier_country:somecountryid ... adding cityarea later for instance would be as simple as: facet=on&facet.field={!key=Location}_loc_hier_cityarea&fq=_loc_hier_city:somecityid Cheers, Geert-Jan

2010/3/3 Andy angelf...@yahoo.com Thanks. I didn't know about the {!key=Location} trick. Thanks everyone for your help. From what I could gather, there are 3 approaches: 1) SOLR-64 Pros: - can have arbitrary levels of hierarchy without modifying the schema Cons: - each combination of all the levels in the hierarchy results in a separate filter cache entry. This number could be huge, which would lead to poor performance 2) SOLR-792 Pros: - each level of the hierarchy results in its own filter cache entries. A much smaller number of cached filters. Better performance. Cons: - only 2 levels are supported 3) Separate fields for each hierarchy level Pros: - same as SOLR-792. Good performance Cons: - can only handle a fixed number of levels in the hierarchy. Adding any levels beyond that requires schema modification Does that sound right? Option 3 is probably the best match for my use case. Is there any trick to make it able to deal with an arbitrary number of levels? Thanks.

--- On Tue, 3/2/10, Geert-Jan Brits gbr...@gmail.com wrote: From: Geert-Jan Brits gbr...@gmail.com Subject: Re: Implementing hierarchical facet To: solr-user@lucene.apache.org Date: Tuesday, March 2, 2010, 8:02 PM Using Solr 1.4: even fewer changes to the frontend: facet=on&facet.field={!key=Location}countryid ... facet=on&facet.field={!key=Location}cityid&fq=countryid:somecountryid etc. will consistently render the resulting facet under the name Location.

2010/3/3 Geert-Jan Brits gbr...@gmail.com If it's a requirement to let Solr handle the facet hierarchy please disregard this post, but an alternative would be to have your app control when to ask for which 'facet level' (e.g. country, state, city) in the hierarchy, as follows. Each doc has 3 separate fields (indexed=true, stored=false): - countryid - stateid - cityid facet on country: facet=on&facet.field=countryid facet on state (country selected; functionally you probably don't want to show states without the user having selected a country anyway): facet=on&facet.field=stateid&fq=countryid:somecountryid facet on city (state selected, same functional analogy as above): facet=on&facet.field=cityid&fq=stateid:somestateid or facet on city (country selected, same functional analogy as above): facet=on&facet.field=cityid&fq=countryid:somecountryid Grab the resulting facet and drop it under Location. Pros: - reusing fq's (good performance; I've never used hierarchical facets, but I would be surprised if they were much faster than this method) - flexible (you get multiple hierarchies: country -> state -> city and country -> city) Cons: - a little more application logic Hope that helps, Geert-Jan

2010/3/2 Andy angelf...@yahoo.com I read that a simple way to implement a hierarchical facet is to concatenate strings with a separator. Something like level1>level2>level3 with ">" as the separator. A problem with this approach is that the number of facet values will greatly increase. For example I have a facet Location with the hierarchy country>state>city. Using the above approach every single city will lead to a separate facet value. With tens of thousands of cities in the world the response from Solr will be huge. And then on the client side I'd have to loop through all the facet values and combine those with the same country into a single value. Ideally Solr would be aware of the hierarchy structure and send back responses accordingly. So at level 1 Solr will send back facet values based on country (100 or so values). At level 2 the facet values will be based on the states within the selected country (a few dozen values). The next level will be cities within that state, and so on. Is it possible to implement a hierarchical facet this way using Solr?
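For concreteness, a minimal sketch of option 3 with the field names used above (somecountryid/somestateid are placeholder values):

  <field name="countryid" type="string" indexed="true" stored="false"/>
  <field name="stateid" type="string" indexed="true" stored="false"/>
  <field name="cityid" type="string" indexed="true" stored="false"/>

  level 1: /select?q=*:*&facet=on&facet.field={!key=Location}countryid
  level 2: /select?q=*:*&facet=on&facet.field={!key=Location}stateid&fq=countryid:somecountryid
  level 3: /select?q=*:*&facet=on&facet.field={!key=Location}cityid&fq=stateid:somestateid

Each request reuses the previous level's selection as a filter query, so the filters are cached independently of the facet being computed.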
Re: error in sum function
Can you try the latest trunk? I fixed it just a couple of days ago. Koji Sekiguchi from mobile On 2010/03/03, at 18:18, JCodina joan.cod...@barcelonamedia.org wrote: the sum function or the map one are not parsed correctly, doing this sort, works as a charm... sort=score+desc,sum(Num,map(Num,0,2000,42000))+asc but sort=score+desc,sum(map(Num,0,2000,42000),Num)+asc gives the following exception SEVERE: org.apache.solr.common.SolrException: Must declare sort field or function at org.apache.solr.search.QueryParsing.processSort(QueryParsing.java:376) at org.apache.solr.search.QueryParsing.parseSort(QueryParsing.java:281) at org.apache.solr.search.QParser.getSort(QParser.java:217) at org.apache.solr.handler.component.QueryComponent.prepare (QueryComponent.java:86) at org.apache.solr.handler.component.SearchHandler.handleRequestBody (SearchHandler.java:174) at org.apache.solr.handler.RequestHandlerBase.handleRequest (RequestHandlerBase.java:131) you can test it here using these two URLs http://varoitus.barcelonamedia.org:8180/fbm-ijec/select?indent=on&fl=id,Num,score&sort=score%20asc,sum%28map%28Num,0,5000,42000%29,Num%29+asc&q=+entities_org:%28%22Amena%22%29 http://varoitus.barcelonamedia.org:8180/fbm-ijec/select?indent=on&wt=php&fl=id,Num,score&rows=50&sort=score+desc,sum%28Num,map%28Num,0,2000,42000%29%29+asc&q=+entities_org:Amena -- View this message in context: http://old.nabble.com/error-in-sum-function-tp27765881p27765881.html Sent from the Solr - User mailing list archive at Nabble.com.
Solr with Tika - Text ordering garbled.
We are loading PDF documents with an OCR content layer into Solr through Tika. The load process appears to work fine and all of the words from the OCR layer are stored as Text in Solr, and are therefore searchable. Our problem is that in the results returned from a search the words in the 'Text' field are not returned in the same order as those in the original OCR content in the PDF. This means that the snippet does not accurately reflect the original document content. It appears that sections of text from the OCR are ordered randomly, so a section from the bottom of the document appears alongside text from the top of the document. Additionally, Tika strips out carriage return characters but does not replace them with anything, so terms in separate paragraphs get joined together. Any help welcomed. -- View this message in context: http://old.nabble.com/Solr-with-Tika---Text-ordering-garbled.-tp27766815p27766815.html Sent from the Solr - User mailing list archive at Nabble.com.
Error on startup
Hi All. I have shut down Solr and removed the index so I can start over, then re-launched. I am getting an error of: SEVERE: REFCOUNT ERROR: unreferenced org.apache.solr.solrc...@14db38a4 (core1) has a reference count of 1 Any idea what this is a result of? Hope you can advise. Lee
Problems with variable geo_distance
Hi, I am having a very strange problem related to LocalSolr. In my documents there is a record for a location called Gujranwala, which is a city in Pakistan. I try to get search results with respect to the coordinates of Lahore (another city in Pakistan). When I do a search within 100 miles, there are no results. When I do a search of 200 miles, it gives me Gujranwala in the results. However, the problem is that the geo_distance it gives is 48.112120348665925. This result should have been in the search within 100 miles, since the geo_distance is 48.112. Here are the queries that I was making: http://localhost:8983/solr/select/?q=+title:*&qt=geo&lat=31.4845&long=74.3216&radius=100 http://localhost:8983/solr/select/?q=+title:*&qt=geo&lat=31.4845&long=74.3216&radius=200 The coordinates of Gujranwala are: <double name="latitude">32.168652</double> <double name="longitude">74.173981</double> I would appreciate any help on this. Thanks -- Muhammad Emad Mushtaq http://www.emadmushtaq.com/
[ANN] Carrot2 3.2.0 released
Dear All, I'm happy to announce three releases from the Carrot Search team: Carrot2 v3.2.0, Lingo3G v1.3.1 and Carrot Search Labs. Carrot2 is an open source search results clustering engine. Version v3.2.0 introduces: * experimental support for clustering Korean and Arabic content, * a command-line batch processing application, * significant updates to the Flash-based cluster visualization. As of version 3.2.0, Carrot2 is free of LGPL-licensed dependencies. Release notes: http://project.carrot2.org/release-3.2.0-notes.html Download: http://project.carrot2.org/download.html Lingo3G is a real-time document clustering engine from Carrot Search. Version 1.3.1 introduces support for clustering Arabic, Danish, Finnish, Hungarian, Korean, Romanian, Swedish and Turkish content, a command-line application and a number of minor improvements. Please contact us at i...@carrotsearch.com for details. Carrot Search Labs shares some small pieces of software we created when working on Carrot2 and Lingo3G. Please see http://labs.carrotsearch.com for details and downloads. Thanks! Dawid Weiss, Stanislaw Osinski Carrot Search, i...@carrot-search.com
Re-index after Solr config file changed without restarting services
Hi, I am attempting to achieve what I believe many others have attempted in the past: allow an end user to modify a Solr config file through a custom UI and then roll out any changes made without restarting any services. Specifically, I want to be able to let the user edit the synonyms.txt file and, after committing the changes, force Solr to re-index based on those changes without restarting Tomcat. I have configured a Solr Master and Slave, each of which has a single core:

* http://master:8080/solr/core
* http://slave:8080/solr/core

The cores are defined in the respective solr.xml files as:

<solr persistent="true" sharedLib="lib">
  <cores adminPath="/admin/cores">
    <core name="core" instanceDir="core">
      <property name="configDir" value="../../conf/" />
    </core>
  </cores>
</solr>

Replication has been configured in the Master solrconfig.xml as follows:

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">startup</str>
    <str name="replicateAfter">commit</str>
    <str name="snapshot">startup</str>
    <str name="snapshot">commit</str>
    <str name="confFiles">schema.xml,${configDir}stopwords.txt,${configDir}elevate.xml,${configDir}synonyms.txt</str>
  </lst>
</requestHandler>

and the Slave solrconfig.xml as:

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master:8080/solr/core/replication</str>
    <str name="compression">internal</str>
    <str name="httpConnTimeout">5000</str>
    <str name="httpReadTimeout">1</str>
    <str name="httpBasicAuthUser">username</str>
    <str name="httpBasicAuthPassword">password</str>
    <str name="pollInterval">00:00:20</str>
  </lst>
</requestHandler>

At service startup, replication works fine. However, when a change is made to the synonyms.txt file and http://master:8080/solr/admin/cores?action=RELOAD&core=core is called, neither the Master nor the Slave is updated to reflect the modification. I am assuming that this is because in the Master schema.xml the SynonymFilterFactory is being used at index time and the CoreAdmin RELOAD does not force a Solr re-index. If this is so, please can someone advise what the best methodology is to achieve what I am attempting? If not, please could someone let me know what I'm doing wrong?! Thanks, Marc
Re: Logging in Embedded SolrServer - What a nightmare.
Hello Kevin, No, that hasn't worked. I tried a lot of combinations of the log4j, slf4j and log4j-slf4j jars and got no success. As I said, for the solr.war what you describe seems to work, the same way I got it working by configuring jre/lib/logging.properties, but not with the embedded server... Can anyone please help me? []s, Lucas Frare Teixeira .·. - lucas...@gmail.com - lucastex.com.br - blog.lucastex.com - twitter.com/lucastex On Tue, Mar 2, 2010 at 6:36 PM, Kevin Osborn osbo...@yahoo.com wrote: Not sure if it will solve your specific problem. We use Solr as a WAR as well as Solrj. So the main solr distribution comes with slf4j-jdk-1.5.5.jar. I just deleted that and replaced it with slf4j-log4j12-1.5.5.jar. And then it used my existing log4j.properties file. From: Lucas F. A. Teixeira lucas...@gmail.com To: solr-user@lucene.apache.org Sent: Tue, March 2, 2010 11:14:26 AM Subject: Logging in Embedded SolrServer - What a nightmare. Hello all, I'm having a hard time trying to change Solr's query logging level. I've tried a lot of things I've found on the internet, on this mailing list and in the Solr docs. What I've found so far: - The embedded Solr server uses the slf4j lib for intermediating logging. Here I'm using Log4j as my logging framework. - Changing the .../jre/lib/logging.properties worked, but only when querying Solr over HTTP, and not with Solr embedded. - A log4j.xml that I added is not being respected. (It is logging with a totally different layout and appenders) - I've searched for other log4j config files in the classpath, and found nothing... - I even tried to call Logger.getLogger(org.apache.solr) and then set its level manually inside the app; nothing changed... So, the embedded Solr server keeps logging queries and other stuff to my stdout. Most docs and guides I've found on the internet talk about Solr over HTTP; this is OK for me, with HTTP I got everything working, but not with Solr embedded. Has anyone achieved this with embedded? Thanks a lot ppl, []s, Lucas Frare Teixeira .·. - lucas...@gmail.com - lucastex.com.br - blog.lucastex.com - twitter.com/lucastex
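As a point of reference, once slf4j-log4j12 is the binding on the classpath, a minimal log4j.properties along these lines should control the embedded server's query logging (the levels and appender are only an example, and the file has to be on the application's classpath to be picked up):

  log4j.rootLogger=INFO, stdout
  log4j.appender.stdout=org.apache.log4j.ConsoleAppender
  log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
  log4j.appender.stdout.layout.ConversionPattern=%d %p %c - %m%n
  # quiet the per-request logging from Solr
  log4j.logger.org.apache.solr.core=WARN
  log4j.logger.org.apache.solr=WARN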
Re: Clustering from analyzed text instead of raw input
Hi Joan, I'm trying to use carrot2 (now I started with the workbench) and I can cluster any field, but, the text used for clustering is the original raw text, the one that was indexed, without any of the processing performed by the tokenizer or filters. So I get stop words. The easiest way to fix this is to update the stop words list used by Carrot2, see http://wiki.apache.org/solr/ClusteringComponent, Tuning Carrot2 clustering section at the bottom. If you want to get readable cluster labels, it's best to feed the raw text for clustering (cluster labels are phrases taken from the input text, if you remove stopwords and stem everything, the phrases will become unreadable). Cheers, Staszek
need help with Solr Cores
Hi Everyone, I am new to Solr, and still trying to get my hands on it. I have indexed over 6 million documents and currently have a single large index. I update my index using the SolrJ client because of the format I store my documents in (i.e. JSON blobs) in the database. I need to find a way to have multiple indexes for one Solr instance: one for ongoing query search and one for updating the index with new documents/schema etc. The idea is to switch between indexes while one is being updated, so that users can still search my index. I know Solr supports multiple cores and I have read the wiki pages plus mailing lists on this, which helped a lot. However I am still confused about the setup of having two separate indexes. I have solr.xml in the solr.home dir with two dirs, one for each core, and each core has a conf folder copied from the standard solr.home folder. Do I need a data folder in each core's directory? And should I copy/paste the index folder into each core's directory? Thanks for your help in advance!! -- View this message in context: http://old.nabble.com/need-help-with-Solr-Cores-tp27767694p27767694.html Sent from the Solr - User mailing list archive at Nabble.com.
Best performance for facet dates in trunk using solr.TrieDateField
Hey there, I am testing date facets in trunk with a huge index. Apparently, as the default solrconfig.xml shows, the fastest way to run date facet queries is to index the field with this data type:

<!-- A Trie based date field for faster date range queries and date faceting. -->
<fieldType name="tdate" class="solr.TrieDateField" omitNorms="true" precisionStep="6" positionIncrementGap="0"/>

I am wondering... would setting precisionStep="8" on the TrieDateField improve the speed of the queries even more? When using the TrieDateField, does it still make sense to use date rounding? For example:

<str name="facet.date">date</str>
<str name="facet.date.start">2006-06-01T00:00:00Z/MONTH</str>
<str name="facet.date.end">2010-01-30T23:59:59Z/MONTH</str>
<str name="facet.date.gap">+1MONTH</str>

Thanks in advance -- View this message in context: http://old.nabble.com/Best-performance-for-facet-dates-in-trunk-using-solr.TrieDateField-tp27767793p27767793.html Sent from the Solr - User mailing list archive at Nabble.com.
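For reference, the handler defaults above correspond to a request along these lines (host and field name are just the ones from the post; %2B is the URL-encoded +):

  http://localhost:8983/solr/select?q=*:*&rows=0&facet=true&facet.date=date&facet.date.start=2006-06-01T00:00:00Z/MONTH&facet.date.end=2010-01-30T23:59:59Z/MONTH&facet.date.gap=%2B1MONTH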
Re: 2 Cores, 1 Table, 2 DataImporter --> Import at the same time ?
what's the error you're getting? is DIH keeping some static that prevents it from running across two cores separately? if so, that'd be a bug. Erik On Mar 3, 2010, at 4:12 AM, stocki wrote: pleeease help me somebody =( :P stocki wrote: Hello again ;) i install tomcat5.5 on my debian server ... i use 2 cores and two different DIH with seperatet Index, one for the normal search-feature and the other core for the suggest-feature. but i cannot start both DIH with an import command at the same time. how it this possible ? thx -- View this message in context: http://old.nabble.com/2-Cores%2C-1-Table%2C-2-DataImporter---%3E-Import-at-the-same-time---tp27756255p27765825.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: error in sum function
Ok, solved!!! Joan Koji Sekiguchi-2 wrote: Can you try the latest trunk? I fixed it just a couple of days ago. Koji Sekiguchi from mobile On 2010/03/03, at 18:18, JCodina joan.cod...@barcelonamedia.org wrote: the sum function or the map one are not parsed correctly, doing this sort, works as a charm... sort=score+desc,sum(Num,map(Num,0,2000,42000))+asc but sort=score+desc,sum(map(Num,0,2000,42000),Num)+asc gives the following exception SEVERE: org.apache.solr.common.SolrException: Must declare sort field or function at org.apache.solr.search.QueryParsing.processSort(QueryParsing.java:376) at org.apache.solr.search.QueryParsing.parseSort(QueryParsing.java:281) at org.apache.solr.search.QParser.getSort(QParser.java:217) at org.apache.solr.handler.component.QueryComponent.prepare (QueryComponent.java:86) at org.apache.solr.handler.component.SearchHandler.handleRequestBody (SearchHandler.java:174) at org.apache.solr.handler.RequestHandlerBase.handleRequest (RequestHandlerBase.java:131) you can test it here using these two URLs http://varoitus.barcelonamedia.org:8180/fbm-ijec/select?indent=on&fl=id,Num,score&sort=score%20asc,sum%28map%28Num,0,5000,42000%29,Num%29+asc&q=+entities_org:%28%22Amena%22%29 http://varoitus.barcelonamedia.org:8180/fbm-ijec/select?indent=on&wt=php&fl=id,Num,score&rows=50&sort=score+desc,sum%28Num,map%28Num,0,2000,42000%29%29+asc&q=+entities_org:Amena -- View this message in context: http://old.nabble.com/error-in-sum-function-tp27765881p27765881.html Sent from the Solr - User mailing list archive at Nabble.com. -- View this message in context: http://old.nabble.com/error-in-sum-function-tp27765881p27768877.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Issue on stopword list
Joe Calderon-2 wrote: or you can try the commongrams filter that combines tokens next to a stopword On Tue, Mar 2, 2010 at 6:56 AM, Walter Underwood wun...@wunderwood.org wrote: Don't remove stopwords if you want to search on them. --wunder On Mar 2, 2010, at 5:43 AM, Erick Erickson wrote: This is a classic problem with stopword removal. Have you tried just removing stopwords from the indexing definition and the query definition and reindexing? You can't search on them no matter what you do if they've been removed, they just aren't there HTH Erick On Tue, Mar 2, 2010 at 5:47 AM, Suram reactive...@yahoo.com wrote: Hi, How can I search using a stopword? My query is like this: This - 0 results because it is a stopword is - 0 results because it is a stopword that - 0 results because it is a stopword If I search like This is that - it must give the result for that. Do I need to change anything in my schema file to get a result for This is that? -- View this message in context: http://old.nabble.com/Issue-on-stopword-list-tp27754434p27754434.html Sent from the Solr - User mailing list archive at Nabble.com. I tried commongrams also but it didn't work. Here I search this is it. I would like to get the exact match, not results for this is, is or it. My document is like:

<field name="id">101</field>
<field name="name">This Is It</field>
<field name="manu">Apache Software Foundation</field>
<field name="cat">software</field>
<field name="cat">search</field>

Here is my schema: http://old.nabble.com/file/p27768959/schema.xml schema.xml and I set the specific fields (name, manu, cat) as searchable; when I index, the search finds nothing. -- View this message in context: http://old.nabble.com/Issue-on-stopword-list-tp27754434p27768959.html Sent from the Solr - User mailing list archive at Nabble.com.
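For reference, a rough sketch of the CommonGrams setup Joe mentions: at index time stopwords are paired with their neighbours, and the matching query-side filter produces the same pairs, so phrases like this is it stay searchable. The type name is made up and the exact attributes should be checked against the CommonGramsFilterFactory documentation:

  <fieldType name="text_cg" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.CommonGramsFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.CommonGramsQueryFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    </analyzer>
  </fieldType>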
Re: 2 Cores, 1 Table, 2 DataImporter --> Import at the same time ?
okay i change the lockType to single but with no good effect. so i think now, that my two DIH are using the same data-Folder. why ist it so ? i thought that each DIH use his own index ... ?! i think it is not possible to import from one table parallel with more than one DIH`s ?! myexception: java.io.FileNotFoundException: /var/lib/tomcat5.5/temp/solr/data/index/_5d.fnm (No such file or directory) at java.io.RandomAccessFile.open(Native Method) at java.io.RandomAccessFile.init(RandomAccessFile.java:212) at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput$Descriptor.init(SimpleFSDirectory.java:78) at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput.init(SimpleFSDirectory.java:108) at org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.init(NIOFSDirectory.java:94) at org.apache.lucene.store.NIOFSDirectory.openInput(NIOFSDirectory.java:70) at org.apache.lucene.store.FSDirectory.openInput(FSDirectory.java:691) at org.apache.lucene.index.FieldInfos.init(FieldInfos.java:68) at org.apache.lucene.index.SegmentReader$CoreReaders.init(SegmentReader.java:116) at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:638) at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:608) at org.apache.lucene.index.IndexWriter$ReaderPool.get(IndexWriter.java:686) at org.apache.lucene.index.IndexWriter$ReaderPool.get(IndexWriter.java:662) at org.apache.lucene.index.DocumentsWriter.applyDeletes(DocumentsWriter.java:954) at org.apache.lucene.index.IndexWriter.applyDeletes(IndexWriter.java:5190) at org.apache.lucene.index.IndexWriter.doFlushInternal(IndexWriter.java:4354) at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:4192) at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:4183) at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2647) at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2601) at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:241) at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:61) at org.apache.solr.handler.dataimport.SolrWriter.upload(SolrWriter.java:75) at org.apache.solr.handler.dataimport.DataImportHandler$1.upload(DataImportHandler.java:292) at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:392) at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:242) at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:180) at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:331) at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:389) at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:370) Erik Hatcher-4 wrote: what's the error you're getting? is DIH keeping some static that prevents it from running across two cores separately? if so, that'd be a bug. Erik On Mar 3, 2010, at 4:12 AM, stocki wrote: pleeease help me somebody =( :P stocki wrote: Hello again ;) i install tomcat5.5 on my debian server ... i use 2 cores and two different DIH with seperatet Index, one for the normal search-feature and the other core for the suggest-feature. but i cannot start both DIH with an import command at the same time. how it this possible ? thx -- View this message in context: http://old.nabble.com/2-Cores%2C-1-Table%2C-2-DataImporter---%3E-Import-at-the-same-time---tp27756255p27765825.html Sent from the Solr - User mailing list archive at Nabble.com. 
-- View this message in context: http://old.nabble.com/SEVERE%3A-SolrIndexWriter-was-not-closed-prior-to-finalize%28%29%2C-indicates-a-bugPOSSIBLE-RESOURCE-LEAK%21%21%21-tp27756255p27768997.html Sent from the Solr - User mailing list archive at Nabble.com.
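If the two DIHs really are writing to the same /var/lib/tomcat5.5/temp/solr/data/index, one way out is to give each core its own data directory. A minimal sketch, assuming each core has its own solrconfig.xml (the paths are examples only):

  core0's solrconfig.xml:  <dataDir>/var/lib/tomcat5.5/temp/solr/core0/data</dataDir>
  core1's solrconfig.xml:  <dataDir>/var/lib/tomcat5.5/temp/solr/core1/data</dataDir>

With separate dataDirs each DIH full-import builds its own index, and the FileNotFoundException from the shared index folder should no longer occur.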
Re: Clustering from analyzed text instead of raw input
Thanks Staszek, I'll give the stopwords treatment a try, but the problem is that we perform POS tagging and then use payloads to keep only nouns and adjectives, and we thought it could be interesting to perform clustering only with these elements, to avoid meaningless words. Of course this is a clustering problem, but maybe it is also a feature that could be interesting to have in Solr: not to store the raw input text but the analyzed one, so stored could be false | raw | analyzed Stanislaw Osinski-2 wrote: Hi Joan, I'm trying to use carrot2 (now I started with the workbench) and I can cluster any field, but, the text used for clustering is the original raw text, the one that was indexed, without any of the processing performed by the tokenizer or filters. So I get stop words. The easiest way to fix this is to update the stop words list used by Carrot2, see http://wiki.apache.org/solr/ClusteringComponent, Tuning Carrot2 clustering section at the bottom. If you want to get readable cluster labels, it's best to feed the raw text for clustering (cluster labels are phrases taken from the input text, if you remove stopwords and stem everything, the phrases will become unreadable). Cheers, Staszek -- View this message in context: http://old.nabble.com/Clustering-from-anlayzed-text-instead-of-raw-input-tp27765780p27769034.html Sent from the Solr - User mailing list archive at Nabble.com.
Can I use .XML files instead of .OSM files
I'm very new to Solr. I downloaded apache-solr-1.5-dev and was trying out the example in order to first figure out how Solr is working. I found out that the data directory consisted of .OSM files. But I have an XML file consisting of latitude, longitude and relevant news for that location. Can I just use the XML file to index the data or is it necessary for me to convert this file to .OSM file using some tool and then proceed further? Also the attribute value from the .OSM file is being considered in that example. Since there are no attributes for the tags in my XML file, how can I extract only the contents of my tags?Any help in this direction will be appreciated. Thanks in advance. -- View this message in context: http://old.nabble.com/Can-I-used-.XML-files-instead-of-.OSM-files-tp27769082p27769082.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: need help with Solr Cores
Figured it out !! I actually created two folders in solr.home/data folder, each holding the index for a given core. So for core0 and core1 i had indexes as: solr.home/data/core0/index solr.home/data/core1/index Feeling a little stupid now, having figured out a simple issue :s muneeb wrote: Hi Everyone, I am new to Solr, and still trying to get my hands on it. I have indexed over 6 million documents and currently have a single large index. I update my index using SolrJ client due to the format I store my documents (i.e. JSON blobs) in database. I need to find a way to have multiple indexes for one solr instance. One for ongoing query search and one for updating index with new documents/schema etc. the idea is to switch between indexes while one is being updated. So that users could still search my index. I know Solr supports multiple cores and I have read wiki pages plus mailing lists on this, which help alot. However I am still confused with the need of having two separate indexes. I have solr.xml in solr.home dir with two dirs for each core and each core has conf folder copied from standard solr.home folder. Do i need data folder in each core's directory? and Should i copy/paste the index folder in each core's directory? Thanks for your help in advance!! -- View this message in context: http://old.nabble.com/need-help-with-Solr-Cores-tp27767694p27769171.html Sent from the Solr - User mailing list archive at Nabble.com.
How to see the query generated by MoreLikeThisHandler?
Hello, Is there a way to see exactly what query is generated by the MoreLikeThisHandler? If I send debugQuery=true then I see in the response a key called parsedquery but it doesn't seem quite right. What I mean by that is when I make the MoreLikeThis query, I set mlt.fl to title,content but the query shown in parsedquery does not query on title at all... only on content. Furthermore, the query looks something like this content:word1 content:word2 content:word3 but if I copy and paste that into a standard query, nothing comes back because the default term operator is AND. If I change that query to content:word1 OR content:word2 OR content:word3, I get results but they are not the same as what the MLT query returns. Is there a way to see the generated query without actually running it? As of now, I'm making a MLT query with rows=0, but I think it's still running the query because it takes a non trivial amount of time and it also shows numFound in the response. Thanks for the help, -- Christopher
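One way to inspect what the MLT handler builds without paging through results is the mlt.interestingTerms parameter; a sketch of such a request (the handler path follows the usual MoreLikeThisHandler setup, the field names are from the question and the id is made up):

  http://localhost:8983/solr/mlt?q=id:12345&mlt.fl=title,content&mlt.interestingTerms=details&mlt.mintf=1&mlt.mindf=1&rows=0

With interestingTerms=details the response lists the terms (and boosts) the handler selected for the generated query, which is usually enough to see why a hand-copied version behaves differently under the default AND operator.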
DisMaxRequestHandler questions about bf and bq
Hello, I have a couple of questions regarding the bf and bq params to the DisMaxRequestHandler. 1) Can I specify them more than once? Ex: bf=log(popularity)&bf=log(comment_count) 2) When using bq, how can I specify what score to use for documents not returned by the query? In other words, how do I mimic this behavior using bq: bf=query($qq, 0.1)&qq=site:news.yahoo.com Thanks for the help!
Formatting Results
Hey All, I am indexing around 10,000 documents with Solr Cell, which has gone superbly. I can of course search the content like the example given: http://localhost:8983/solr/select?q=attr_content:tutorial But what I would like is for Solr to return a snippet of so many words from each document with the matched content highlighted, a lot like Google does, I suppose. How can I achieve such a result? I know I can use highlighting but can't seem to get this to work. Hope someone can put me on the right track. Thank you
RE: DIH onError question
Thanks for your prompt reply. I resolved the ERROR, and used continue to bypass any EXCEPTIONS. Nirmal Shah Remedy Consultant|Column Technologies|Cell: (630) 244-1648 -Original Message- From: Noble Paul നോബിള് नोब्ळ् [mailto:noble.p...@gmail.com] Sent: Tuesday, March 02, 2010 11:13 PM To: solr-user@lucene.apache.org Subject: Re: DIH onError question onError only handles Exception (not Error or Throwable). In your case it is a NoClassDefFoundError. If it is an Error or Throwable it is a symptom of a larger problem. If you fix the NoClassDefFoundError it should be ok On Wed, Mar 3, 2010 at 10:06 AM, Shah, Nirmal ns...@columnit.com wrote: Hi all, I am using Solr 1.5 from trunk. I am getting the below error on a full load, and it is causing the import to fail and roll back. I am not concerned about the error but rather that I cannot seem to tell the indexing to continue. I have two entities, and I have tried all (4) combinations of skip and continue for their onError attributes. SEVERE: Exception while processing: f document : null org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.NoClassDefFoundError: org/bouncycastle/jce/provider/BouncyCastleProvider at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.j ava:652) at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.j ava:606) at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java :261) at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:18 5) at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporte r.java:333) at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java :391) at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java: 372) Caused by: java.lang.NoClassDefFoundError: org/bouncycastle/jce/provider/BouncyCastleProvider at org.apache.pdfbox.pdmodel.PDDocument.openProtection(PDDocument.java:1108 ) at org.apache.pdfbox.pdmodel.PDDocument.decrypt(PDDocument.java:573) at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:23 5) at org.apache.pdfbox.util.PDFTextStripper.getText(PDFTextStripper.java:180) at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:56) at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:69) at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:120) at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:101) at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntit yProcessor.java:124) at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(Entity ProcessorWrapper.java:233) at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.j ava:580) ... 
6 more Mar 2, 2010 10:21:05 PM org.apache.solr.handler.dataimport.DataImporter doFullImport SEVERE: Full Import failed org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.NoClassDefFoundError: org/bouncycastle/jce/provider/BouncyCastleProvider at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.j ava:652) at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.j ava:606) at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java :261) at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:18 5) at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporte r.java:333) at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java :391) at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java: 372) Caused by: java.lang.NoClassDefFoundError: org/bouncycastle/jce/provider/BouncyCastleProvider at org.apache.pdfbox.pdmodel.PDDocument.openProtection(PDDocument.java:1108 ) at org.apache.pdfbox.pdmodel.PDDocument.decrypt(PDDocument.java:573) at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:23 5) at org.apache.pdfbox.util.PDFTextStripper.getText(PDFTextStripper.java:180) at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:56) at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:69) at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:120) at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:101) at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntit yProcessor.java:124) at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(Entity ProcessorWrapper.java:233) at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.j ava:580) ... 6 more Mar 2, 2010 10:21:05 PM
Re: Formatting Results
I'll give you an example of how to configure your default SearchHandler to do highlighting, but I strongly recommend you check the wiki properly. Everything is really well explained there: http://wiki.apache.org/solr/HighlightingParameters

<str name="hl">true</str>
<str name="hl.fl">attr_content</str>
<str name="f.attr_content.hl.fragsize">200</str>
<str name="f.attr_content.hl.snippets">1</str>
<str name="f.attr_content.hl.alternateField">f.attr_content</str>
<str name="f.attr_content.hl.maxAlternateFieldLength">300</str>

Lee Smith-6 wrote: Hey All I am indexing around 10,000 documents with Solr Cell which has gone superb. I can of course search the content like the example given: http://localhost:8983/solr/select?q=attr_content:tutorial But what I would like is for Solr to return a snippet of so many words with the matched content highlighted. I suppose a lot like Google does. How can I achieve such a result? I know I can use the highlighting but can't seem to get this to work. Hope someone can put me on the right track. Thank you -- View this message in context: http://old.nabble.com/Formatting-Results-tp27771256p27772151.html Sent from the Solr - User mailing list archive at Nabble.com.
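The same settings can also be passed per request instead of being set as handler defaults; roughly (the field name is from the thread, everything else is illustrative):

  http://localhost:8983/solr/select?q=attr_content:tutorial&hl=true&hl.fl=attr_content&hl.snippets=1&hl.fragsize=200

The snippets come back in a separate highlighting section of the response, keyed by each document's unique key.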
SOLR Index or database
Hello All, Just struggling with the question of whether SOLR or a database would be a good option for me. Here are my requirements. We index about 600+ news/blog sources into our system. The only information we store locally is the title, link and article snippet. We are able to index all these sources into a SOLR index and it works perfectly. This is where it gets tricky: we need to store certain meta information as well, e.g. 1. Rating/popularity of an article 2. Sharing of articles between users 3. How many times an article is viewed 4. Comments on each article. So far, we are planning to store the meta-information in the database and link this data with a document in the index. When a user opens the page, results are combined from the index and the database to render the view. Any reservations about the above architecture? Is SOLR the right fit in this case? We do need full-text search, so SOLR is a no-brainer imho, but I would love to hear the community's view. Any feedback appreciated thanks -- View this message in context: http://old.nabble.com/SOLR-Index-or-database-tp27772362p27772362.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Error on startup
If you shut down the server properly it's weird that you get an error when starting up again. How did you delete the index? I was experiencing something similar a long time ago because I was removing the content from the index folder but not the folder itself. The correct way to do it was to remove the index folder and start up the server again (Solr creates the index folder if it's not present). I don't know if this has changed recently. Lee Smith-6 wrote: Hi All. I have shut down Solr and removed the index so I can start over, then re-launched. I am getting an error of: SEVERE: REFCOUNT ERROR: unreferenced org.apache.solr.solrc...@14db38a4 (core1) has a reference count of 1 Any idea what this is a result of? Hope you can advise. Lee -- View this message in context: http://old.nabble.com/Error-on-startup-tp27767018p27772394.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Formatting Results
Thanks Marc, I'll have a good look at that part now. And I managed to get it started again :-). Thank you again Lee On 3 Mar 2010, at 18:52, Marc Sturlese wrote: I'll give you an example of how to configure your default SearchHandler to do highlighting, but I strongly recommend you check the wiki properly. Everything is really well explained there: http://wiki.apache.org/solr/HighlightingParameters <str name="hl">true</str> <str name="hl.fl">attr_content</str> <str name="f.attr_content.hl.fragsize">200</str> <str name="f.attr_content.hl.snippets">1</str> <str name="f.attr_content.hl.alternateField">f.attr_content</str> <str name="f.attr_content.hl.maxAlternateFieldLength">300</str> Lee Smith-6 wrote: Hey All I am indexing around 10,000 documents with Solr Cell which has gone superb. I can of course search the content like the example given: http://localhost:8983/solr/select?q=attr_content:tutorial But what I would like is for Solr to return a snippet of so many words with the matched content highlighted. I suppose a lot like Google does. How can I achieve such a result? I know I can use the highlighting but can't seem to get this to work. Hope someone can put me on the right track. Thank you -- View this message in context: http://old.nabble.com/Formatting-Results-tp27771256p27772151.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: SOLR Index or database
You need two, maybe three things that Solr doesn't do (or doesn't do well): * field updating * storing content * real time search and/or simple transactions I would seriously look at Mark Logic for that. It does all of those, plus full-text search, gracefully, plus it scales. There is also a version for Amazon EC2. www.marklogic.com Note: I work at Mark Logic, but I chose Solr for Netflix when I worked there. wunder On Mar 3, 2010, at 11:08 AM, caman wrote: Hello All, Just struggling with a thought where SOLR or a database would be good option for me.Here are my requirements. We index about 600+ news/blogs into out system. Only information we store locally is the title,link and article snippet.We are able to index all these sources into SOLR index and it works perfectly. This is where is gets tricky: We need to store certain meta information as well. e.g. 1. Rating/popularity of article 2. Sharing of the articles between users 3. How may times articles is viewed. 4. Comments on each article. So far, we are deciding to store meta-information in the database and link this data with the a document in the index. When user opens the page, results are combined from index and the database to render the view. Any reservation on using the above architecture? Is SOLR right fit in this case? We do need full text search so SOLR is no-brainer imho but would love to hear community view. Any feedback appreciated thanks
Re: Can I use .XML files instead of .OSM files
Are you sure you don't have a folder called exampledocs with xml files inside? These are the files to index as a first example: apache-solr-1.5-dev/example/exampledocs Check the /home/marc/Desktop/data/apache-solr-1.5-dev/example/solr/conf/schema.xml and solrconfig.xml and you will see how to configure them to be able to have your data indexed mamathahl wrote: I'm very new to Solr. I downloaded apache-solr-1.5-dev and was trying out the example in order to first figure out how Solr is working. I found out that the data directory consisted of .OSM files. But I have an XML file consisting of latitude, longitude and relevant news for that location. Can I just use the XML file to index the data or is it necessary for me to convert this file to .OSM file using some tool and then proceed further? Also the attribute value from the .OSM file is being considered in that example. Since there are no attributes for the tags in my XML file, how can I extract only the contents of my tags?Any help in this direction will be appreciated. Thanks in advance. -- View this message in context: http://old.nabble.com/Can-I-used-.XML-files-instead-of-.OSM-files-tp27769082p27772507.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Need suggestion regarding custom transformer
I think you can handle that by writing a custom transformer. There's a good explanation in the wiki: http://wiki.apache.org/solr/DIHCustomTransformer KshamaPai wrote: Hi, I am new to Solr. I am trying location-aware search with spatial Lucene in the solr1.5 nightly build. My table in mysql has just lat, lng and some text. I want to add geohash, lat_rad (lat in radians) and lng_rad fields to the document before indexing. I have used dataimport to get my table into Solr. I have to use GeohashUtils.Encode() to get the geohash from the corresponding lat,lng of each row, and the *ToRads function to get lat in radians. Can I use custom transformers so that after retrieving each row these fields are added and then indexed while using dataimport? Or do I have to migrate the data to XML and then make the required changes before indexing? Thanks in advance. -- View this message in context: http://old.nabble.com/Need-suggestion-regarding-custom-transformer-tp27763576p27772561.html Sent from the Solr - User mailing list archive at Nabble.com.
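A rough sketch of what such a transformer could look like (the class, package and field names are made up; DIH invokes transformRow by reflection, so check the wiki page above for the exact contract and for how your geohash utility is called):

  package com.example.dih;

  import java.util.Map;

  // Hypothetical DIH transformer: adds radian versions of lat/lng to each row before indexing.
  public class LatLngTransformer {
      public Object transformRow(Map<String, Object> row) {
          Object lat = row.get("lat");
          Object lng = row.get("lng");
          if (lat != null && lng != null) {
              double latDeg = Double.parseDouble(lat.toString());
              double lngDeg = Double.parseDouble(lng.toString());
              row.put("lat_rad", Math.toRadians(latDeg));
              row.put("lng_rad", Math.toRadians(lngDeg));
              // the geohash field would be computed here with the encode utility of your choice
          }
          return row;
      }
  }

It would then be referenced from data-config.xml via the entity's transformer attribute, e.g. transformer="com.example.dih.LatLngTransformer".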
Randomize MoreLikeThis
Hello. I'm implementing More Like This functionality in my search request. Everything works fine, but I need to randomize the documents returned by this more-like-this query. Something like this: *First request:* Query - docId:528369 Results - fields ... More Like This: <result name="528369" numFound="57162" start="0"><doc><str name="docid">1</str></doc><doc><str name="docid">2</str></doc> *Second request:* (same query, other result set for more like this) Query - docId:528369 Results - fields ... More Like This: <result name="528369" numFound="57162" start="0"><doc><str name="docid">3</str></doc><doc><str name="docid">4</str></doc> Is there a way to do it? Thanks Then those who were in the boat came and worshiped him, saying: Truly you are the Son of God. (Matthew 14:33)
Re: DisMaxRequestHandler questions about bf and bq
On Mar 3, 2010, at 12:26 PM, Christopher Bottaro wrote: I have a couple of questions regarding the bf and bq params to the DisMaxRequestHandler. 1) Can I specify them more than once? Ex: bf=log(popularity)&bf=log(comment_count) Yes, you can use multiple bf parameters, each adding an optional clause to the actual query executed. 2) When using bq, how can I specify what score to use for documents not returned by the query? In other words, how do I mimic this behavior using bq: bf=query($qq, 0.1)&qq=site:news.yahoo.com Why bother with bq in this situation? But I believe you could use bq={!func}query($qq, 0.1)&qq=site:news.yahoo.com
Weird issue with solr and jconsole/jmx
Hi, I connected to one of my solr instances with Jconsole today and noticed that most of the mbeans under the solr hierarchy are missing. The only thing there was a Searcher, which I had no trouble seeing attributes for, but the rest of the statistics beans were missing. They all show up just fine on the stats.jsp page. In the past this always worked fine. I did have the core reload due to config file changes this morning. Could that have caused this?
Escaping options for tika/solr cell extract-only output
Looking at http://wiki.apache.org/solr/ExtractingRequestHandler: Extract Only the output includes XML generated by Tika (and is hence further escaped by Solr's XML) ... is there an option to NOT have the resulting Tika output escaped? So &lt;head&gt; would come back as <head/>. If not, what would need to be done to enable this option? Looked into SOLR-1274.patch, but didn't see a parameter for such a thing. Thanks, Dan
Lucene: Finite-State Queries, Flexible Indexing, Scoring, and more
Hello folks, Those of you in or near New York and using Lucene or Solr should come to Lucene: Finite-State Queries, Flexible Indexing, Scoring, and more on March 24th: http://www.meetup.com/NYC-Search-and-Discovery/calendar/12720960/ The presenter will be the hyper active Lucene committer Robert Muir. Please spread the word. Otis -- Lucene ecosystem search :: http://search-lucene.com/
Re: Randomize MoreLikeThis
The first thing that came to mind is to index a random number with each doc and sort by that. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Hadoop ecosystem search :: http://search-hadoop.com/ - Original Message From: André Maldonado andre.maldon...@gmail.com To: solr-user@lucene.apache.org Sent: Wed, March 3, 2010 2:50:01 PM Subject: Randomize MoreLikeThis Hello. I'm implementing More Like This functionality in my search request. Everything works fine, but I need to randomize the documents returned by this more-like-this query. Something like this: *First request:* Query - docId:528369 Results - fields ... More Like This: <result name="528369" numFound="57162" start="0"><doc><str name="docid">1</str></doc><doc><str name="docid">2</str></doc> *Second request:* (same query, other result set for more like this) Query - docId:528369 Results - fields ... More Like This: <result name="528369" numFound="57162" start="0"><doc><str name="docid">3</str></doc><doc><str name="docid">4</str></doc> Is there a way to do it? Thanks Then those who were in the boat came and worshiped him, saying: Truly you are the Son of God. (Matthew 14:33)
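Solr's RandomSortField is one ready-made way to do what Otis suggests; a sketch of the schema side (the type and dynamic field names are arbitrary):

  <fieldType name="random" class="solr.RandomSortField" indexed="true"/>
  <dynamicField name="random_*" type="random"/>

A request can then add something like sort=random_1234 asc, and changing the seed part of the field name (random_1234, random_98765, ...) changes the ordering. Note that the doc lists produced by the MoreLikeThis component are ranked by similarity, so it is worth verifying whether they honor the sort parameter in your version; if not, the same random field can be used in a follow-up query over the returned ids.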
Re: Re-index after Solr config file changed without restarting services
Marc, At least for the force Solr to reindex part, I think you'll need to index yourself. That is, you need to run whatever app you run when you (re)index the data normally. Solr won't automagically reindex the data. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Hadoop ecosystem search :: http://search-hadoop.com/ - Original Message From: Marc Wilson wo...@fancydressoutfitters.co.uk To: Solr solr-user@lucene.apache.org Sent: Wed, March 3, 2010 6:51:17 AM Subject: Re-index after Solr config file changed without restarting services Hi, I am attempting to achieve what I believe many others have attempted in the past: allow an end user to modify a Solr config file through a custom UI and then roll out any changes made without restarting any services. Specifically, I want to be able to let the user edit the synonyms.txt file and after committing the changes, force Solr to re-index based on those changes without restarting Tomcat. I have configured a Solr Master and Slave, each of which has a single core: *http://master:8080/solr/core *http://slave:8080/solr/core The cores are defined in respective solr.xml files as: Replication has been configured in the Master solrconfig.xml as follows: startup commit startup commit name=confFilesschema.xml,${configDir}stopwords.txt,${configDir}elevate.xml,${configDir}synonyms.txt and the Slave solrconfig.xml as: http://master:8080/solr/core/replication internal 5000 1 username password 00:00:20 At service startup, replication works fine. However, when a change is made to the synonyms.txt file and http://master:8080/solr/admin/cores?action=RELOADcore=core is called neither the Master nor Slave are updated to reflect the modification. I am assuming that this is because in the Master schema.xml file the SynonymFilterFactory is being used at index time and the CoreAdmin RELOAD does not force a Solr re-index. If this is so, please can someone advise what the best methodology is to achieve what I am attempting? If not, please could someone let me know what I'm doing wrong?! Thanks, Marc
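In practice that means: after editing synonyms.txt, reload (or replicate) the config so query-time analysis picks it up, and then re-run whatever process normally loads the data so index-time analysis is redone. For example, if the data happened to be loaded with the DataImportHandler (purely illustrative, the thread doesn't say how Marc indexes):

  http://master:8080/solr/admin/cores?action=RELOAD&core=core
  http://master:8080/solr/core/dataimport?command=full-import

Any other indexing client (SolrJ, post.jar, etc.) works just as well; the key point is that RELOAD alone only affects how new queries and newly added documents are analyzed, not documents already in the index.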
Multi core Search is not working when used with SHARDS
Hi all, I am trying to search on multiple cores (distributed search) but not able to succeed using Shards. I am able to get the results when I am hitting each core seperately, http://localhost:8981/solr/core1/select/?q=test http://localhost:8981/solr/core0/select/?q=test but when I try to use distributed search using Shards as below http://localhost:8981/solr/core0/select?shards=localhost:8981/solr/core0,localhost:8981/solr/core1indent=trueq=test I am getting the below error, HTTP ERROR: 500 null java.lang.NullPointerException at org.apache.solr.handler.component.QueryComponent.createMainQuery(QueryComponent.java:372) at org.apache.solr.handler.component.QueryComponent.distributedProcess(QueryComponent.java:292) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:234) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1323) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:341) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:244) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365) at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181) at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712) at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405) at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211) at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114) at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139) at org.mortbay.jetty.Server.handle(Server.java:285) at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502) at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:821) at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513) at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208) at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378) at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226) at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442) RequestURI=/solr/core0/select Powered by Jetty:// Do I need to make any changes to make the Shards work? Thanks, Barani -- View this message in context: http://old.nabble.com/Multi-core-Search-is-not-working-when-used-with-SHARDS-tp27772726p27772726.html Sent from the Solr - User mailing list archive at Nabble.com.
Solr query parsing
Why would fq=sdate:+20100110 parse via a Solr server but not via QueryParsing.parseQuery? Its choking on the + symbol in the sdate value. I'd use QParserPlugin however it requires passing a SolrQueryRequest, which is not kosher for testing, perhaps I'll need to bite the bullet and reproduce using QPP with an SQR.
Re: Multi core Search is not working when used with SHARDS
Hmmm, do you have a uniqueKey defined in your schemas? -Yonik http://www.lucidimagination.com On Wed, Mar 3, 2010 at 4:23 PM, JavaGuy84 bbar...@gmail.com wrote: Hi all, I am trying to search on multiple cores (distributed search) but not able to succeed using Shards. I am able to get the results when I am hitting each core seperately, http://localhost:8981/solr/core1/select/?q=test http://localhost:8981/solr/core0/select/?q=test but when I try to use distributed search using Shards as below http://localhost:8981/solr/core0/select?shards=localhost:8981/solr/core0,localhost:8981/solr/core1indent=trueq=test I am getting the below error, HTTP ERROR: 500 null java.lang.NullPointerException at org.apache.solr.handler.component.QueryComponent.createMainQuery(QueryComponent.java:372) at org.apache.solr.handler.component.QueryComponent.distributedProcess(QueryComponent.java:292) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:234) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1323) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:341) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:244) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365) at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181) at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712) at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405) at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211) at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114) at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139) at org.mortbay.jetty.Server.handle(Server.java:285) at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502) at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:821) at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513) at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208) at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378) at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226) at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442) RequestURI=/solr/core0/select Powered by Jetty:// Do I need to make any changes to make the Shards work? Thanks, Barani -- View this message in context: http://old.nabble.com/Multi-core-Search-is-not-working-when-used-with-SHARDS-tp27772726p27772726.html Sent from the Solr - User mailing list archive at Nabble.com.
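For reference, distributed search requires a uniqueKey to be declared in the schema.xml of every core being sharded; a minimal declaration (the field name id is only an example) looks like:

<field name="id" type="string" indexed="true" stored="true" required="true"/>
<uniqueKey>id</uniqueKey>

The NullPointerException in QueryComponent.createMainQuery is consistent with that declaration being missing.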
Can Solr Create New Indexes?
Is there a setting in the config I can set to have Solr create a new Lucene index if the dataDir is empty on startup? I'd like to open our Solr system to allow other developers here to add new cores without having to use the Lucene API directly to create the indexes.
Re: Can Solr Create New Indexes?
On 03/03/2010 07:56 PM, Thomas Nguyen wrote: Is there a setting in the config I can set to have Solr create a new Lucene index if the dataDir is empty on startup? I'd like to open our Solr system to allow other developers here to add new cores without having to use the Lucene API directly to create the indexes. You don't have to use the Lucene API though? Solr creates the index if its not there ... -- - Mark http://www.lucidimagination.com
weighted search and index
Hi, I am trying to use solr for a content match application. A content is described by a set of keywords with weights associated, eg., C1: fruit 0.8, apple 0.4, banana 0.2 C2: music 0.9, pop song 0.6, Britney Spears 0.4 Those contents would be indexed in solr. In the search, I also have a set of keywords with weights: Query: Sports 0.8, golf 0.5 I am trying to find the closest matching contents for this query. My question is how to index the contents with weighted scores, and how to write search query. I was trying to use boosting, but seems not working right. Thanks. Jianbin
RE: Can Solr Create New Indexes?
Hmm I've tried starting Solr with no Lucene index in the dataDir. Here's the Exception I receive when starting Solr and when attempting to add a document to the core: 2010-03-03 16:44:06,479 [main] ERROR org.apache.solr.core.CoreContainer - java.lang.RuntimeException: java.io.FileNotFoundException: no segments* file found in org.apache.lucene.store.simplefsdirect...@c:\ign\test-solr\objectIndex\i ndex: files: at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1068) at org.apache.solr.core.SolrCore.init(SolrCore.java:579) at org.apache.solr.core.CoreContainer.create(CoreContainer.java:428) at org.apache.solr.core.CoreContainer.load(CoreContainer.java:278) at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer. java:117) at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java: 83) at org.mortbay.jetty.servlet.FilterHolder.start(FilterHolder.java:71) at org.mortbay.jetty.servlet.WebApplicationHandler.initializeServlets(WebAp plicationHandler.java:310) at org.mortbay.jetty.servlet.WebApplicationContext.doStart(WebApplicationCo ntext.java:509) at org.mortbay.jetty.plus.PlusWebAppContext.doStart(PlusWebAppContext.java: 149) at org.mortbay.util.Container.start(Container.java:72) at org.mortbay.http.HttpServer.doStart(HttpServer.java:708) at org.mortbay.jetty.plus.Server.doStart(Server.java:153) at org.mortbay.util.Container.start(Container.java:72) at org.mortbay.jetty.plus.Server.main(Server.java:202) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.jav a:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessor Impl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.mortbay.start.Main.invokeMain(Main.java:151) at org.mortbay.start.Main.start(Main.java:476) at org.mortbay.start.Main.main(Main.java:94) Caused by: java.io.FileNotFoundException: no segments* file found in org.apache.lucene.store.simplefsdirect...@c:\ign\test-solr\objectIndex\i ndex: files: at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.j ava:655) at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:69) at org.apache.lucene.index.IndexReader.open(IndexReader.java:476) at org.apache.lucene.index.IndexReader.open(IndexReader.java:403) at org.apache.solr.core.StandardIndexReaderFactory.newReader(StandardIndexR eaderFactory.java:38) at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1057) ... 21 more Before this point I've been using existing Lucene indexes (created by the Lucene API) with Solr without a problem. -Original Message- From: Mark Miller [mailto:markrmil...@gmail.com] Sent: Wednesday, March 03, 2010 5:00 PM To: solr-user@lucene.apache.org Subject: Re: Can Solr Create New Indexes? On 03/03/2010 07:56 PM, Thomas Nguyen wrote: Is there a setting in the config I can set to have Solr create a new Lucene index if the dataDir is empty on startup? I'd like to open our Solr system to allow other developers here to add new cores without having to use the Lucene API directly to create the indexes. You don't have to use the Lucene API though? Solr creates the index if its not there ... -- - Mark http://www.lucidimagination.com
Re: Can Solr Create New Indexes?
I'm guessing the index folder itself already exists? The data dir can be there, but the index dir itself must not be - that's how it knows to create a new one. Otherwise it thinks the empty dir is the index and cant find the files it expects. On 03/03/2010 08:15 PM, Thomas Nguyen wrote: Hmm I've tried starting Solr with no Lucene index in the dataDir. Here's the Exception I receive when starting Solr and when attempting to add a document to the core: 2010-03-03 16:44:06,479 [main] ERROR org.apache.solr.core.CoreContainer - java.lang.RuntimeException: java.io.FileNotFoundException: no segments* file found in org.apache.lucene.store.simplefsdirect...@c:\ign\test-solr\objectIndex\i ndex: files: at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1068) at org.apache.solr.core.SolrCore.init(SolrCore.java:579) at org.apache.solr.core.CoreContainer.create(CoreContainer.java:428) at org.apache.solr.core.CoreContainer.load(CoreContainer.java:278) at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer. java:117) at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java: 83) at org.mortbay.jetty.servlet.FilterHolder.start(FilterHolder.java:71) at org.mortbay.jetty.servlet.WebApplicationHandler.initializeServlets(WebAp plicationHandler.java:310) at org.mortbay.jetty.servlet.WebApplicationContext.doStart(WebApplicationCo ntext.java:509) at org.mortbay.jetty.plus.PlusWebAppContext.doStart(PlusWebAppContext.java: 149) at org.mortbay.util.Container.start(Container.java:72) at org.mortbay.http.HttpServer.doStart(HttpServer.java:708) at org.mortbay.jetty.plus.Server.doStart(Server.java:153) at org.mortbay.util.Container.start(Container.java:72) at org.mortbay.jetty.plus.Server.main(Server.java:202) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.jav a:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessor Impl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.mortbay.start.Main.invokeMain(Main.java:151) at org.mortbay.start.Main.start(Main.java:476) at org.mortbay.start.Main.main(Main.java:94) Caused by: java.io.FileNotFoundException: no segments* file found in org.apache.lucene.store.simplefsdirect...@c:\ign\test-solr\objectIndex\i ndex: files: at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.j ava:655) at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:69) at org.apache.lucene.index.IndexReader.open(IndexReader.java:476) at org.apache.lucene.index.IndexReader.open(IndexReader.java:403) at org.apache.solr.core.StandardIndexReaderFactory.newReader(StandardIndexR eaderFactory.java:38) at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1057) ... 21 more Before this point I've been using existing Lucene indexes (created by the Lucene API) with Solr without a problem. -Original Message- From: Mark Miller [mailto:markrmil...@gmail.com] Sent: Wednesday, March 03, 2010 5:00 PM To: solr-user@lucene.apache.org Subject: Re: Can Solr Create New Indexes? On 03/03/2010 07:56 PM, Thomas Nguyen wrote: Is there a setting in the config I can set to have Solr create a new Lucene index if the dataDir is empty on startup? I'd like to open our Solr system to allow other developers here to add new cores without having to use the Lucene API directly to create the indexes. You don't have to use the Lucene API though? Solr creates the index if its not there ... -- - Mark http://www.lucidimagination.com
RE: Can Solr Create New Indexes?
Ah that's the problem. Not sure why it didn't come to mind to follow the call stack. Thanks for your help! -Original Message- From: Mark Miller [mailto:markrmil...@gmail.com] Sent: Wednesday, March 03, 2010 5:20 PM To: solr-user@lucene.apache.org Subject: Re: Can Solr Create New Indexes? I'm guessing the index folder itself already exists? The data dir can be there, but the index dir itself must not be - that's how it knows to create a new one. Otherwise it thinks the empty dir is the index and cant find the files it expects. On 03/03/2010 08:15 PM, Thomas Nguyen wrote: Hmm I've tried starting Solr with no Lucene index in the dataDir. Here's the Exception I receive when starting Solr and when attempting to add a document to the core: 2010-03-03 16:44:06,479 [main] ERROR org.apache.solr.core.CoreContainer - java.lang.RuntimeException: java.io.FileNotFoundException: no segments* file found in org.apache.lucene.store.simplefsdirect...@c:\ign\test-solr\objectIndex\i ndex: files: at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1068) at org.apache.solr.core.SolrCore.init(SolrCore.java:579) at org.apache.solr.core.CoreContainer.create(CoreContainer.java:428) at org.apache.solr.core.CoreContainer.load(CoreContainer.java:278) at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer. java:117) at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java: 83) at org.mortbay.jetty.servlet.FilterHolder.start(FilterHolder.java:71) at org.mortbay.jetty.servlet.WebApplicationHandler.initializeServlets(WebAp plicationHandler.java:310) at org.mortbay.jetty.servlet.WebApplicationContext.doStart(WebApplicationCo ntext.java:509) at org.mortbay.jetty.plus.PlusWebAppContext.doStart(PlusWebAppContext.java: 149) at org.mortbay.util.Container.start(Container.java:72) at org.mortbay.http.HttpServer.doStart(HttpServer.java:708) at org.mortbay.jetty.plus.Server.doStart(Server.java:153) at org.mortbay.util.Container.start(Container.java:72) at org.mortbay.jetty.plus.Server.main(Server.java:202) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.jav a:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessor Impl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.mortbay.start.Main.invokeMain(Main.java:151) at org.mortbay.start.Main.start(Main.java:476) at org.mortbay.start.Main.main(Main.java:94) Caused by: java.io.FileNotFoundException: no segments* file found in org.apache.lucene.store.simplefsdirect...@c:\ign\test-solr\objectIndex\i ndex: files: at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.j ava:655) at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:69) at org.apache.lucene.index.IndexReader.open(IndexReader.java:476) at org.apache.lucene.index.IndexReader.open(IndexReader.java:403) at org.apache.solr.core.StandardIndexReaderFactory.newReader(StandardIndexR eaderFactory.java:38) at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1057) ... 21 more Before this point I've been using existing Lucene indexes (created by the Lucene API) with Solr without a problem. -Original Message- From: Mark Miller [mailto:markrmil...@gmail.com] Sent: Wednesday, March 03, 2010 5:00 PM To: solr-user@lucene.apache.org Subject: Re: Can Solr Create New Indexes? On 03/03/2010 07:56 PM, Thomas Nguyen wrote: Is there a setting in the config I can set to have Solr create a new Lucene index if the dataDir is empty on startup? 
I'd like to open our Solr system to allow other developers here to add new cores without having to use the Lucene API directly to create the indexes. You don't have to use the Lucene API though? Solr creates the index if its not there ... -- - Mark http://www.lucidimagination.com
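To summarize the fix in a hedged sketch: keep (or create) the dataDir, but do not pre-create the index directory inside it; Solr builds data/index itself on startup. An expected layout for a new core (paths are examples) is:

core0/
  conf/        schema.xml, solrconfig.xml
  data/        may exist and be empty
               (no data/index/ -- Solr creates it and writes the initial segments file)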
Re: weighted search and index
You have to provide some more details to get meaningful help. You say I was trying to use boosting. How? At index time? Search time? Both? Can you provide some code snippets? What does your schema look like for the relevant field(s)? You say but seems not working right. What does that mean? No hits? Hits not ordered as you expect? Have you tried putting debugQuery=on on your URL and examined the return values? Have you looked at your index with the admin page and/or Luke to see if the data in the index is as you expect? As far as I know, boosts are multiplicative. So boosting by a value less than 1 will actually decrease the ranking. But see the Lucene scoring, See: http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/Similarity.html And remember, that boosting will *tend* to move a hit up or down in the ranking, not position it absolutely. HTH Erick On Wed, Mar 3, 2010 at 8:13 PM, Jianbin Dai j...@huawei.com wrote: Hi, I am trying to use solr for a content match application. A content is described by a set of keywords with weights associated, eg., C1: fruit 0.8, apple 0.4, banana 0.2 C2: music 0.9, pop song 0.6, Britney Spears 0.4 Those contents would be indexed in solr. In the search, I also have a set of keywords with weights: Query: Sports 0.8, golf 0.5 I am trying to find the closest matching contents for this query. My question is how to index the contents with weighted scores, and how to write search query. I was trying to use boosting, but seems not working right. Thanks. Jianbin
RE: weighted search and index
Thank you very much Erick! 1. I used boost in search, but I don't know exactly what's the best way to boost, for such as Sports 0.8, golf 0.5 in my example, would it be sports^0.8 AND golf^0.5 ? 2. I cannot use boost in indexing. Because the weight of the value changes, not the field, look at this example again, C1: fruit 0.8, apple 0.4, banana 0.2 C2: music 0.9, pop song 0.6, Britney Spears 0.4 There is no good way to boost it during indexing. Thanks. JB -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Wednesday, March 03, 2010 5:45 PM To: solr-user@lucene.apache.org Subject: Re: weighted search and index You have to provide some more details to get meaningful help. You say I was trying to use boosting. How? At index time? Search time? Both? Can you provide some code snippets? What does your schema look like for the relevant field(s)? You say but seems not working right. What does that mean? No hits? Hits not ordered as you expect? Have you tried putting debugQuery=on on your URL and examined the return values? Have you looked at your index with the admin page and/or Luke to see if the data in the index is as you expect? As far as I know, boosts are multiplicative. So boosting by a value less than 1 will actually decrease the ranking. But see the Lucene scoring, See: http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/Similarity. html And remember, that boosting will *tend* to move a hit up or down in the ranking, not position it absolutely. HTH Erick On Wed, Mar 3, 2010 at 8:13 PM, Jianbin Dai j...@huawei.com wrote: Hi, I am trying to use solr for a content match application. A content is described by a set of keywords with weights associated, eg., C1: fruit 0.8, apple 0.4, banana 0.2 C2: music 0.9, pop song 0.6, Britney Spears 0.4 Those contents would be indexed in solr. In the search, I also have a set of keywords with weights: Query: Sports 0.8, golf 0.5 I am trying to find the closest matching contents for this query. My question is how to index the contents with weighted scores, and how to write search query. I was trying to use boosting, but seems not working right. Thanks. Jianbin
Re: weighted search and index
Then I'm totally lost as to what you're trying to accomplish. Perhaps a higher-level statement of the problem would help. Because no matter how often I look at your point 2, I don't see what relevance the numbers have if you're not using them to boost at index time. Why are they even there? Erick On Wed, Mar 3, 2010 at 8:54 PM, Jianbin Dai j...@huawei.com wrote: Thank you very much Erick! 1. I used boost in search, but I don't know exactly what's the best way to boost, for such as Sports 0.8, golf 0.5 in my example, would it be sports^0.8 AND golf^0.5 ? 2. I cannot use boost in indexing. Because the weight of the value changes, not the field, look at this example again, C1: fruit 0.8, apple 0.4, banana 0.2 C2: music 0.9, pop song 0.6, Britney Spears 0.4 There is no good way to boost it during indexing. Thanks. JB -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Wednesday, March 03, 2010 5:45 PM To: solr-user@lucene.apache.org Subject: Re: weighted search and index You have to provide some more details to get meaningful help. You say I was trying to use boosting. How? At index time? Search time? Both? Can you provide some code snippets? What does your schema look like for the relevant field(s)? You say but seems not working right. What does that mean? No hits? Hits not ordered as you expect? Have you tried putting debugQuery=on on your URL and examined the return values? Have you looked at your index with the admin page and/or Luke to see if the data in the index is as you expect? As far as I know, boosts are multiplicative. So boosting by a value less than 1 will actually decrease the ranking. But see the Lucene scoring, See: http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/Similarity. htmlhttp://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/Similarity.%0Ahtml And remember, that boosting will *tend* to move a hit up or down in the ranking, not position it absolutely. HTH Erick On Wed, Mar 3, 2010 at 8:13 PM, Jianbin Dai j...@huawei.com wrote: Hi, I am trying to use solr for a content match application. A content is described by a set of keywords with weights associated, eg., C1: fruit 0.8, apple 0.4, banana 0.2 C2: music 0.9, pop song 0.6, Britney Spears 0.4 Those contents would be indexed in solr. In the search, I also have a set of keywords with weights: Query: Sports 0.8, golf 0.5 I am trying to find the closest matching contents for this query. My question is how to index the contents with weighted scores, and how to write search query. I was trying to use boosting, but seems not working right. Thanks. Jianbin
RE: weighted search and index
Hi Erick, Each doc contains some keywords that are indexed. However each keyword is associated with a weight to represent its importance. In my example, D1: fruit 0.8, apple 0.4, banana 0.2 The keyword fruit is the most important keyword, which means I really really want it to be matched in a search result, but banana is less important (It would be good to be matched though). Hope that explains. Thanks. JB -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Wednesday, March 03, 2010 6:23 PM To: solr-user@lucene.apache.org Subject: Re: weighted search and index Then I'm totally lost as to what you're trying to accomplish. Perhaps a higher-level statement of the problem would help. Because no matter how often I look at your point 2, I don't see what relevance the numbers have if you're not using them to boost at index time. Why are they even there? Erick On Wed, Mar 3, 2010 at 8:54 PM, Jianbin Dai j...@huawei.com wrote: Thank you very much Erick! 1. I used boost in search, but I don't know exactly what's the best way to boost, for such as Sports 0.8, golf 0.5 in my example, would it be sports^0.8 AND golf^0.5 ? 2. I cannot use boost in indexing. Because the weight of the value changes, not the field, look at this example again, C1: fruit 0.8, apple 0.4, banana 0.2 C2: music 0.9, pop song 0.6, Britney Spears 0.4 There is no good way to boost it during indexing. Thanks. JB -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Wednesday, March 03, 2010 5:45 PM To: solr-user@lucene.apache.org Subject: Re: weighted search and index You have to provide some more details to get meaningful help. You say I was trying to use boosting. How? At index time? Search time? Both? Can you provide some code snippets? What does your schema look like for the relevant field(s)? You say but seems not working right. What does that mean? No hits? Hits not ordered as you expect? Have you tried putting debugQuery=on on your URL and examined the return values? Have you looked at your index with the admin page and/or Luke to see if the data in the index is as you expect? As far as I know, boosts are multiplicative. So boosting by a value less than 1 will actually decrease the ranking. But see the Lucene scoring, See: http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/Similarity. htmlhttp://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/Simila rity.%0Ahtml And remember, that boosting will *tend* to move a hit up or down in the ranking, not position it absolutely. HTH Erick On Wed, Mar 3, 2010 at 8:13 PM, Jianbin Dai j...@huawei.com wrote: Hi, I am trying to use solr for a content match application. A content is described by a set of keywords with weights associated, eg., C1: fruit 0.8, apple 0.4, banana 0.2 C2: music 0.9, pop song 0.6, Britney Spears 0.4 Those contents would be indexed in solr. In the search, I also have a set of keywords with weights: Query: Sports 0.8, golf 0.5 I am trying to find the closest matching contents for this query. My question is how to index the contents with weighted scores, and how to write search query. I was trying to use boosting, but seems not working right. Thanks. Jianbin
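A hedged suggestion that goes beyond what the thread covers: per-term weights like fruit 0.8, apple 0.4 are usually modeled with payloads rather than field or query boosts. Solr 1.4 ships DelimitedPayloadTokenFilterFactory, so a sketch of a field type for this (the type and field names are made up) could be:

<fieldType name="weighted_keywords" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.DelimitedPayloadTokenFilterFactory" delimiter="|" encoder="float"/>
  </analyzer>
</fieldType>
<field name="keywords" type="weighted_keywords" indexed="true" stored="false"/>

Documents would then be indexed with values such as fruit|0.8 apple|0.4 banana|0.2. The caveat is that out of the box the payload is only stored with each term; making it influence the score still requires a payload-aware query and Similarity (e.g. Lucene's PayloadTermQuery) on the Java side.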
Confused with Shards multicore search results
Hi, I finally got shards working with multicore but now I am facing a different issue. I have 2 separate schema / data config files for each core. I also have a different unique id for each schema.xml file. I indexed both cores and I was able to successfully search independently on each core, but when I used Shards, I didn't get what I expected. For ex: http://localhost:8990/solr/core0/select?q=1565 returned 1 row http://localhost:8990/solr/core1/select?q=1565 returned 1 row When I tried this http://localhost:8990/solr/core0/select/?q=1565&shards=localhost:8990/solr/core0,localhost:8990/solr/core1 it again returned just one row, but I would think that it should return 2 rows if I have a different unique id for each document. Is there any configuration I need to do in order to make it searchable across multiple indexes? Any primary / slave configuration? Any help would be of great help to me. Thanks a lot in advance. Thanks, Barani -- View this message in context: http://old.nabble.com/Confused-with-Shards-multicore-search-results-tp27776478p27776478.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Implementing hierarchical facet
This dynamicfield feature is great. Didn't know about it. Thanks! --- On Wed, 3/3/10, Geert-Jan Brits gbr...@gmail.com wrote: From: Geert-Jan Brits gbr...@gmail.com Subject: Re: Implementing hierarchical facet To: solr-user@lucene.apache.org Date: Wednesday, March 3, 2010, 5:04 AM you could always define 1 dynamicfield and encode the hierarchy level in the fieldname: dynamicField name=_loc_hier_* type=string stored=false indexed=true omitNorms=true/ using: facet=onfacet.field={!key=Location}_loc_hier_cityfq=_loc_hier_country:somecountryid ... adding cityarea later for instance would be as simple as: facet=onfacet.field={!key=Location}_loc_hier_cityareafq=_loc_hier_city:somecityid Cheers, Geert-Jan 2010/3/3 Andy angelf...@yahoo.com Thanks. I didn't know about the {!key=Location} trick. Thanks everyone for your help. From what I could gather, there're 3 approaches: 1) SOLR-64 Pros: - can have arbitrary levels of hierarchy without modifying schema Cons: - each combination of all the levels in the hierarchy will result in a separate filter cache. This number could be huge, which would lead to poor performance 2) SOLR-792 Pros: - each level of the hierarchy separately results in filter cache. Much smaller number of filter cache. Better performance. Cons: - Only 2 levels are supported 3) Separate fields for each hierarchy levels Pros: - same as SOLR-792. Good performance Cons: - can only handle a fixed number of levels in the hierarchy. Adding any levels beyond that requires schema modification Does that sound right? Option 3 is probably the best match for my use case. Is there any trick to make it able to deal with arbitrary number of levels? Thanks. --- On Tue, 3/2/10, Geert-Jan Brits gbr...@gmail.com wrote: From: Geert-Jan Brits gbr...@gmail.com Subject: Re: Implementing hierarchical facet To: solr-user@lucene.apache.org Date: Tuesday, March 2, 2010, 8:02 PM Using Solr 1.4: even less changes to the frontend: facet=onfacet.field={!key=Location}countryid ... facet=onfacet.field={!key=Location}cityidfq=countryid:somecountryid etc. will consistently render the resulting facet under the name Location . 2010/3/3 Geert-Jan Brits gbr...@gmail.com If it's a requirement to let Solr handle the facet-hierarchy please disregard this post, but an alternative would be to have your App control when to ask for which 'facet-level' (e.g: country, state, city) in the hierarchy. as follows, each doc has 3 seperate fields (indexed=true, stored=false): - countryid - stateid - cityid facet on country: facet=onfacet.field=countryid facet on state ( country selected. functionally you probably don't want to show states without the user having selected a country anyway) facet=onfacet.field=countryidfq=countryid:somecountryid facet on city (state selected, same functional analogy as above) facet=onfacet.field=cityidfq=stateid:somestateid or facet on city (countryselected, same functional analogy as above) facet=onfacet.field=cityidfq=countryid:somecountryid grab the resulting facat and drop it under Location pros: - reusing fq's (good performance, I've never used hierarchical facets, but would be surprised if it has a (major) speed increase to this method) - flexible (you get multiple hierarchies: country -- state -- city and country -- city) cons: - a little more application logic Hope that helps, Geert-Jan 2010/3/2 Andy angelf...@yahoo.com I read that a simple way to implement hierarchical facet is to concatenate strings with a separator. Something like level1level2level3 with as the separator. 
A problem with this approach is that the number of facet values will greatly increase. For example I have a facet Location with the hierarchy countrystatecity. Using the above approach every single city will lead to a separate facet value. With tens of thousands of cities in the world the response from Solr will be huge. And then on the client side I'd have to loop through all the facet values and combine those with the same country into a single value. Ideally Solr would be aware of the hierarchy structure and send back responses accordingly. So at level 1 Solr will send back facet values based on country (100 or so values). Level 2 the facet values will be based on the states within the selected country (a few dozen values). Next level will be cities within that state. and so on. Is it possible to implement hierarchical facet this way using Solr?
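Restating the per-level approach from this thread as one cleaned-up sketch (field and parameter names follow the mails; the & separators that the archive stripped are restored here):

<dynamicField name="_loc_hier_*" type="string" stored="false" indexed="true" omitNorms="true"/>

facet=on&facet.field={!key=Location}_loc_hier_country
facet=on&facet.field={!key=Location}_loc_hier_state&fq=_loc_hier_country:somecountryid
facet=on&facet.field={!key=Location}_loc_hier_city&fq=_loc_hier_state:somestateid

Each request facets on a single level, filtered by the level above it, so the number of facet values returned stays small and each fq is cached independently in the filterCache.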
Re: 2 Cores, 1 Table, 2 DataImporter -- Import at the same time ?
No, a core is a lucene index. Two DataImportHandler sessions to the same core will run on the same index. You should use lockType of simple or native. 'single' should only be used on a read-only index. From the stack trace it looks like you're only using one index in solr/core. You have to configure two separate cores with separate core directories. Check out the example/multicore directory for how that works. On Wed, Mar 3, 2010 at 6:39 AM, stocki st...@shopgate.com wrote: okay i change the lockType to single but with no good effect. so i think now, that my two DIH are using the same data-Folder. why ist it so ? i thought that each DIH use his own index ... ?! i think it is not possible to import from one table parallel with more than one DIH`s ?! myexception: java.io.FileNotFoundException: /var/lib/tomcat5.5/temp/solr/data/index/_5d.fnm (No such file or directory) at java.io.RandomAccessFile.open(Native Method) at java.io.RandomAccessFile.init(RandomAccessFile.java:212) at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput$Descriptor.init(SimpleFSDirectory.java:78) at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput.init(SimpleFSDirectory.java:108) at org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.init(NIOFSDirectory.java:94) at org.apache.lucene.store.NIOFSDirectory.openInput(NIOFSDirectory.java:70) at org.apache.lucene.store.FSDirectory.openInput(FSDirectory.java:691) at org.apache.lucene.index.FieldInfos.init(FieldInfos.java:68) at org.apache.lucene.index.SegmentReader$CoreReaders.init(SegmentReader.java:116) at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:638) at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:608) at org.apache.lucene.index.IndexWriter$ReaderPool.get(IndexWriter.java:686) at org.apache.lucene.index.IndexWriter$ReaderPool.get(IndexWriter.java:662) at org.apache.lucene.index.DocumentsWriter.applyDeletes(DocumentsWriter.java:954) at org.apache.lucene.index.IndexWriter.applyDeletes(IndexWriter.java:5190) at org.apache.lucene.index.IndexWriter.doFlushInternal(IndexWriter.java:4354) at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:4192) at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:4183) at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2647) at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2601) at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:241) at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:61) at org.apache.solr.handler.dataimport.SolrWriter.upload(SolrWriter.java:75) at org.apache.solr.handler.dataimport.DataImportHandler$1.upload(DataImportHandler.java:292) at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:392) at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:242) at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:180) at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:331) at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:389) at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:370) Erik Hatcher-4 wrote: what's the error you're getting? is DIH keeping some static that prevents it from running across two cores separately? if so, that'd be a bug. Erik On Mar 3, 2010, at 4:12 AM, stocki wrote: pleeease help me somebody =( :P stocki wrote: Hello again ;) i install tomcat5.5 on my debian server ... 
i use 2 cores and two different DIH with seperatet Index, one for the normal search-feature and the other core for the suggest-feature. but i cannot start both DIH with an import command at the same time. how it this possible ? thx -- View this message in context: http://old.nabble.com/2-Cores%2C-1-Table%2C-2-DataImporter---%3E-Import-at-the-same-time---tp27756255p27765825.html Sent from the Solr - User mailing list archive at Nabble.com. -- View this message in context: http://old.nabble.com/SEVERE%3A-SolrIndexWriter-was-not-closed-prior-to-finalize%28%29%2C-indicates-a-bugPOSSIBLE-RESOURCE-LEAK%21%21%21-tp27756255p27768997.html Sent from the Solr - User mailing list archive at Nabble.com. -- Lance Norskog goks...@gmail.com
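For reference, a minimal solr.xml for two fully separate cores (core names and directories here are made up) would look like the snippet below; because each core has its own instanceDir, each one also gets its own conf/ and data/ directory by default, so two DataImportHandler instances can run at the same time without touching the same index:

<solr persistent="true">
  <cores adminPath="/admin/cores">
    <core name="search" instanceDir="search"/>
    <core name="suggest" instanceDir="suggest"/>
  </cores>
</solr>

Each core then gets its own import URL, e.g. /solr/search/dataimport and /solr/suggest/dataimport, assuming the handler is registered under /dataimport in each core's solrconfig.xml.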
facet performance when number of values is large
I have a facet field whose values are created by users, so potentially there could be a very large number of values. Is that going to be a problem performance-wise? A few more questions to help me understand how faceting works: - After the filter cache has warmed up, will any performance problems caused by a large number of facet values go away? I thought that would be the case, but according to the benchmark here: http://wiki.apache.org/solr/HierarchicalFaceting SOLR-64 still had very poor performance even after the filter caches were warmed - In the wiki it is stated that facet.method=fc is excellent for situations where the number of indexed values for the field is high. Would that be the solution?
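facet.method=fc is just a request parameter, so it is easy to try against an existing index; a hedged example against a made-up user-tag field:

http://localhost:8983/solr/select?q=*:*&rows=0&facet=true&facet.field=user_tags&facet.method=fc&facet.limit=20&facet.mincount=1

fc builds the counts from the field cache / un-inverted field rather than running one filter per value (which is what facet.method=enum does), so it does not depend on having a filterCache entry warmed for every distinct value.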
Re: Escaping options for tika/solr cell extract-only output
You can return it with any of the other writers, like JSON or PHP. The alternative design decision for the XML output writer would be to emit using CDATA instead of escaping. On Wed, Mar 3, 2010 at 12:54 PM, Dan Hertz (Insight 49, LLC) insigh...@gmail.com wrote: Looking at http://wiki.apache.org/solr/ExtractingRequestHandler: Extract Only the output includes XML generated by Tika (and is hence further escaped by Solr's XML) ...is there an option to NOT have the resulting TIKA output escaped? so &lt;head&gt; would come back as <head/> If no, what would need to be done to enable this option? Looked into SOLR-1274.patch, but didn't see a parameter for such a thing. Thanks, Dan -- Lance Norskog goks...@gmail.com
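A hedged example of Lance's suggestion, assuming the stock /update/extract handler from the example solrconfig.xml:

curl "http://localhost:8983/solr/update/extract?extractOnly=true&wt=json" -F "myfile=@somefile.html"

The JSON (or PHP, Ruby, Python) response writers do not XML-entity-escape the extracted content the way the default XML writer does, so <head> stays <head> instead of &lt;head&gt;, subject only to normal JSON string escaping.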
Re: weighted search and index
Boosting by convention is flat at 1.0. Usually people boost with numbers like 3 or 5 or 20. On Wed, Mar 3, 2010 at 6:34 PM, Jianbin Dai j...@huawei.com wrote: Hi Erick, Each doc contains some keywords that are indexed. However each keyword is associated with a weight to represent its importance. In my example, D1: fruit 0.8, apple 0.4, banana 0.2 The keyword fruit is the most important keyword, which means I really really want it to be matched in a search result, but banana is less important (It would be good to be matched though). Hope that explains. Thanks. JB -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Wednesday, March 03, 2010 6:23 PM To: solr-user@lucene.apache.org Subject: Re: weighted search and index Then I'm totally lost as to what you're trying to accomplish. Perhaps a higher-level statement of the problem would help. Because no matter how often I look at your point 2, I don't see what relevance the numbers have if you're not using them to boost at index time. Why are they even there? Erick On Wed, Mar 3, 2010 at 8:54 PM, Jianbin Dai j...@huawei.com wrote: Thank you very much Erick! 1. I used boost in search, but I don't know exactly what's the best way to boost, for such as Sports 0.8, golf 0.5 in my example, would it be sports^0.8 AND golf^0.5 ? 2. I cannot use boost in indexing. Because the weight of the value changes, not the field, look at this example again, C1: fruit 0.8, apple 0.4, banana 0.2 C2: music 0.9, pop song 0.6, Britney Spears 0.4 There is no good way to boost it during indexing. Thanks. JB -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Wednesday, March 03, 2010 5:45 PM To: solr-user@lucene.apache.org Subject: Re: weighted search and index You have to provide some more details to get meaningful help. You say I was trying to use boosting. How? At index time? Search time? Both? Can you provide some code snippets? What does your schema look like for the relevant field(s)? You say but seems not working right. What does that mean? No hits? Hits not ordered as you expect? Have you tried putting debugQuery=on on your URL and examined the return values? Have you looked at your index with the admin page and/or Luke to see if the data in the index is as you expect? As far as I know, boosts are multiplicative. So boosting by a value less than 1 will actually decrease the ranking. But see the Lucene scoring, See: http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/Similarity. htmlhttp://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/Simila rity.%0Ahtml And remember, that boosting will *tend* to move a hit up or down in the ranking, not position it absolutely. HTH Erick On Wed, Mar 3, 2010 at 8:13 PM, Jianbin Dai j...@huawei.com wrote: Hi, I am trying to use solr for a content match application. A content is described by a set of keywords with weights associated, eg., C1: fruit 0.8, apple 0.4, banana 0.2 C2: music 0.9, pop song 0.6, Britney Spears 0.4 Those contents would be indexed in solr. In the search, I also have a set of keywords with weights: Query: Sports 0.8, golf 0.5 I am trying to find the closest matching contents for this query. My question is how to index the contents with weighted scores, and how to write search query. I was trying to use boosting, but seems not working right. Thanks. Jianbin -- Lance Norskog goks...@gmail.com
Re: Confused with Shards multicore search results
different unique id for each schema.xml file. All cores should have the same schema file with the same unique id field and type. Did you mean that the documents in both cores have a different value for the unique id field? On Wed, Mar 3, 2010 at 6:45 PM, JavaGuy84 bbar...@gmail.com wrote: Hi, I finally got shards work with multicore but now I am facing a different issue. I have 2 seperate schema / data config files for each core. I also have different unique id for each schema.xml file. I indexed both the cores and I was able to successfully search independently on each core but when I used Shards, I didnt get what I expected. For ex: http://localhost:8990/solr/core0/select?q=1565 returned 1 row http://localhost:8990/solr/core1/select?q=1565 returned 1 row When I tried this http://localhost:8990/solr/core0/select/?q=1565shards=localhost:8990/solr/core0,localhost:8990/solr/core1 It again returned just one row.. but I would think that it should return 2 rows if I have different unique id for each document. Is there any configuration I need to do in order to make it searchable across multiple indexex? any primary / slave configuration? any help would be of great help to me. Thanks a lot in advance. Thanks, Barani -- View this message in context: http://old.nabble.com/Confused-with-Shards-multicore-search-results-tp27776478p27776478.html Sent from the Solr - User mailing list archive at Nabble.com. -- Lance Norskog goks...@gmail.com
Re: Confused with Shards multicore search results
Thanks a lot for your reply, I will surely try this.. I have a requirement to index 2 diff schema's but need to do a search on both using a single url. Is there a way I can have 2 diff schema's / data config file and do a search on both the indexes using a single URL (like using Shards?) Thanks, Barani Lance Norskog-2 wrote: different unique id for each schema.xml file. All cores should have the same schema file with the same unique id field and type. Did you mean that the documents in both cores have a different value for the unique id field? On Wed, Mar 3, 2010 at 6:45 PM, JavaGuy84 bbar...@gmail.com wrote: Hi, I finally got shards work with multicore but now I am facing a different issue. I have 2 seperate schema / data config files for each core. I also have different unique id for each schema.xml file. I indexed both the cores and I was able to successfully search independently on each core but when I used Shards, I didnt get what I expected. For ex: http://localhost:8990/solr/core0/select?q=1565 returned 1 row http://localhost:8990/solr/core1/select?q=1565 returned 1 row When I tried this http://localhost:8990/solr/core0/select/?q=1565shards=localhost:8990/solr/core0,localhost:8990/solr/core1 It again returned just one row.. but I would think that it should return 2 rows if I have different unique id for each document. Is there any configuration I need to do in order to make it searchable across multiple indexex? any primary / slave configuration? any help would be of great help to me. Thanks a lot in advance. Thanks, Barani -- View this message in context: http://old.nabble.com/Confused-with-Shards-multicore-search-results-tp27776478p27776478.html Sent from the Solr - User mailing list archive at Nabble.com. -- Lance Norskog goks...@gmail.com -- View this message in context: http://old.nabble.com/Confused-with-Shards-multicore-search-results-tp27776478p2152.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Confused with Shards multicore search results
Hi, I think this will work as long as the fields involved in the search are identical. That's probably not the case with your shards, though. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Hadoop ecosystem search :: http://search-hadoop.com/ - Original Message From: JavaGuy84 bbar...@gmail.com To: solr-user@lucene.apache.org Sent: Thu, March 4, 2010 12:49:31 AM Subject: Re: Confused with Shards multicore search results Thanks a lot for your reply, I will surely try this.. I have a requirement to index 2 diff schema's but need to do a search on both using a single url. Is there a way I can have 2 diff schema's / data config file and do a search on both the indexes using a single URL (like using Shards?) Thanks, Barani Lance Norskog-2 wrote: different unique id for each schema.xml file. All cores should have the same schema file with the same unique id field and type. Did you mean that the documents in both cores have a different value for the unique id field? On Wed, Mar 3, 2010 at 6:45 PM, JavaGuy84 wrote: Hi, I finally got shards work with multicore but now I am facing a different issue. I have 2 seperate schema / data config files for each core. I also have different unique id for each schema.xml file. I indexed both the cores and I was able to successfully search independently on each core but when I used Shards, I didnt get what I expected. For ex: http://localhost:8990/solr/core0/select?q=1565 returned 1 row http://localhost:8990/solr/core1/select?q=1565 returned 1 row When I tried this http://localhost:8990/solr/core0/select/?q=1565shards=localhost:8990/solr/core0,localhost:8990/solr/core1 It again returned just one row.. but I would think that it should return 2 rows if I have different unique id for each document. Is there any configuration I need to do in order to make it searchable across multiple indexex? any primary / slave configuration? any help would be of great help to me. Thanks a lot in advance. Thanks, Barani -- View this message in context: http://old.nabble.com/Confused-with-Shards-multicore-search-results-tp27776478p27776478.html Sent from the Solr - User mailing list archive at Nabble.com. -- Lance Norskog goks...@gmail.com -- View this message in context: http://old.nabble.com/Confused-with-Shards-multicore-search-results-tp27776478p2152.html Sent from the Solr - User mailing list archive at Nabble.com.
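A common workaround, sketched here with invented field names, is to give both cores the same schema.xml containing the superset of both sets of fields plus a discriminator field; each document simply leaves the fields it doesn't use empty, and a single shards URL then works because the field and uniqueKey definitions match across cores:

<field name="id" type="string" indexed="true" stored="true" required="true"/>
<field name="doctype" type="string" indexed="true" stored="true"/>
<field name="title" type="text" indexed="true" stored="true"/>
<field name="description" type="text" indexed="true" stored="true"/>
<uniqueKey>id</uniqueKey>

Prefixing the id values per core (e.g. core0-1565, core1-1565) keeps the keys globally unique; distributed search drops duplicates of the same key, which may also explain the single row seen earlier in this thread.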
Update Index : Updating Specific Fields
Hi, Is there any way to update only specific fields in the index? Eg: the index has ONE document consisting of 4 fields, F1, F2, F3, F4. Now I want to update the value of field F2, so if I send the update xml to SOLR, can it keep the old field values for F1, F3, F4 and update the new value specified for F2? Best Regards, Kranti K K Parisa
Re: Update Index : Updating Specific Fields
No. --wunder On Mar 3, 2010, at 10:40 PM, Kranti™ K K Parisa wrote: Hi, Is there any way to update the index for only the specific fields? Eg: Index has ONE document consists of 4 fields, F1, F2, F3, F4 Now I want to update the value of field F2, so if I send the update xml to SOLR, can it keep the old field values for F1,F3,F4 and update the new value specified for F2? Best Regards, Kranti K K Parisa
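For completeness, a hedged sketch of the usual workaround: re-send the whole document with the same uniqueKey, repeating the unchanged field values, because an <add> with an existing key replaces the previous document entirely (the field names F1-F4 are from the question; the id field and the values are made up):

<add>
  <doc>
    <field name="id">DOC-1</field>
    <field name="F1">unchanged value</field>
    <field name="F2">new value</field>
    <field name="F3">unchanged value</field>
    <field name="F4">unchanged value</field>
  </doc>
</add>

This only works if every field being repeated is stored, or the values are still available from the original source system.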
Too many .cfs files
Hi All, I set my mergeFactor to 10. I have loaded 1 million docs into Solr, and after that I can see 14 .cfs files in my data/index folder. Shouldn't mergeFactor trigger a merge once the 11th segment is created? Please clarify. Thanks, Prasad -- View this message in context: http://old.nabble.com/Too-many-.cfs-files-tp2508p2508.html Sent from the Solr - User mailing list archive at Nabble.com.
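A hedged explanation: mergeFactor=10 does not cap the total number of segments; it means roughly ten segments of similar size are allowed at each level before they are merged into one larger segment, and every commit/flush starts a new small segment, so 14 .cfs files after a large load is normal. The relevant solrconfig.xml settings, and an explicit optimize that collapses everything into a single segment, look roughly like:

<useCompoundFile>true</useCompoundFile>
<mergeFactor>10</mergeFactor>

curl "http://localhost:8983/solr/update?optimize=true"

(the URL assumes the default example port and the XML update handler; posting <optimize/> to /update does the same thing).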